Lecture Notes in Computer Science
Editorial Board
David Hutchison
Lancaster University, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Switzerland
John C. Mitchell
Stanford University, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz
University of Bern, Switzerland
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
University of Dortmund, Germany
Madhu Sudan
Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos
New York University, NY, USA
Doug Tygar
University of California, Berkeley, CA, USA
Moshe Y. Vardi
Rice University, Houston, TX, USA
Gerhard Weikum
Max-Planck Institute of Computer Science, Saarbruecken, Germany
Franz Rothlauf et al. (Eds.)
Applications
of Evolutionary
Computing
EvoWorkshops 2006: EvoBIO, EvoCOMNET, EvoHOT
EvoIASP, EvoINTERACTION, EvoMUSART, and EvoSTOC
Budapest, Hungary, April 10-12, 2006
Proceedings
Volume Editors
ISSN 0302-9743
ISBN-10 3-540-33237-5 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-33237-4 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2006
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper SPIN: 11732242 06/3142 543210
Preface
This year, the EvoWorkshops received the highest number of submissions ever.
The number of submissions increased from 123 in 2004 to 143 in 2005 and to 149
in 2006. EvoWorkshops 2006 accepted full papers of up to twelve pages and short
papers with a reduced length of five pages. The acceptance rate of 43.6% for
EvoWorkshops 2006 is an indicator of the high quality of the papers presented at
the workshops and included in these proceedings. The following table gives
details on the number of submissions, the number of accepted papers, and the
acceptance ratios for EvoWorkshops 2005 and EvoWorkshops 2006 (accepted
short papers are given in brackets). For comparison, the acceptance rate of
EvoWorkshops 2004 was 44.7%.
                       2006                             2005
                submissions  accepted  ratio     submissions  accepted  ratio
EvoBIO               40         21     52.5%          32         13     40.6%
EvoCOMNET            16          5     31.2%          22          5     22.7%
EvoHOT                9          5     55.6%          11          7     63.6%
EvoIASP              35       12(7)    34.3%          37         17     45.9%
EvoINTERACTION        8          6     75.0%           -          -       -
EvoMUSART            29       10(4)    34.5%          29      10(6)     34.5%
EvoSTOC              12        6(2)    50.0%          12       4(4)     33.3%
Total               149      65(13)    43.6%         143     56(10)     39.1%
We would like to thank all the members of the program committees for
their quick and thorough work. We thank the Artpool Art Research Center
of Budapest, and especially Gyorgy Galantai, for offering space and expertise
without which the wonderful evolutionary art and music exhibition associated
with the conference would not have been possible. Furthermore, we would like
to acknowledge the support from Napier University, Edinburgh.
Finally, we would like to say a special thanks to everybody who was involved
in the preparation of the event. Special thanks are due to Jennifer Willies, whose
work was a great and invaluable help. Without her support, running such a
conference with a large number of different organizers and different opinions
would be impossible. Further thanks go to the local organizer, Aniko Ekart, and
her group, who made it possible to run such a conference in such a nice place.
EvoWorkshops 2006 was jointly organized with EuroGP 2006 and EvoCOP 2006.
Organizing Committee
Program Committees
Sponsoring Institutions
EvoBIO Contributions
Functional Classification of G-Protein Coupled Receptors, Based on
Their Specific Ligand Coupling Patterns
Burcu Bakir, Osman Ugur Sezerman . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
EvoCOMNET Contributions
BeeHiveGuard: A Step Towards Secure Nature Inspired Routing
Algorithms
Horst F. Wedde, Constantin Timm, Muddassar Farooq . . . . . . . . . . . . 243
EvoHOT Contributions
Optimisation of Constant Matrix Multiplication Operation Hardware
Using a Genetic Algorithm
Andrew Kinane, Valentin Muresan, Noel O'Connor . . . . . . . . . . . . . . . 296
EvoIASP Contributions
Image Space Colonization Algorithm
Leonardo Bocchi, Lucia Ballerini . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
EvoINTERACTION Contributions
On Interactive Evolution Strategies
Ron Breukelaar, Michael T.M. Emmerich, Thomas Bäck . . . . . . . . . . 530
EvoMUSART Contributions
Supervised Genetic Search for Parameter Selection in Painterly
Rendering
John P. Collomosse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599
Consensual Paintings
Paulo Urbano . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 622
EvoSTOC Contributions
A Preliminary Study on Handling Uncertainty in Indicator-Based
Multiobjective Optimization
Matthieu Basseur, Eckart Zitzler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 727
1 Introduction
G-Protein Coupled Receptors (GPCRs) are vital proteins that play a key
role in cellular signaling and in the regulation of various basic physiological
processes. With their versatile functions in a wide range of physiological cellular
conditions, they constitute one of the largest families of eukaryotic transmembrane
proteins [29]. In addition to the biological importance of their functional roles,
their interaction with more than 50% of prescription drugs has made GPCRs
an excellent potential therapeutic target class for drug design and current
pharmaceutical research. Over the last 20 years, several hundred new drugs
have been registered that are directed towards modulating more than 20 different
GPCRs, and approximately 40% of the top 200 synthetic drugs act on
GPCRs [6]. Therefore, many pharmaceutical companies are carrying
out research aimed at understanding the structure and function of these
GPCR proteins. Even though thousands of GPCR sequences are known as a
result of ongoing genomics projects [10], to date the crystal structure has been
solved for only one GPCR, using electron diffraction at medium resolution
(2.8 Å) [15], and for many GPCRs the activating ligand is unknown; these are
called orphan GPCRs [25]. Hence, a sequence-based functional classification
method for those orphan GPCRs and newly arriving GPCR sequences is of
great practical use in facilitating the identification and characterization of
novel GPCRs.
Although laboratory experiments are the most reliable methods, they are not
cost- and labour-effective. To automate the process, computational methods such
as decision trees, discriminant analysis, neural networks and support vector
machines (SVMs) have been used extensively for the classification of biological
data [21]. Among these methods, SVMs give the best prediction performance
when applied to many real-life classification problems, including biological ones
[30]. One of the most critical issues in classification is the minimization of the
probability of error on test data using the trained classifier, which is also known
as structural risk minimization. It has been demonstrated that SVMs are able
to minimize the structural risk by finding a unique hyper-plane with maximum
margin that separates the data of two classes [27]. Therefore, compared with
other classification methods, SVM classifiers supply the best generalization
ability on unseen data [30].
In the current literature there exist different attempts to classify GPCRs at
different levels of families, such as using primary database search tools, e.g.,
BLAST [1] and FASTA [20]. However, these methods require the query protein to
be significantly similar to the database sequences in order to work properly.
In addition to these database search tools, the same problem has been addressed
by secondary database methods (profiles and patterns for classification); for
example, Attwood et al. have worked in particular on GPCRs in the PRINTS
database [2] (whose data appear in the INTERPRO database [17]). Hidden Markov
Models [24], bagging classification trees [32] and SVMs [13], [31] are other methods
that have been used to classify GPCRs at different levels of families. Karchin et al.
conducted the most comprehensive controlled experiments for sequence-based
prediction of GPCRs in [13] and showed that SVMs gave the highest accuracy
in recognizing GPCR families. However, SVMs require an initial step that
transforms each protein sequence into a fixed-length vector, and the predictive
accuracy of SVMs depends significantly on this particular fixed-length vector. In
[13], it is also pointed out that SVM performance could be further increased
by using feature vectors that encode only the most relevant features, since SVMs
do not identify the features most responsible for class discrimination. Therefore,
for an accurate SVM classification, feature vectors should reflect the unique
biological information contained in the sequences, which is specific to the type of
classification problem.
In this paper, we address the problem of Level 2 subfamily classification of
Amine GPCRs by applying the Support Vector Machine (SVM) technique with a
novel fixed-length feature vector based on the existence of activating-ligand-specific
patterns. We obtain discriminative feature vectors by utilizing biological knowledge
of the Level 2 subfamilies' transmembrane topology and identifying specific
patterns for each Level 2 subfamily. Since these specific patterns carry ligand-binding
information, the features obtained from them are more relevant than the amino
acid and dipeptide composition of GPCR sequences, which in turn improves the
accuracy of GPCR Level 2 subfamily classification. Applying our method to the
Amine Level 1 subfamily of GPCRs [10], we show that the classification accuracy
is increased compared to previous studies at the same level of classification.
2 Background
Fig. 1. Portion of the GPCR family tree showing the five main classes of GPCRs and some
subfamily members, based on the GPCRDB information system [10]
Table 1. Summary of 168 Class A, Amine GPCRs, classified into four Level 2 Sub-
families as shown in [9]
novel fixed-length feature vectors. Details of the three steps (Topology Prediction,
Pattern Discovery, and Pattern Matching) for fixed-length feature vector
creation are described below.
Topology Prediction. Since the transmembrane (TM) topology pattern has been
shown to be well conserved among GPCRs that have the same function [19], the
TM topology is checked for the 168 GPCR sequences in Elrod and Chou's dataset.
For topology prediction, the Hidden Markov Model for Topology Prediction (HMMTOP)
server, which accurately predicts the topology of helical TM proteins, is used [26].
In order to segment amino acid sequences into membrane, inside and outside parts,
the HMMTOP method utilizes HMMs in such a way that the product of the relative
frequencies of the amino acids of these segments along the amino acid sequence
is maximized. The maximum of this likelihood function over the space of all
possible topologies of a given amino acid sequence correlates with the
experimentally established topology [25].
Following topology prediction, extracellular loop sequences are extracted for
each of the 168 GPCR sequences, based on the fact that ligands couple to the
extracellular loops of GPCRs and that we are interested in the relation between
the ligand specificity of GPCRs and GPCR sub-sub-family classification.
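As a minimal illustration of this extraction step (not the authors' code; it assumes the HMMTOP prediction has already been converted into a per-residue label string with 'I' = inside, 'M' = membrane and 'O' = outside):

```python
def extracellular_loops(sequence: str, topology: str) -> str:
    """Concatenate the extracellular ("outside") loop segments of a sequence,
    given a per-residue topology annotation of equal length."""
    assert len(sequence) == len(topology)
    loops, current = [], []
    for aa, label in zip(sequence, topology):
        if label == 'O':              # residue predicted to face the outside
            current.append(aa)
        elif current:                 # an outside loop just ended, store it
            loops.append(''.join(current))
            current = []
    if current:
        loops.append(''.join(current))
    return ''.join(loops)             # outer sequence used for pattern discovery

# Toy example (hypothetical sequence and topology)
print(extracellular_loops("MKTAYIAKQR", "OOOMMMIIOO"))  # -> "MKTQR"
```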
Pattern Discovery. In the second step of fixed-length feature vector creation,
flexible patterns that are conserved in the extracellular loops of each sub-sub-family
of GPCR sequences are found using Pratt 2.1, a flexible pattern discovery program
[4]. Due to their flexibility, Pratt patterns include ambiguous components and fixed
and flexible wildcards in addition to their identity components [12]. Hence, Pratt
patterns are described using PROSITE notation [4].
Pratt finds patterns matching a subset of the input sequences. This subset is
defined by the Min Percentage Seqs to Match (MPSM) parameter, which defines
the minimum percentage of the input sequences that should match a pattern.
This threshold is set to 50% and 75% in this study in order not to allow very
specific patterns that are not general to all GPCR sequences in any sub-sub-family.
This can also be thought of as a precaution against overfitting. For each class of
GPCRs, 50 conserved patterns are identified with the two different MPSM
parameters (50 and 75).
Pattern Matching. The final step in creating fixed-length feature vectors is
to check for the existence of every activating-ligand-specific pattern in each outer
GPCR sequence. In order to check for the existence of the flexible Pratt patterns,
all patterns in PROSITE notation are converted into regular expression form and
then searched within the 168 extracellular GPCR sequences. Consequently, by
taking the activating-ligand-specific pattern existence information into account,
each GPCR sequence is represented by a vector in 200-dimensional space
(50 patterns multiplied by 4 output classes),

Gk = (Gk,1, Gk,2, ..., Gk,200),

where Gk,1, Gk,2, ..., Gk,200 are the 200 components of activating-ligand-specific
pattern inclusion for the k-th extracellular GPCR sequence Gk. Note that if the
k-th extracellular GPCR sequence contains pattern j, then Gk,j = 1, and if it
does not contain pattern j, then Gk,j = 0, where j = 1, 2, ..., 200.
Writing each fixed-length feature vector Gk in a new row, we obtain
a Gk,j matrix, where k = 1, 2, ..., 168 and j = 1, 2, ..., 200. After insertion of the
sub-sub-family label of each GPCR sequence into the zeroth dimension of
its Gk vector (Gk,0), the matrix corresponds to a training set, so that j = 0, 1,
2, ..., 200, where Gk,0 is 1, 2, 3 or 4, since four sub-sub-families are defined for
this classification problem. Note that this four-class output labelling (1, 2, 3, 4)
does not imply any relationship between the classes.
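The following Python sketch illustrates how such a binary pattern-occurrence matrix could be assembled. It is illustrative only: the PROSITE-to-regex conversion handles just the common notation elements, and the example sequences and patterns are hypothetical, not the actual Pratt output.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Convert a (simplified) PROSITE pattern, e.g. 'C-x(2,4)-[LIVM]-x-G',
    into a Python regular expression."""
    regex = pattern.replace('-', '')                        # positions separated by '-'
    regex = regex.replace('x', '.')                         # 'x' matches any residue
    regex = regex.replace('{', '[^').replace('}', ']')      # {AG} means "not A or G"
    regex = re.sub(r'\((\d+),(\d+)\)', r'{\1,\2}', regex)   # flexible wildcard lengths
    regex = re.sub(r'\((\d+)\)', r'{\1}', regex)            # fixed wildcard lengths
    return regex

def feature_matrix(extracellular_seqs, patterns):
    """Binary matrix G[k][j] = 1 iff pattern j occurs in sequence k."""
    compiled = [re.compile(prosite_to_regex(p)) for p in patterns]
    return [[1 if rx.search(seq) else 0 for rx in compiled]
            for seq in extracellular_seqs]

# Toy example with two hypothetical patterns
seqs = ["ACDEFGHIK", "CWYGHLLLM"]
pats = ["C-x(2,4)-G", "H-[LIVM](2)"]
print(feature_matrix(seqs, pats))   # -> [[1, 0], [1, 1]]
```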
We have also created a second fixed-length feature vector by using, from each
sub-sub-family, the best 10 of the 50 patterns according to the significance scores
assigned by the Pratt program. Using a similar representation, Gk is then a vector
in 40-dimensional space (10 patterns multiplied by 4 output classes), where
j = 1, 2, ..., 40. A Gk,j matrix is formed as above, where k = 1, 2, ..., 168 and
j = 1, 2, ..., 40, corresponding to another training set.
As a result, four training sets (two with the 50% MPSM parameter, for j up
to 40 or 200, and another two with the 75% MPSM parameter, for j up to 40
or 200) are created to produce a classifier using Support Vector Machines, as
described in detail below.
false negatives than WU-BLAST and SAM-T2K profile HMMs [13]. For these
reasons, we chose SVMs for the GPCR Level 2 subfamily classification
problem.
4 Experiments
Since we are interested in the classification of the Amine Level 1 subfamily into
four Level 2 subfamilies, we face a multi-class classification problem.
We use the LIBSVM software [7], which handles multi-class classification by
implementing the one-against-one approach. As suggested in [11], in order to
obtain satisfactory results, some preprocessing is performed before building a
classifier with LIBSVM. The preprocessing performed in this study can be
summarized under two headings: i) choice of the kernel function, and ii) grid
search combined with cross-validation for parameter tuning.
the training set is not big enough to split in two, 10-fold cross-validation
is done for each of the four training sets. Combining our biologically relevant
fixed-length feature vector definition with a robust kernel (RBF) and parameter
tuning via a grid search technique shows promising results, which are analyzed
in more detail in the next section.
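As a rough illustration of this procedure (a sketch only, using scikit-learn rather than the LIBSVM command-line tools employed in the paper, and with an arbitrary parameter grid):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def tune_rbf_svm(X, y):
    """Coarse grid search over C and gamma with 10-fold cross-validation.
    X: binary pattern-occurrence matrix, y: Level 2 subfamily labels (1-4)."""
    param_grid = {
        "C": [2 ** k for k in range(0, 11, 2)],       # 2^0 .. 2^10
        "gamma": [2 ** k for k in range(-8, 5, 2)],   # 2^-8 .. 2^4
    }
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10)
    search.fit(X, y)
    return search.best_params_, search.best_score_

# Usage (with the matrix built as in the previous sketch):
# best_params, cv_accuracy = tune_rbf_svm(G_matrix, labels)
```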
Fig. 3. Coarse grid search on C and γ with 10-fold cross-validation for the training data
with 200 attributes and the 50% MPSM parameter. The highest cross-validation accuracy
is obtained when γ = 2^4 and C ∈ (2^0, 2^10).
5 Results
As mentioned before, in addition to the SVM classification with the parameter-tuned
RBF kernel, the other three standard kernel functions are tested as well (with their
default parameters) on our four training sets using 10-fold cross-validation.
The results of each experiment are summarized in Table 2.
Classification with the RBF kernel with parameter tuning clearly outperforms the
other kernel functions in all cases. Since the linear kernel is a special case of
the RBF kernel, the results obtained with these two kernels without parameter tuning
are quite close. Although the classification accuracies with 200 and 40 attributes
are close, the accuracy with 200 attributes is consistently better than with 40
attributes. The probable reason for this observation is that 40 attributes are
not enough to represent the examples (more attributes are needed to discriminate
between data points), or the chosen 40 attributes do not correctly reflect
the data points. In contrast to the strict domination of 200 attributes over 40
attributes, there is no such relationship between the training data with the 50% MPSM
parameter and with the 75% MPSM parameter: sometimes one performs better,
sometimes the other (e.g., the results for the RBF kernel and the RBF* kernel in
Table 2). Lower accuracy for the training data with the 75% MPSM parameter is
caused by overfitting, which decreases accuracy in the end, whereas with the 50%
MPSM parameter, patterns that are conserved in at least 50% of the data cannot
represent the overall data.
Table 2. Results for the four different training sets, as explained in the text, using four
different kernel functions and the RBF kernel with parameter tuning (RBF*), with 10-fold
cross-validation
6 Discussion
The difference between this study and previous studies can be summarized in two
main points:
i) Fixed-length feature vector creation: We developed a novel method for obtaining
the fixed-length feature vectors of the SVM. The naive idea of using the protein
sequence information directly as the feature vector cannot be applied in SVM
classification, since the sequence length is not fixed. Many studies [9], [32], [31]
approached this problem by defining a fixed-length feature vector based on the
protein's amino acid composition. Following the representation in [8], each protein
is represented by a vector Xk in 20-dimensional space, where each dimension
corresponds to how many times the particular amino acid representing that
dimension occurs in that particular protein,

Xk = (Xk,1, Xk,2, ..., Xk,20),

where Xk,1, Xk,2, ..., Xk,20 are the 20 components of the amino acid composition of
the k-th protein Xk. In addition to the amino acid composition, in some of the
studies the fixed-length vector is obtained from the dipeptide composition [3], which
takes the local order of amino acids into account in addition to the information
about the fraction of amino acids. The dipeptide composition of each protein is shown
References
1. Altschul, S. et al.: Basic local alignment search tool. J. Mol. Biol. 215 (1990) 403–410
2. Attwood, T.K. et al.: PRINTS and its automatic supplement, prePRINTS. Nucleic Acids Research 31 (2003) 400–402
3. Bhasin, M. and Raghava, G.: GPCRpred: an SVM-based method for prediction of families and subfamilies of G-protein coupled receptors. Nucleic Acids Research 32 (2004) 383–389
4. Brazma, A. et al.: Discovering patterns and subfamilies in biosequences. Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology (ISMB-96), AAAI Press (1996) 34–43. Pratt 2.1 software is available at www.ebi.ac.uk/pratt
5. Byvatov, E. and Schneider, G.: Support vector machine applications in bioinformatics. Appl. Bioinformatics 2 (2003) 67–77
6. Chalmers, D.T. and Behan, D.P.: The Use of Constitutively Active GPCRs in Drug Discovery and Functional Genomics. Nature Reviews, Drug Discovery 1 (2002) 599–608
7. Chang, C.C. and Lin, C.J.: LIBSVM: a library for support vector machines. (2001). LIBSVM software is available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
8. Chou, K.C.: A Novel Approach to Predicting Protein Structural Classes in a (20-1)-D Amino Acid Composition Space. PROTEINS: Structure, Function, and Genetics 21 (1995) 319–344
9. Elrod, D.W. and Chou, K.C.: A study on the correlation of G-protein-coupled receptor types with amino acid composition. Protein Eng. 15 (2002) 713–715
10. Horn, F. et al.: GPCRDB: an information system for G protein coupled receptors. Nucleic Acids Res. 26 (1998) 275–279. Available at: www.gpcr.org/7tm
11. Hsu, C.W. et al.: A Practical Guide to Support Vector Classification. Image, Speech and Intelligent Systems (ISIS) Seminars (2004)
12. Jonassen, I. et al.: Finding flexible patterns in unaligned protein sequences. Protein Sci. 4 (1995) 1587–1595
13. Karchin, R. et al.: Classifying G-protein coupled receptors with support vector machines. Bioinformatics 18 (2001) 147–159
14. Keerthi, S.S. and Lin, C.J.: Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Computation 15 (2003) 1667–1689
15. Palczewski, K. et al.: Crystal Structure of Rhodopsin: A G-Protein-Coupled Receptor. Science 289 (2000) 739–745
16. Lin, H.T. and Lin, C.J.: A study on sigmoid kernels for SVM and the training of non-PSD kernels by SMO-type methods. Technical report, Department of Computer Science and Information Engineering, National Taiwan University (2003). Available at http://www.csie.ntu.edu.tw/~cjlin/papers/tanh.pdf
17. Mulder, N.J. et al.: The InterPro Database - 2003 brings increased coverage and new features. Nucleic Acids Research 31 (2003) 315–318
18. Neuwald, A. and Green, P.: Detecting Patterns in Protein Sequences. J. Mol. Biol. 239 (1994) 698–712
19. Otaki, J.M. and Firestein, S.: Length analyses of mammalian G-protein-coupled receptors. J. Theor. Biol. 211 (2001) 77–100
20. Pearson, W. and Lipman, D.: Improved tools for biological sequence analysis. Proceedings of the National Academy of Sciences 85 (1988) 2444–2448
21. Quinlan, J.R.: C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, San Mateo, CA (1988)
22. Sadka, T. and Linial, M.: Families of membranous proteins can be characterized by the amino acid composition of their transmembrane domains. Bioinformatics 21 (2005) 378–386
23. Schoneberg, T. et al.: The structural basis of G-protein-coupled receptor function and dysfunction in human diseases. Rev Physiol Biochem Pharmacol. 144 (2002) 143–227
24. Sreekumar, K.R. et al.: Predicting GPCR-G-Protein coupling using hidden Markov models. Bioinformatics 20 (2004) 3490–3499. Swiss-Prot database (Release 46.4, 2005) is available at http://www.expasy.org
25. Tusnády, G.E. and Simon, I.: Principles Governing Amino Acid Composition of Integral Membrane Proteins: Applications to topology prediction. J. Mol. Biol. 283 (1998) 489–506
26. Tusnády, G.E. and Simon, I.: The HMMTOP transmembrane topology prediction server. Bioinformatics 17 (2001) 849–850. Available at: http://www.enzim.hu/hmmtop
27. Vapnik, V.: The Nature of Statistical Learning Theory, Springer-Verlag, New York (1995)
28. Vert, J.P.: Introduction to Support Vector Machines and applications to computational biology (2001)
29. Vilo, J. et al.: Prediction of the Coupling Specificity of G Protein Coupled Receptors to their G Proteins. Bioinformatics 17 (2001) 174–181
30. Yang, Z.R.: Biological applications of support vector machines. Brief. Bioinform. 5 (2004) 328–338
31. Ying, H. and Yanda, L.: Classifying G-protein Coupled Receptors with Support Vector Machine. Advances in Neural Networks (ISNN 2004), Springer LNCS 3174 (2004) 448–452
32. Ying, H. et al.: Classifying G-protein Coupled receptors with bagging classification tree. Computational Biology and Chemistry 28 (2004) 275–280
Incorporating Biological Domain Knowledge
into Cluster Validity Assessment
1 Introduction
Over the past few years DNA microarrays have become a key tool in functional
genomics. They allow monitoring the expression of thousands of genes in parallel
over many experimental conditions (e.g. tissue types, growth environments). This
technology enables researchers to collect significant amounts of data, which need
to be analysed to discover functional relationships between genes or samples. The
results from a single experiment are generally presented in the form of a data
matrix in which rows represent genes and columns represent conditions. Each
entry in the data matrix is a measure of the expression level of a particular gene
under a specific condition.
A central step in the analysis of DNA microarray data is the identification of
groups of genes and/or conditions that exhibit similar expression patterns. Clus-
tering is a fundamental approach to classifying expression patterns for biological
and biomedical applications. The main assumption is that genes that are con-
tained in a particular functional pathway should be co-regulated and therefore
should exhibit similar patterns of expression [1]. A great variety of clustering al-
gorithms have been developed for gene expression data. The next data analysis
step is to integrate these numerical analyses of co-expressed genes with biolog-
ical function information. Many approaches and tools have been proposed to
address this problem at dierent processing levels. Some methods, for example,
score whole clustering outcomes or specic clusters according to their biological
relevance, other techniques aim to estimate the signicance of over-represented
functional annotations, such as those encoded in the Gene Ontology (GO), in
clusters [2], [3], [4], [5]. Some approaches directly incorporate biological knowl-
edge (e.g. functional, curated annotations) into the clustering process to aid in
the detection of relevant clusters of co-expressed genes involved in common pro-
cesses [6], [7]. Several tools have been developed for ontological analysis of gene
expression data (see review by Khatri and Draghici [8], for instance) and more
tools are likely to be proposed in the future.
The prediction of the correct number of clusters in a data set is a funda-
mental problem in unsupervised learning. Various cluster validity indices have
been proposed to measure the quality of clustering results [9], [10]. Recent studies
confirm that there is no universal pattern recognition and clustering model
to predict molecular profiles across different datasets. Thus, it is useful not to
rely on one single clustering or validation method, but to apply a variety of
approaches. Therefore, a combination of GO-based (knowledge-driven) and
microarray-data-based (data-driven) validation methods may be used to estimate
the number of clusters. This estimation approach may represent a
useful tool to support biological and biomedical knowledge discovery.
We implemented a knowledge-driven cluster validity assessment system for
microarray data clustering. Unlike traditional methods that only use (gene ex-
pression) data-derived indices, our method consists of validity indices that incor-
porate similarity knowledge originating from the GO and a GO-driven annota-
tion database. We used annotations from the Saccharomyces Genome Database
(SGD) (October 2005 release of the GO database). A traditional node-counting
method proposed by Wu and Palmer [11] and an information content technique
proposed by Resnik [12] were implemented to measure similarity between gene
products. These similarity measures have not previously been implemented for
clustering evaluation in other research.
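As an illustration of the information-content idea, the sketch below computes the Resnik similarity of two GO terms. It is illustrative only: it assumes that the ancestor sets and annotation probabilities of the terms have already been derived from the annotation database; the node-counting Wu and Palmer measure would instead use term depths and the lowest common ancestor in the hierarchy.

```python
import math

def resnik_similarity(term1, term2, ancestors, term_probability):
    """Information-content similarity of two GO terms (Resnik [12]):
    the information content of the most informative common ancestor.
    ancestors: dict term -> set of ancestor terms (including the term itself);
    term_probability: dict term -> annotation probability p(term)."""
    common = ancestors[term1] & ancestors[term2]
    return max(-math.log(term_probability[t]) for t in common)
```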
The main objective of this research is to assess the application of knowledge-
driven cluster validity methods to estimate the number of clusters in a known
data set derived from Saccharomyces cerevisiae.
The C-index is defined as:

C = (S − Smin) / (Smax − Smin)    (8)

where S, Smin and Smax are calculated as follows. Let p be the number of all pairs
of samples (conditions) from the same cluster. Then S is the sum of the distances
between the samples in those p pairs. Let P be the number of all possible pairs of
samples in the dataset. Ordering those P pairs by distance, we can select the p pairs
with the smallest and the p pairs with the largest distances between samples. The
sum of the p smallest distances is equal to Smin, whilst the sum of the p largest
is equal to Smax. From this formula it follows that the numerator will be small if
pairs of samples with small distances are in the same cluster. Thus, small values
of C correspond to good clusters. We calculated distances using the knowledge-
driven methods described above. The number of clusters that minimizes the C-index
is taken as the optimal number of clusters, c.
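A minimal sketch (illustrative only; it assumes a precomputed pairwise distance matrix, e.g. the GO-based distances described above, and a vector of cluster labels) of how the C-index of Eq. (8) could be computed:

```python
import numpy as np

def c_index(dist: np.ndarray, labels) -> float:
    """C-index of a partition (Eq. 8): small values indicate good clusters.
    dist: symmetric pairwise distance matrix; labels: cluster label per sample."""
    labels = np.asarray(labels)
    n = len(labels)
    iu = np.triu_indices(n, k=1)                  # all P possible pairs
    all_d = dist[iu]
    same = labels[iu[0]] == labels[iu[1]]         # pairs from the same cluster
    p = int(same.sum())
    S = all_d[same].sum()
    d_sorted = np.sort(all_d)
    S_min, S_max = d_sorted[:p].sum(), d_sorted[-p:].sum()
    return float((S - S_min) / (S_max - S_min))
```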
d(w, x) > d(y, z), w and x are in different clusters and y and z are in the
same cluster.
By contrast, a quadruple is called disconcordant if one of the following two
conditions is true:
d(w, x) < d(y, z), w and x are in different clusters and y and z are in the
same cluster.
d(w, x) > d(y, z), w and x are in the same cluster and y and z are in different
clusters.
We adapted this method by calculating distances using the knowledge-driven
methods described above.
A good partition is one with many concordant and few disconcordant quadru-
ples. Let Ncon and Ndis denote the number of concordant and disconcordant
quadruples, respectively. Then the Goodman-Kruskal index, GK, is defined as:

GK = (Ncon − Ndis) / (Ncon + Ndis)    (9)

Large values of GK are associated with good partitions. Thus, the number of
clusters that maximizes the GK index is taken as the optimal number of clusters, c.
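A corresponding sketch for the Goodman-Kruskal index of Eq. (9). It is again illustrative and deliberately brute-force over all quadruples, so it is only practical for small data sets:

```python
from itertools import combinations
import numpy as np

def goodman_kruskal(dist: np.ndarray, labels) -> float:
    """Goodman-Kruskal index (Eq. 9): large values indicate good partitions."""
    labels = np.asarray(labels)
    n = len(labels)
    pairs = list(combinations(range(n), 2))
    n_con = n_dis = 0
    for (w, x), (y, z) in combinations(pairs, 2):
        same_wx = labels[w] == labels[x]
        same_yz = labels[y] == labels[z]
        if same_wx == same_yz:                    # quadruple carries no signal
            continue
        # orient so that (a, b) is the between-cluster pair, (c, d) the within pair
        (a, b), (c, d) = ((w, x), (y, z)) if not same_wx else ((y, z), (w, x))
        if dist[a, b] > dist[c, d]:
            n_con += 1                            # concordant quadruple
        elif dist[a, b] < dist[c, d]:
            n_dis += 1                            # disconcordant quadruple
    return (n_con - n_dis) / (n_con + n_dis)
```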
5 Results
The clustering algorithm was applied to produce different partitions consisting
of 2 to 6 clusters each. Then, the validity indices were computed for each of
these partitioning results. The two GO-based similarity assessment techniques
introduced above were used for all cases to calculate biological distances between
the genes.
Tables 1 to 4 show the predictions made by the validity indices at each number
of clusters. Bold entries represent the optimal number of clusters, c, predicted
by each method. In the tables, the first cluster validity index approach processes
overall GO-based similarity values, which are calculated by taking into account
the combined annotations originating from the three GO hierarchies. The other
indices are based on the calculation of independent similarity values, obtained
separately from each of the GO hierarchies.
The C-indices based on the Resnik similarity measure and similarity information
from the MF, BP and the combined hierarchies indicated that the optimal
Table 3. Goodman-Kruskal index values using Wu and Palmer's similarity metric for
expression clusters originating from yeast data
Table 4. Goodman-Kruskal index values using Resnik's similarity metric for expression
clusters originating from yeast data
6 Accompanying Tool
The approaches described in this paper are available as part of the Machaon
CVE (Clustering and Validation Environment) [10]. This software platform has
been designed to support clustering-based analyses of expression patterns,
including several data- and knowledge-driven cluster validity indices.
7 Conclusion
This paper presented an approach to assessing cluster validity based on similar-
ity knowledge extracted from the GO and GO-driven functional databases. A
knowledge-driven cluster validity assessment system for microarray data cluster-
ing was implemented. Edge-counting and information-content approaches were
implemented to measure similarity between gene products based on the GO.
The edge-counting approach calculates the distance between the nodes associated
with the respective terms in a hierarchy. The shorter the distance, the higher the
similarity. Its limitation is that it relies heavily on the idea that nodes and links in
the GO are uniformly distributed.
The research applies two methods for calculating cluster validity indices. The
first approach processes overall similarity values, which are calculated by taking
into account the combined annotations originating from the three GO hierar-
chies. The second approach is based on the calculation of independent similarity
values, which originate from each of these hierarchies. The advantage of our
method compared to other computer-based validity assessment approaches lies
in the application of prior biological knowledge to estimate functional distances
between genes and the quality of the resulting clusters. This study contributes
to the development of techniques for facilitating the statistical and biological
validity assessment of data mining results in functional genomics.
It was shown that the applied GO-based cluster validity indices could be
used to support the discovery of clusters of genes sharing similar functions. Such
clusters may indicate regulatory pathways, which could be significantly relevant
to specific phenotypes or physiological conditions.
Previous research has successfully applied the C-index with knowledge-driven
methods (the GO-based Resnik similarity measure) [15] to estimate the quality of
the clusters.
Future research will include the comparison and combination of different data-
and knowledge-driven cluster validity indices. Further analyses will comprise,
for instance, the implementation of permutation tests as well as comprehensive
cluster descriptions using significantly over-represented GO terms.
The results contribute to the evaluation of clustering outcomes and the
identification of optimal cluster partitions, which may represent an effective tool to
support biomedical knowledge discovery in gene expression data analysis.
Acknowledgements
This research is partly based upon works supported by the Science Foundation
Ireland under Grant No. S.F.I.-02IN.1I111.
References
1. Fitch, J., Sokhansanj, B.: Genomic engineering: moving beyond DNA sequence to function. Proceedings of the IEEE 88 (2000) 1949–1971
2. Gat-Viks, I., Sharan, R., Shamir, R.: Scoring clustering solutions by their biological relevance. Bioinformatics 19 (2003) 2381–2389
3. Lee, S., Hur, J., Kim, Y.: A graph-theoretic modeling on GO space for biological interpretation of gene clusters. Bioinformatics 20 (2004) 381–388
4. Goeman, J., van de Geer, S., de Kort, F., van Houwelingen, H.: A global test for groups of genes: testing association with a clinical outcome. Bioinformatics 20 (2004) 93–99
5. Raychaudhuri, S., Altman, R.: A literature-based method for assessing the functional coherence of a gene group. Bioinformatics 19 (2003) 396–401
6. Hanisch, D., Zien, A., Zimmer, R., Lengauer, T.: Co-clustering of biological networks and gene expression data. Bioinformatics 18 (2002) S145–S154
7. Sohler, F., Hanisch, D., Zimmer, R.: New methods for joint analysis of biological networks and expression data. Bioinformatics 20 (2004) 1517–1521
8. Khatri, P., Draghici, S.: Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics 21 (2005) 3587–3595
9. Bolshakova, N., Azuaje, F.: Cluster validation techniques for genome expression data. Signal Processing 83 (2003) 825–833
10. Bolshakova, N., Azuaje, F.: Machaon CVE: cluster validation for gene expression data. Bioinformatics 19 (2003) 2494–2495
11. Wu, Z., Palmer, M.: Verb semantics and lexical selection. In: 32nd Annual Meeting of the Association for Computational Linguistics, New Mexico State University, Las Cruces, New Mexico (1994) 133–138
12. Resnik, P.: Using information content to evaluate semantic similarity in a taxonomy. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI) (1995) 448–453
13. Azuaje, F., Bodenreider, O.: Incorporating ontology-driven similarity knowledge into functional genomics: an exploratory study. In: Proceedings of the Fourth IEEE Symposium on Bioinformatics and Bioengineering (BIBE 2004) (2004) 317–324
14. Wang, H., Azuaje, F., Bodenreider, O., Dopazo, J.: Gene expression correlation and gene ontology-based similarity: An assessment of quantitative relationships. In: Proceedings of the 2004 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, La Jolla, California, IEEE Press (2004) 25–31
15. Bolshakova, N., Azuaje, F., Cunningham, P.: A knowledge-driven approach to cluster validity assessment. Bioinformatics 21 (2005) 2546–2547
16. Speer, N., Spieth, C., Zell, A.: A memetic clustering algorithm for the functional partition of genes based on the Gene Ontology. In: Proceedings of the 2004 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB 2004), IEEE Press (2004) 252–259
17. Speer, N., Spieth, C., Zell, A.: Functional grouping of genes using spectral clustering and gene ontology. In: Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN 2005), IEEE Press (2005) 298–303
18. Cho, R., Campbell, M., Winzeler, E., Steinmetz, L., Conway, A., Wodicka, L., Wolfsberg, T., Gabrielian, A., Landsman, D., Lockhart, D., Davis, R.: A genome-wide transcriptional analysis of the mitotic cell cycle. Molecular Cell 2 (1998) 65–73
19. Hubert, L., Schultz, J.: Quadratic assignment as a general data-analysis strategy. British Journal of Mathematical and Statistical Psychology (1976) 190–241
20. Goodman, L., Kruskal, W.: Measures of association for cross classifications. Journal of the American Statistical Association (1954) 732–764
A Novel Mathematical Model for the Optimization of
DNA-Chip Design and Its Implementation
1 Introduction
The rapid development of nanotechnology and computer science has led to the
emergence of a new research field called bioinformatics. One of the most important
techniques of this new discipline is the DNA-chip, or DNA-microarray. It also
represents a revolutionary innovation in the area of applied medical diagnostics.
With its help, the presence of pathogens and a predominant proportion of genetically
based diseases can be detected in parallel, very quickly and accurately. In other
words, it can greatly accelerate precise diagnostics, so the appropriate treatment
can be started earlier.
Nevertheless, more extensive use of the method is currently limited by its high
operational costs. For example, the production of 80 homogeneous chips with 20,000
DNA fragments costs approximately €40,000. The largest part of these expenses
results from the production of the so-called masks, which are used to determine the
chemical structure of the DNA pieces. Obviously, the large manufacturing costs are a
substantial disadvantage. Therefore, the design process needs special attention. The
aim of our work is to facilitate the engineering process and to help avoid extra
charges due to chemicals carrying non-appropriate genetic information.
We proceed as follows: in Section 2 a short introduction to DNA-chip technology
and the Simulated Annealing method is given. The estimation of hybridization is
summarized in Section 3. Our mathematical model is presented together with its
optimization in Section 4. The test results can be found in Section 5, and the
conclusion with future work plans in Section 6.
2 DNA-Chips
The DNA-chip itself is a solid carrier (a glass or special plastic plate) with a set of
single-stranded DNA fragments, so-called probes, on it. The sample is usually a
solution which also contains DNA fragments. According to Watson and Crick, a DNA
molecule consists of two helically twisted strands connected together by a series of
hydrogen bonds (double helix), and each strand has 4 distinct building blocks
(nucleotides or bases): adenine (dA), guanine (dG), cytosine (dC) and thymine (dT).
Generally, the different base sequences determine different genes, which are large
DNA fragments that carry and transmit genetic information. The smaller DNA pieces
are called oligonucleotides. It is energetically favorable if dA bonds to dT and dC
pairs with dG, which means there exists a one-to-one correspondence in the set of
DNA fragments. There is another important bijective function f: DNA -> RNA, which
is based on similar chemically preferred pairs, dA-rU, dT-rA, dC-rG and dG-rC, where
RNA fragments are variations of the repetitive rA, rU, rC and rG elements. The
Central Dogma of Molecular Biology, which is the law of genetic information storage
and transfer in living organisms, is based on these special relationships (Figure 1).
Indeed, if one investigates a sample using a DNA-chip, then only the uniquely
complementary base sequences ought to form double helices. This process is known
as hybridization. Nevertheless, every molecular process observed at the macroscopic
level is stochastic. Depending on the base sequences, some not exactly complementary,
distorted double helices may arise. The energetically less stable base pairs are the
so-called mismatches (MMs). MMs and mutations are closely related: MMs can turn
into mutations, and with their help even a single mutation can be detected. Since we
can select the probe sequences, it is theoretically possible to indicate all well-known
and unknown mutations in the sample. DNA-chip technology takes advantage of
parallel experiments: up to 100,000 different oligonucleotides can be applied on a
single chip and the investigation takes only a few minutes. Based on the application
area, chips can be divided into three major categories:
All three kinds of DNA-chips serve to detect the presence or absence of certain
nucleotide chains in the analyzed sample. Although the philosophy is basically the
same, there are some substantial differences among the groups. From our point of
view the most remarkable one is the role of mutations, which is most important in
the first group.
Several different DNA-chip manufacturing techniques have been developed during
the last decade. In the case of diagnostic chips, the most frequently applied procedure
is very similar to that used in computer chip fabrication [1]. Photolithographic
processes with photosensitive masks are combined with solid-phase chemistry to bond
DNA fragments onto the surface. With a series of specific masks and chemical steps,
high-density chips can be constructed. S. Fodor played a decisive role in the
development, application and propagation of DNA-chips [2, 3, 4].
The probability of hybridization and the Tm point are descriptors of the same
thermodynamical property: the stability of the duplex. The higher the Tm, the more
likely hybridization occurs. To calculate Tm, the simpler methods use the chemical
formula of the chains (Eq. 1) and empirical parameters that account for solvation
effects (Eq. 2),
where L, F are empirical parameters and XG, XC can be simply calculated from #G
and #C. In the most complicated and flexible case, the actual sequence of the chains
is also taken into consideration (Eq. 3) [5, 6]:

Tm = ΔH0 / (ΔS0 + R ln[c/4]) − 273.15 + const · log[Na+],    (3)

where ΔH0 is the enthalpy, ΔS0 the entropy and c the oligo concentration. ΔH0 and
ΔS0 depend on the base sequences and on 12 experimentally determined constants
(const) [7, 8].
Although there is a scientific dispute [9, 10, 11, 12] about the best parameter set,
the nearest-neighbor (NN) method (Eq. 3) is generally considered to give the most
accurate prediction of Tm. Nevertheless, substantial problems are left unsolved with
regard to NN. First of all, the NN parameters are not valid in the solid phase; they
are based on fluid-phase measurements. Secondly, they were originally developed to
describe the thermodynamics of perfect-matching sequences, and their extension to
other cases is painful; there still exist quite a few mismatches without parameters.
Lastly, every parameter defined by experiments has a limited scope. Reparametrization
can help to solve
these problems, but it requires a tremendous amount of experimental work in the NN
approach. In addition, if one considers the last argument carefully, then this is
nothing else than another stone on Sisyphus's way. A more effective approach will be
presented in the following sections to avoid these difficulties.
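For orientation only, a minimal sketch of evaluating Eq. (3); the nearest-neighbor enthalpy and entropy tables and the salt-correction constant are passed in as arguments rather than hard-coded, because the published parameter sets differ (which is exactly the dispute mentioned above):

```python
import math

R = 1.987  # gas constant in cal/(mol*K), assuming the NN parameters use cal units

def melting_temperature(seq, dH, dS, c, na, const=16.6):
    """Nearest-neighbor estimate of Tm (Eq. 3), in degrees Celsius.
    seq: probe sequence; dH/dS: dictionaries of nearest-neighbor enthalpy and
    entropy contributions keyed by dinucleotide (e.g. {'AC': ...}); c: oligo
    concentration [mol/l]; na: Na+ concentration [mol/l]. The parameter values
    themselves must come from the literature; none are supplied here."""
    h = sum(dH[seq[i:i + 2]] for i in range(len(seq) - 1))
    s = sum(dS[seq[i:i + 2]] for i in range(len(seq) - 1))
    return h / (s + R * math.log(c / 4.0)) - 273.15 + const * math.log10(na)
```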
As we mentioned earlier, regarding DNA the most stable state is when the four bases
form Watson-Crick base pairs: dG≡dC, dA=dT (where every dash denotes one hydrogen
bond). If two sequences are exactly complementary, then they will hybridize with
each other under appropriate conditions. Since duplex formation is a stochastic
process, hybridization can also occur between non-perfectly matching sequences, and
its probability is in inverse proportion to the number of MMs. Furthermore, one can
conclude that different MMs might have different effects on hybridization.
Accordingly, the following parameters are specified in our model:
Table 1. a) The commutative Cayley table of DNA/DNA pairs; the off-diagonal elements
are MMs and the different ones are in italic. b) The Cayley table of DNA/RNA pairs;
the off-diagonal elements are MMs

a) DNA/DNA
        dA      dC      dG      dT
dA     dAdA    dCdA    dGdA    dTdA
dC     dAdC    dCdC    dGdC    dTdC
dG     dAdG    dCdG    dGdG    dTdG
dT     dAdT    dCdT    dGdT    dTdT

b) DNA/RNA
        dA      dC      dG      dT
rA     dArA    dCrA    dGrA    dTrA
rC     dArC    dCrC    dGrC    dTrC
rG     dArG    dCrG    dGrG    dTrG
rU     dArU    dCrU    dGrU    dTrU
3. Solution-dependent parameters:
They cover the experimental conditions, i.e. salt and sample concentrations and
other environmental factors, which also affect the probability of hybridization. The
experimental conditions are specified by the scientist who plans and carries out
the work. One can assume that these values do not change within a certain laboratory.
Although we do not use these parameters explicitly, they are implicitly involved in
our approach.
In our basic model, we optimize only the type-dependent parameters and set the
other parameters to appropriate values (Section 4.2).
4 Methods
All the parameters mentioned above carry the chemical meaning in our model. With the help of the MMs, weighted by the appropriate parameters, the mathematical model for the estimation of the hybridization probability can be presented:
$$P(\mathrm{hybridization}) = \max\left(0,\ 1 - \sum_{i=0}^{l-1} w_{pos}(i)\, w_{mm}(m(i))\right),$$
where l is the length of the sequence, i is the position index, m(i) is the type of mismatch at position i, w_pos(i) ∈ ℝ is the weight of position i and w_mm(m(i)) is the weight of the mismatch type at position i. If there is no mismatch at position i, then w_pos(i) = w_mm(m(i)) = 0. If appropriate values are assigned to the parameters, experiments can be replaced by computations.
where the element a_ij was derived from the chip experiment (TIFF image processing) and its value is based on the hybridization of the DNA fragment in row i and column j; n and m are the dimensions of the chip. The elements b_ij of the matrix B are computed as follows:
$$b_{ij} = f(i,j) = \sum_{k=0}^{l} w_{pos}(k)\, w_{mm}(S_{ijk}),$$
where l is the length of the sequence, S_ij is the sequence on the chip in row i and column j, and S_ijk is the MM at position k in this sequence. Thus the elements of matrix B are calculated from character strings consisting of combinations of the four letters (A, C, G, T).
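As an illustration, the sketch below computes a mismatch-weighted penalty for one chip position and the resulting hybridization probability; the weight values and the mismatch labels are hypothetical placeholders, not the paper's optimized parameters.

```python
# Sketch of the model above: b_ij = sum_k w_pos(k) * w_mm(S_ijk) and
# P(hybridization) = max(0, 1 - b_ij). Weights and mismatch labels are placeholders.

def mismatch_penalty(mm_string, w_pos, w_mm):
    """mm_string: per-position mismatch labels for one probe/target pair
    (None where the bases match, so that position contributes nothing)."""
    return sum(w_pos[k] * w_mm[mm] for k, mm in enumerate(mm_string) if mm is not None)

def hybridization_probability(mm_string, w_pos, w_mm):
    return max(0.0, 1.0 - mismatch_penalty(mm_string, w_pos, w_mm))

# Toy 10-mer with two mismatches (illustrative weights).
w_pos = [0.1] * 10                            # position-dependent weights
w_mm = {"dA-rC": 0.9, "dT-rG": 0.6}           # type-dependent weights
mm = [None, None, "dA-rC", None, None, None, "dT-rG", None, None, None]
print(hybridization_probability(mm, w_pos, w_mm))
```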
In a real dataset, the proportions of MMs are usually not balanced. In order to optimize the MM weights proportionately, we need to scale the number of MMs present and extend the target function. The percentage of every mismatch in the dataset is calculated as follows:
$$p_{m(t)} = \frac{N_{m(t)} \cdot 100}{\sum_{t=1}^{12} N_{m(t)}}\ \%,$$
where m(t) is the mismatch and N_m(t) is the number of occurrences of mismatch m(t) in the real dataset. In order to scale the number of MMs for DNA/RNA chips, the ratio of 100%/12 = 8.3333% (the proportion of evenly distributed MMs) and p_m(t) is taken; this gives the scale weight.
5 Results
Figure 5 shows the differences between the average optimum values as a function of the initial test temperature and the number of iterations. The initial test temperature was increased from 50 to 140 and the number of iterations from 10 to 55. It can be seen that the optimum values increase if the temperature is between 50 and 70 and the number of iterations is between 50 and 70. In contrast, if the temperature is between 100 and 140 and the number of iterations is 20, the difference between the model and the experiment is acceptable. Since SA is a stochastic search method, we repeated the sampling 100 times for each case.
Fig. 5. The average optimum values generated by the method using different temperatures and iteration numbers
In Figures 6 and 7 the variation of the parameter values can be observed. The parameters that are important from the biochemical point of view are represented by grey columns, the others are in black.
It can be seen that, starting from 1.0, the values are mixed at an early stage. However, as the optimization process advances, the important and the less notable parameters separate from each other, and eventually the most important ones obtain the highest weights. If we take into account the fact that hybrid RNA/DNA structures are more stable than DNA/DNA duplexes, and that the cytosine–guanine base pair represents a stronger interaction than the doubly hydrogen-bonded adenine and thymine (or uracil), the following conclusion can be drawn:
Regarding DNA/RNA hybrid systems, the RNA-side mismatches should mainly determine the stability of the hybridization. This theoretical consideration is consistent with the parameters resulting from the optimization process. As can be seen in Figure 7, 5 out of the 6 possible mismatches on the RNA side occupy the first 5 positions by weight (dT-rC, dA-rC, dG-rG, dA-rG, dT-rG).
Fig. 6. The values of the type-dependent parameters in the case of goal function values 6.10964 and 4.82974
Fig. 7. The values of the type-dependent parameters in the case of goal function values 2.42093 and 1.86422
6 Conclusion
The prediction of the thermodynamic properties of nucleic acids by computer modeling has not been solved yet. The main problem sources are that (i) the parameters used in the computation, determined by time-consuming and expensive experiments, can be applied only to the fluid phase, (ii) MM parameters are lacking, and (iii) the parameters strongly depend on the experimental conditions (e.g. temperature, solvent, etc.).
We presented a novel theoretical model (an in situ in silico approach) to estimate the hybridization process between DNA/DNA and DNA/RNA strands and to eliminate the previously mentioned defects. With the help of this new method, the in silico optimization process takes place in situ in the DNA-chip laboratory; using the established parameters one can then model hybridization, which is the cornerstone of DNA-chip design.
In the computations done so far, the implemented simulated annealing method was used with a linear cooling curve. Besides the experimental testing of the method, the exponential and Boltzmann-sigmoid cooling schemes are in progress, as well as the optimization of the position-dependent parameters.
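As a minimal sketch of the optimization loop mentioned above — simulated annealing with a linear cooling curve — the following uses a toy goal function; the move operator, temperature range and acceptance rule are generic assumptions, not the exact implementation of the paper.

```python
import math
import random

def simulated_annealing(error, x0, t_start=70.0, iterations=50, step=0.1):
    """Minimize error(x) over a weight vector x with SA and a linear cooling curve.
    error, x0 and the perturbation scheme are placeholders for the chip model's goal function."""
    x, e = list(x0), None
    e = error(x)
    for k in range(iterations):
        t = t_start * (1.0 - k / iterations)          # linear cooling schedule
        cand = [w + random.uniform(-step, step) for w in x]
        e_cand = error(cand)
        # Always accept improvements; accept worse moves with Boltzmann-like probability.
        if e_cand < e or random.random() < math.exp(-(e_cand - e) / max(t, 1e-9)):
            x, e = cand, e_cand
    return x, e

# Toy goal function: squared distance of the weights from an arbitrary target vector.
target = [0.2, 0.8, 0.5]
best, err = simulated_annealing(lambda w: sum((a - b) ** 2 for a, b in zip(w, target)),
                                x0=[1.0, 1.0, 1.0])
print(best, err)
```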
References
1. Chiu, G.T., Shaw, J.: Optical lithography. In: IBM Journal of Research and Development, Vol. 41. (24 Jan. 1997)
2. Chee, M., Yang, R., Hubbell, E., Berno, A., Huang, X., Stern, D., Winkler, J., Lockhart, D., Morris, M., Fodor, S.P.: Accessing genetic information with high-density DNA arrays. In: Science, Vol. 274. (25 Oct. 1996) 610–613
3. Fodor, S., Rava, R., Huang, X., Pease, A., Holmes, C., Adams, C.: Multiplexed biochemical assays with biological chips. In: Nature, Vol. 364. (5 Aug. 1993) 555–556
4. Fodor, S., Read, J., Pirrung, M., Stryer, L., Lu, A., Solas, D.: Light-directed, spatially addressable parallel chemical synthesis. In: Science, Vol. 251, No. 4995. (15 Feb. 1991) 767–773
5. Panjkovich, A., Melo, F.: Comparison of different melting temperature calculation methods for short DNA sequences. In: Bioinformatics 21(6) (March 15, 2005) 711–722
6. Panjkovich, A., Norambuena, T., Melo, F.: dnaMATE: a consensus melting temperature prediction server for short DNA sequences. In: Nucleic Acids Res. 33(suppl_2) (July 1, 2005) W570–W572
7. Lee, I., Dombkowski, A., Athey, B.: Guidelines for incorporating non-perfectly matched oligonucleotides into target-specific hybridisation probes for a DNA microarray. In: Nucleic Acids Research, Vol. 32, No. 2. (2004) 681–690
8. Rouillard, J.M., Zuker, M., Gulari, E.: OligoArray 2.0: design of oligonucleotide probes for DNA microarrays using a thermodynamic approach. In: Nucleic Acids Research, Vol. 31, No. 12. (2003) 3057–3062
9. Breslauer, K., Frank, R., Blocker, H., Marky, L.: Predicting DNA duplex stability from the base sequence. In: Proc. Natl. Acad. Sci. USA 83. (1986) 3746–3750
10. Freier, S., Kierzek, R., Jaeger, J., Sugimoto, N., Caruthers, M., Neilson, T., Turner, D.: Improved parameters for predictions of RNA:RNA duplex stability. In: Proc. Natl. Acad. Sci. 83. (1986) 9373–9377
11. Wallace, R., Shaffer, J., Murphy, R., Bonner, J., Hirose, T., Itakura, K.: Hybridization of synthetic oligodeoxyribonucleotides to ΦX174 DNA: the effect of single base pair mismatch. In: Nucleic Acids Res. 6. (1979) 3543–3557
12. Howley, P., Israel, M., Law, M., Martin, M.: A rapid method for detecting and mapping homology between heterologous DNAs. Evaluation of polyomavirus genomes. In: JBC, Vol. 254. (1979) 4876–4883
13. Kirkpatrick, S., Gelatt, C., Vecchi, M.: Optimization by simulated annealing. In: Science, Vol. 220, No. 4598. (May 1983) 671–680
14. Berendsen, H., van der Spoel, D., van Drunen, R.: GROMACS: A message-passing parallel MD implementation. In: Comp. Phys. Comm. 91. (1995) 43–56
15. Sanbonmatsu, K.Y., Joseph, S., Tung, C.S.: Simulating movement of tRNA into the ribosome during decoding. In: PNAS 102. (2005) 15854–15859
16. Accelrys Software Inc.: Insight II molecular modeling and simulation environment (2005)
17. NIH Resource for Macromolecular Modeling and Bioinformatics: Scalable molecular dynamics (2005)
A Hybrid GA/SVM Approach for Gene
Selection and Classification of Microarray Data
1 Introduction
The DNA Microarray technology allows measuring simultaneously the expression level of a great number of genes in tissue samples. A number of works have studied classification methods in order to recognize cancerous and normal tissues by analyzing Microarray data [1, 8, 2]. The Microarray technology typically produces large datasets with expression values for thousands of genes (2000–20000) in a cell mixture, but only few samples are available (20–80).
From the classification point of view, it is well known that, when the number of samples is much smaller than the number of features, classification methods may lead to data overfitting, meaning that one can easily find a decision function that correctly classifies the training data but behaves very poorly on the test data. Moreover, data with a high number of features inevitably require large processing time. So, for analyzing Microarray data, it is necessary to reduce the data dimensionality by selecting a subset of genes that are relevant for classification.
In recent years, many approaches, in particular various Genetic Algorithms (GAs) and Support Vector Machines (SVMs), have been successfully applied to Microarray data analysis [6, 19, 16, 10, 15, 17, 18, 13]. In Section 3, we review some of the most popular approaches.
2 Datasets
In this study, we use two well-known public datasets, the Leukemia dataset and the Colon cancer dataset. All samples were measured using high-density oligonucleotide arrays [2].
The Leukemia dataset¹ consists of 72 Microarray experiments (samples) with 7129 gene expression levels. The problem is to distinguish between two types of Leukemia, Acute Myeloid Leukemia (AML) and Acute Lymphoblastic Leukemia (ALL). The complete dataset contains 25 AML samples and 47 ALL samples. As in other experiments [8], 38 out of 72 samples are used as training data (27 ALL samples and 11 AML samples) and the remaining samples (20 ALL samples and 14 AML samples) are used as test data.
The Colon cancer dataset² contains the expression of 6000 genes with 62 cell samples taken from colon cancer patients, but only 2000 genes were selected based on the confidence in the measured expression levels [1]. 40 of the 62 samples are tumor samples and the remaining samples (22 of 62) are normal ones. In this paper, the first 31 out of 62 samples were used as training data and the remaining samples as test data.
¹ Available at: http://www.broad.mit.edu/cgi-bin/cancer/publications/.
² Available at: http://microarray.princeton.edu/oncology/aydata/index.html.
Feature selection for classification is a very active research topic since many application areas involve data with tens of thousands of variables [9]. This section concerns more specifically a literature review of previous studies on feature selection and classification of Microarray data, with a special focus on the Leukemia and the Colon datasets presented in Section 2.
Feature selection can be seen as a typical combinatorial problem. Informally, given a dataset described by a large number of features, the aim is to find out, within the space of feature subsets, the smallest subset that leads to the highest rate of correct classification. Given the importance of feature selection, many solution methods have been developed. Roughly speaking, existing methods for feature selection belong to three main families [9]: the filter approach, the wrapper approach and the embedded approach.
The filter methods separate the feature selection process from the classification process. These methods select feature subsets independently of the learning algorithm that is used for classification. In most cases, the selection relies on an individual evaluation of each feature [8, 6]; therefore the interactions between features are not taken into account.
In contrast, the wrapper approach relies on a classification algorithm that is used as a black box to evaluate each candidate subset of features; the quality of a candidate subset is given by the performance of the classifier obtained on the training data. Wrapper methods are generally computation intensive since the classifier must be trained for each candidate subset. Several strategies can be considered to explore the space of possible subsets. In particular, in [14], evolutionary algorithms are used with a k-nearest neighbor classifier. In [12], the author develops parallel genetic algorithms using adaptive operators. In [18], one finds an SVM wrapper with a standard GA. In [20], the selection-classification problem is treated as a multi-objective optimization problem, minimizing simultaneously the number of genes (features) and the number of misclassified examples.
Finally, in embedded methods, the process of selection is performed during the training of a specific learning machine. A representative work of this approach is the method that uses support vector machines with recursive feature elimination (SVM/RFE) [10]. The selection is based on a ranking of the genes and, at each step, the gene with the smallest ranking criterion is eliminated. The ranking criterion is obtained from the weights of an SVM trained on the current set of genes. In this sense, embedded methods are an extension of the wrapper models. There are other variants of these approaches; see [21, 7] for two examples.
The work reported in this paper is based on a hybrid approach combining fuzzy logic, a GA and an SVM. Our general model may be characterized as a three-stage sequential process, using complementary techniques to gradually shrink the
search space. The rest of this section gives a brief description of these three stages.
Stage 1 – Pre-processing by fuzzy logic. This stage aims to reduce the dimension of the initial problem by eliminating gene redundancy. It is basically composed of four steps. First, the gene expression levels are transformed into fuzzy subsets with Gaussian representations. Second, the Cosine amplitude method is employed to assess fuzzy similarities between genes. We build a similarity matrix that is then transformed into a matrix of fuzzy equivalence relations by different compositions. Third, using α-cuts [23] with decreasing values of α, we obtain groups of similar genes that correspond to fuzzy equivalence classes of genes. Fourth, for each group, one gene is randomly taken as the representative of the group and the other genes of the group are ignored. Applying this dimension reduction technique to the datasets presented in Section 2, the set of 7129 genes for Leukemia (2000 genes for Colon respectively) is reduced to 1360 genes (943 genes respectively). Therefore, the search space is dramatically reduced. As we show later in Section 6, with this reduced set of genes we are able to obtain high quality classification results. A detailed description of this stage goes beyond the scope of this paper and can be found in [3].
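A rough sketch of the grouping idea follows (pairwise cosine similarities thresholded at a level α, one representative kept per group). The actual stage builds fuzzy equivalence relations as described in [3], so this is only an approximation under simplifying assumptions.

```python
import numpy as np

def cosine_similarity_matrix(X):
    """X: genes x samples expression matrix; returns gene-gene cosine similarities."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)
    return Xn @ Xn.T

def alpha_cut_groups(sim, alpha):
    """Group genes whose pairwise similarity to a seed gene exceeds alpha;
    one representative per group is kept, the rest are discarded."""
    unassigned, groups = set(range(sim.shape[0])), []
    while unassigned:
        seed = unassigned.pop()
        group = [seed] + [j for j in list(unassigned) if sim[seed, j] >= alpha]
        unassigned -= set(group)
        groups.append(group)
    return groups

X = np.random.rand(50, 20)                  # toy data: 50 genes, 20 samples
groups = alpha_cut_groups(cosine_similarity_matrix(X), alpha=0.95)
representatives = [g[0] for g in groups]    # one gene kept per group
print(len(representatives), "genes retained")
```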
Stage 2 – Gene subset selection by GA/SVM. From the reduced set of genes obtained in the previous pre-processing stage, this second stage uses a wrapper approach that combines a GA and an SVM to accomplish the feature (gene) subset selection. The basic idea here consists in using a GA to discover good subsets of genes, the goodness of a subset being evaluated by an SVM classifier
Fig. 1. The general process for gene subset selection and classification using GA/SVM: Gene subset selection (Stage 2 - top); Gene selection and classification (Stage 3 - bottom)
on a set of training data (see Section 2). During this stage, high quality gene subsets are recorded in an archive in order to be further analyzed.
At the end of the GA, the analysis of the archived gene subsets is performed: gene subsets are compared among themselves and the most frequently appearing genes are identified. This process typically leads to a further reduced set of genes (<100 genes for the Leukemia and Colon datasets). Fig. 1 (top) shows a general picture of this stage.
average accuracy (rate of correct classification) of an SVM trained with this gene subset [11]. The LOOCV procedure means that one sample from the dataset is considered as a test case while an SVM is trained on all the other samples, and this evaluation is repeated for each sample. So for each chromosome x, Fitness(x) = accuracy_SVM(x).
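A minimal sketch of this fitness evaluation is given below, using scikit-learn's SVC as a stand-in for the SVM toolbox actually used (footnote 3); the chromosome encoding as a boolean gene mask and the toy data are assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

def fitness(chromosome, X, y):
    """chromosome: boolean mask over genes; fitness = LOOCV accuracy of an RBF SVM
    trained on the selected genes (C=100, kernel parameter 0.5 as in Section 4;
    whether that value maps directly to scikit-learn's gamma is an assumption)."""
    cols = np.flatnonzero(chromosome)
    if cols.size == 0:
        return 0.0
    clf = SVC(kernel="rbf", C=100, gamma=0.5)
    return cross_val_score(clf, X[:, cols], y, cv=LeaveOneOut()).mean()

# Toy example: 38 samples x 200 pre-selected genes, random labels.
rng = np.random.default_rng(0)
X = rng.random((38, 200)); y = rng.integers(0, 2, 38)
mask = rng.random(200) < 0.1          # a candidate gene subset
print(fitness(mask, X, y))
```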
One of the key elements of an SVM classifier concerns the choice of its kernel. In our study, we have chosen to use the RBF kernel. We also experimented with Gaussian and polynomial kernels. For polynomial kernels, the main difficulty is to determine an appropriate polynomial degree, while the results we obtained with the Gaussian kernel are not satisfactory. Notice that RBF has been used in several previous studies for Microarray data classification [4, 18, 5].
The GA parameters used in our model of gene subset selection for the Leukemia and Colon datasets are shown in Tables 1 and 2. For the SVM classifier, the same parameter settings are used in the two stages of gene subset selection and classification. The normalization parameter C is fixed at 100 and the control parameter for the RBF kernel of the SVM is fixed to 0.5. Notice that
³ http://theoval.sys.eua.uk/gcc/svm/toolbox
given that the input data used by the GA/SVM are already normalized during the Fuzzy Logic pre-processing, the normalization parameter C has in fact little influence in our case.
                       Methods
Dataset    GA&SVM      [6]        [25]      [18]        [5]       [20]      [10]
Leukemia   100 (25)    94.10 (-)  100 (8)   100 (6)     95.0 (-)  100 (4)   100 (2)
Colon      99.41 (10)  90.30 (-)  91.9 (3)  93.55 (12)  91.0 (-)  97.0 (7)  98.0 (4)
of our results shows that several biologically significant genes reported in [8] are found by our approach.
Table 4 shows the detailed results of 5 independent runs of our GA/SVM algorithm. As can be observed, these results are quite stable. For the Leukemia dataset, each of the 5 runs obtains a classification rate of 100%, while for the Colon dataset the best run gives a classification rate of 99.64%. Even the worst run obtains a classification rate of 97.88%.
7 Conclusions
In this paper, we presented a general approach for gene selection and classification of high dimensional DNA Microarray data. This approach begins with a fuzzy logic based pre-processing technique that aims to cope with the imprecise nature of the expression levels and to reduce the initial dimension of the input dataset. Following this pre-processing stage, a hybrid wrapper system combining a Genetic Algorithm with an SVM classifier is used to identify potentially predictive gene subsets that are then used to carry out the final gene selection and classification tasks. Another important feature of our approach concerns the introduction of an archive of high quality solutions, which allows limiting the GA/SVM exploration to a set of frequently appearing genes.
This approach was experimentally evaluated on the widely studied Leukemia and Colon cancer datasets and compared with six previous methods. The results show that our approach is able to obtain very high classification accuracy. In particular, to our knowledge, this is the first time that an averaged correct classification rate of 99.41% (with 10 genes) is reached for the Colon dataset.
This approach can be further improved in several respects. First, we notice that our method does not provide the smallest number of genes on the Leukemia data. This is due to the fact that the GA is only guided by the criterion of classification accuracy. Therefore, the criterion of the number of genes should be integrated into the fitness function. This can be achieved by an aggregated fitness function or a bi-criteria evaluation. Second, the high computation time required in stage 2 can be reduced by the use of a faster classifier (or an approximate fitness function). For example, the m-features operator reported in [22] may be considered. Also, a fine-tuning of the SVM parameters in stage 3 may lead to improved results. Finally, we intend to apply our approach to other DNA chip data and to study the behavior of our model.
Genopole Program. We thank the reviewers of the paper for their very helpful comments.
References
1. U. Alon, N. Barkai, D. A. Notterman, K. Gish, S. Ybarra, D. Mack, and A.J. Levine. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. In Proc. Natl. Acad. Sci. USA, volume 96, 1999.
2. A. Ben-Dor, L. Bruhn, N. Friedman, I. Nachman, M. Schummer, and Z. Yakhini. Tissue classification with gene expression profiles. Journal of Computational Biology, 7(3-4):559–583, 2000.
3. E. Bonilla Huerta, B. Duval, and J.K. Hao. Feature space reduction of large scale gene expression data using Fuzzy Logic. Technical Report, LERIA, University of Angers, January 2006.
4. M.P.S. Brown, W.N. Grundy, D. Lin, N. Cristianini, S.W. Sugnet, T.S. Furey, M. Ares Jr., and D. Haussler. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc. Natl. Acad. Sci. U S A, 97(1):262–267, 2000.
5. S. Chao and C. Lihui. Feature dimension reduction for microarray data analysis using locally linear embedding. In APBC, pages 211–217, 2005.
6. T. S. Furey, N. Cristianini, N. Duffy, D.W. Bednarski, M. Schummer, and D. Haussler. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 16(10):906–914, 2000.
7. L. Goh, Q. Song, and N. Kasabov. A novel feature selection method to improve classification of gene expression data. In Proceedings of the Second Asia-Pacific Conference on Bioinformatics, pages 161–166, Australian Computer Society, Darlinghurst, Australia, 2004.
8. T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield, and E. S. Lander. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286:531–537, 1999.
9. I. Guyon and A. Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157–1182, 2003.
10. I. Guyon, J. Weston, S. Barnhill, and V. Vapnik. Gene selection for cancer classification using support vector machines. Machine Learning, 46(1-3):389–422, 2002.
11. T. Joachims. Estimating the generalization performance of a SVM efficiently. Proceedings of the International Conference on Machine Learning (ICML), Morgan Kaufmann, 2000.
12. L. Jourdan. Metaheuristics for knowledge discovery: Application to genetic data (in French). PhD thesis, University of Lille, 2003.
13. K-J. Kim and S-B. Cho. Prediction of colon cancer using an evolutionary neural network. Neurocomputing (Special Issue on Bioinformatics), 61:361–379, 2004.
14. L. Li, C. R. Weinberg, T.A. Darden, and L.G. Pedersen. Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method. Bioinformatics, 17(12):1131–1142, 2001.
15. J. Liu and H. Iba. Selecting informative genes using a multiobjective evolutionary algorithm. In Proc. of Congress on Evolutionary Computation (CEC'02), pages 297–302, 2002.
16. F. Markowetz, L. Edler, and M. Vingron. Support vector machines for protein fold class prediction. Biometrical Journal, 45(3):377–389, 2003.
17. S. Mukherjee. Classifying Microarray Data Using Support Vector Machines. Springer-Verlag, Heidelberg, 2003.
18. S. Peng, Q. Xu, X.B. Ling, X. Peng, W. Du, and L. Chen. Molecular classification of cancer types from microarray data using the combination of genetic algorithms and support vector machines. FEBS Letters, 555(2):358–362, 2003.
19. S. Ramaswamy, P. Tamayo, R. Rifkin, S. Mukherjee, C.H. Yeang, M. Angelo, C. Ladd, M. Reich, E. Latulippe, J.P. Mesirov, T. Poggio, W. Gerald, M. Loda, E.S. Lander, and T.R. Golub. Multiclass cancer diagnosis using tumor gene expression signatures. Proc. Natl. Acad. Sci. U S A, 98(26):15149–15154, 2001.
20. A. R. Reddy and K. Deb. Classification of two-class cancer data reliably using evolutionary algorithms. Technical Report, KanGAL, 2003.
21. Y. Saeys, S. Degroeve, D. Aeyels, P. Rouzé, and Y. Van de Peer. Feature selection for splice site prediction: A new method using EDA-based feature ranking. BMC Bioinformatics, 5:64, 2004.
22. S. Salcedo-Sanz, F. Pérez-Cruz, G. Camps-Valls, and C. Bousoño-Calzón. Enhancing genetic feature selection through restricted search and Walsh analysis. IEEE Transactions on Systems, Man and Cybernetics, Part C, 34:398–406, 2004.
23. T.J. Ross. Fuzzy Logic with Engineering Applications. McGraw-Hill, 1997.
24. V. N. Vapnik. Statistical Learning Theory. Wiley, N.Y., 1998.
25. Y. Wang, F. Makedon, J.C. Ford, and J.D. Pearlman. HykGene: a hybrid approach for selecting marker genes for phenotype classification using microarray gene expression data. Bioinformatics, 21(8):1530–1537, 2005.
26. E. Zitzler, K. Deb, and L. Thiele. Comparison of multiobjective evolutionary algorithms: Empirical results. Evolutionary Computation, 8(2):173–195, 2000.
Multi-stage Evolutionary Algorithms for Efficient
Identification of Gene Regulatory Networks
Abstract. With the availability of time series data from high-throughput technologies, diverse approaches have been proposed to model gene regulatory networks. Compared with others, the S-system has the advantage for these tasks that it can provide both qualitative (structural) and quantitative (dynamical) modeling in one framework. However, it is not easy to identify the structure of the true network since the number of parameters to be estimated is much larger than the amount of available data. Moreover, conventional parameter estimation requires time-consuming numerical integration to reproduce dynamic profiles of the S-system. In this paper, we propose multi-stage evolutionary algorithms to identify gene regulatory networks efficiently. With symbolic regression by genetic programming (GP), we can evade the numerical integration steps, because the slopes for each time-course dataset can be estimated from the results of GP. We also develop hybrid evolutionary algorithms and a modified fitness evaluation function to identify the structure of gene regulatory networks and to estimate the corresponding parameters at the same time. By applying the proposed method to the identification of an artificial genetic network, we verify its capability of finding the true S-system.
1 Introduction
Although mathematical modeling of biochemical networks can be carried out at different levels of detail (see [1] and [2] for reviews of metabolic and genetic regulatory network modeling), we can cluster the approaches into three dominant groups [3]. One extreme case is mainly intended to describe the pattern of interactions between the components. Graph-based representations give us insight into large architectural features within a cell and allow us to discover principles of cellular organization [4]. However, it is difficult to handle the dynamics of the whole system since these models are very abstract. The other extreme primarily focuses on describing the dynamics of the systems by equations which can explain the biochemical interactions with stochastic kinetics [5, 6]. While these approaches lead to realistic, quantitative modeling of cellular dynamics, their application is limited to small systems due to their computational complexity.
One of the appropriate approaches for identifying both pathway structure and dynamics is the S-system [7]. It is represented as a system of ordinary differential equations of a particular form, where each component process is characterized by
power-law functions. The S-system is not only general enough to represent any nonlinear relationship among the components but can also be mapped onto the network structure directly. In spite of these advantages, there is a serious drawback which prevents the S-system from spreading widely over the systems biology community: its large number of parameters must be estimated from a small number of observed dynamic trends. Provided that n components such as genes and proteins are involved in a certain living system, we must optimize at least 2n(n+1) parameters for the S-system.
Evolutionary computation has been used from its inception for the automatic identification of a given system or process [8]. For S-system models, several evolutionary search techniques have been proposed [9-12]. However, they require time-consuming numerical integrations to reproduce dynamic profiles for the fitness evaluations. To avoid this problem, Almeida and Voit employed an artificial neural network (ANN) to smooth the measured data and obtain slopes of the gene expression level curves [13]. By comparing the slope of each S-system in the population with the slope estimated by the ANN, we can evaluate the fitness values of the individuals without the computationally expensive numerical integration of differential equations. This method also provides the opportunity for a parallel implementation of the identification task, since a tightly coupled system of non-linear differential equations can be separated. Hence, the time complexity is reduced drastically. While the collocation method [14] can save computational cost by approximating dynamic profiles, the estimated systems tend to be invalid since the number of measured data points is usually insufficient. This lack-of-data problem can be resolved by sampling new points from the fitted curves. For well-estimated profiles, however, we should determine the optimal topology of the artificial neural network, such as the number of hidden units and layers.
In this paper, we propose multi-stage evolutionary algorithms to identify gene regulatory networks efficiently. With symbolic regression by genetic programming (GP), we can evade the numerical integration steps. Here, we have no need to pre-determine the topology of the model for the expression profiles since genetic programming can optimize the topology automatically. We also develop hybrid evolutionary algorithms to identify the structure of gene regulatory networks and to estimate the corresponding parameters at the same time. Most previous evolutionary approaches for S-system identification have used a structural simplification procedure in which parameters whose values are less than a given threshold are reset to zero. Although this method is able to make the network structure sparse, true connections which represent a somewhat small effect can be deleted during the procedure. That is, it is not easy to set a suitable value for the threshold. In our scheme, binary matrices for the network structure and real vectors and matrices for the parameter values of the S-system are combined into a chromosome and co-evolved to find the best descriptive model for the given data. Hence we can identify the S-system without specifying threshold values for the structural simplification. By applying the proposed method to artificial gene expression profiles, we successfully identified the true structure and estimated reasonable parameter values with a smaller number of data points than the previous study.
2.1 S-System
The S-system [7, 15] is a set of nonlinear differential equations described as follows:
$$\frac{dX_i}{dt} = \alpha_i \prod_{j=1}^{n+m} X_j^{g_{ij}} - \beta_i \prod_{j=1}^{n+m} X_j^{h_{ij}}, \quad i = 1, 2, \ldots, n, \qquad (1)$$
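A small sketch of evaluating the S-system right-hand side for given parameters α, β, g, h is shown below; the example values are arbitrary placeholders, not those of a particular network.

```python
import numpy as np

def s_system_rhs(X, alpha, beta, g, h):
    """dX_i/dt = alpha_i * prod_j X_j**g_ij - beta_i * prod_j X_j**h_ij."""
    X = np.asarray(X, dtype=float)
    prod_g = np.prod(X ** g, axis=1)   # power-law production terms
    prod_h = np.prod(X ** h, axis=1)   # power-law degradation terms
    return alpha * prod_g - beta * prod_h

# Toy 2-gene system with arbitrary parameters.
alpha = np.array([2.0, 1.0]); beta = np.array([1.5, 0.8])
g = np.array([[0.0, 0.5], [0.3, 0.0]])
h = np.array([[0.4, 0.0], [0.0, 0.6]])
print(s_system_rhs([1.0, 2.0], alpha, beta, g, h))
```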
instead of the illegible graph and a set of weights. Hence, we can easily obtain the derivatives at each time point if the resulting functions of the GPs are differentiable.
At the second stage of our evolutionary algorithm, we use a hybrid evolutionary algorithm for searching the structures and parameters of the S-system. The whole procedure of our algorithm is summarized in Fig. 1.
Fig. 1. The whole proposed multi-stage evolutionary algorithm for the identification of the S-system
In the biochemical experiment step, the dynamic profiles of the involved genes can be obtained by periodic measurement. We are also able to predict slopes at each time point from the regression line of the measured data obtained by genetic programming. Then, we can evaluate and optimize the chromosomes of the evolutionary algorithms by comparing the slopes estimated by substituting the data into the S-system with the predicted slopes of the GP regression lines. Hence, the fitness value of each S-system can be evaluated without time-consuming numerical integration as follows:
$$E = \sum_{i=1}^{n} \sum_{t=1}^{T} \left( \frac{X_i'(t) - \hat{X}_i(t)}{X_i'(t)} \right)^2, \qquad (3)$$
Fig. 2. (a) An example genetic network with four genes X1, X2, X3, X4. (b) Its S-system representation:
dX1/dt = 12 X3^0.8 − 10 X1^0.5,    dX2/dt = 8 X1^0.5 − 3 X2^0.75,
dX3/dt = 3 X2^0.75 − 5 X3^0.5 X4^0.2,    dX4/dt = 2 X1^0.5 − 6 X4^0.8.
Binary matrices for the structure:
g = [0 0 1 0; 1 0 0 0; 0 1 0 0; 1 0 0 0],    h = [1 0 0 0; 0 1 0 0; 0 0 1 1; 0 0 0 1].
Real vectors and matrices for the parameter values:
α = (12.0, 8.0, 3.0, 2.0),    β = (10.0, 3.0, 5.0, 6.0),
g = [0 0 0.8 0; 0.5 0 0 0; 0 0.75 0 0; 0.5 0 0 0],    h = [0.5 0 0 0; 0 0.75 0 0; 0 0 0.5 0.2; 0 0 0 0.8].
where T is the number of sampling points, X'_i(t) is the gradient of the regression line obtained by genetic programming and X̂_i(t) is the value calculated from each equation of the S-system.
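A sketch of this integration-free fitness follows: the GP-estimated slopes are compared with the S-system right-hand side evaluated at the sampled expression levels. All numbers are synthetic placeholders.

```python
import numpy as np

def slope_error(X_samples, slopes_gp, alpha, beta, g, h):
    """E = sum_i sum_t ((X'_i(t) - Xhat_i(t)) / X'_i(t))**2, where X' are GP slopes
    and Xhat is the S-system right-hand side evaluated at the sampled points."""
    E = 0.0
    for X_t, s_t in zip(X_samples, slopes_gp):
        X_t = np.asarray(X_t, dtype=float)
        xhat = alpha * np.prod(X_t ** g, axis=1) - beta * np.prod(X_t ** h, axis=1)
        E += np.sum(((s_t - xhat) / s_t) ** 2)
    return E

# Toy data: 100 time points of a 2-gene system (placeholders).
T, n = 100, 2
X_samples = np.random.rand(T, n) + 0.5
slopes_gp = np.random.randn(T, n) * 0.1 + 0.5
alpha = np.array([2.0, 1.0]); beta = np.array([1.5, 0.8])
g = np.array([[0.0, 0.5], [0.3, 0.0]]); h = np.array([[0.4, 0.0], [0.0, 0.6]])
print(slope_error(X_samples, slopes_gp, alpha, beta, g, h))
```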
We develop hybrid evolutionary algorithms to identify the structure of gene regulatory networks and to estimate the corresponding parameters at the same time. In this scheme, binary matrices for the network structure and real vectors and matrices for the parameter values of the S-system are combined into a chromosome (Fig. 2) and co-evolved to find the best descriptive model for the given data. While the crossover operator is applied to the binary matrices for searching the structure of the system, the corresponding parameter values are also exchanged. This kind of crossover passes on the good structures as well as the parameter values from the parents to the offspring. That is, we use a row-exchange crossover which simply selects a row of the matrix g or h (or both) in the parents and swaps it between them, together with the parameter values in the real vectors and matrices. For example, Fig. 3(a) shows the case in which we select a row of g and swap it between the two parents.
(Fig. 3(a) shows two parent g matrices exchanging one row to produce two offspring; Figs. 3(b) and 3(c) show the insert and delete mutations, which turn a single element of the binary matrix from 0 to 1 or from 1 to 0, respectively.)
Fig. 3. Crossover and mutation operators for the binary matrices
We also introduce some variation into the fitness function of equation (3). In the conventional scheme, all data points have the same weight in the fitness value. However, it is difficult to fit the data points which have a large second-order derivative. Moreover, this makes the parameter values of the chromosomes differ from the true ones even if they have good fitness values. Thus we multiply each term of the evaluation function by the second-order derivative. The modified fitness function in our algorithm is described as follows:
$$E = \sum_{i=1}^{n} \sum_{t=1}^{T} X_i''(t) \left( \frac{X_i'(t) - \hat{X}_i(t)}{X_i'(t)} \right)^2, \qquad (4)$$
to the fitness function, we can obtain a better fitness value for the true structure than for other structures. After the fitness values of the offspring created by crossover and mutation according to their probabilities are evaluated by equation (4), the parameters in the real vectors and matrices are adjusted through the (1+1) evolution strategy [17].
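A minimal sketch of a (1+1) evolution strategy used as the local optimizer of the real-valued parameters is given below; the fixed Gaussian step size, the evaluation budget and the bounds are illustrative choices, not the paper's exact settings.

```python
import numpy as np

def one_plus_one_es(fitness, x0, bounds, evaluations=80, sigma=0.1):
    """(1+1)-ES: mutate the single parent with Gaussian noise, keep the better of
    parent and offspring. Step size and clipping to bounds are generic choices."""
    lo, hi = bounds
    x = np.clip(np.asarray(x0, dtype=float), lo, hi)
    f = fitness(x)
    for _ in range(evaluations):
        child = np.clip(x + sigma * np.random.randn(x.size), lo, hi)
        fc = fitness(child)
        if fc <= f:                     # minimization: keep the better solution
            x, f = child, fc
    return x, f

# Toy usage: tune 4 rate constants in [0, 15] against a quadratic placeholder error.
target = np.array([12.0, 8.0, 3.0, 2.0])
x, f = one_plus_one_es(lambda v: float(np.sum((v - target) ** 2)),
                       x0=np.full(4, 7.5), bounds=(0.0, 15.0))
print(x, f)
```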
We employ the restricted tournament selection (RTS) proposed originally in [19] to prevent premature convergence to a locally optimal structure and to find multiple topology candidates. In RTS, a subset of the current population is selected for each newly created offspring. The size of these subsets is fixed to some constant called the window size. Then, the new offspring competes with the most similar member of the subset. Since the window size is set to the population size in our implementation, each offspring is compared with all S-systems in the current population. If the new one is better, it replaces the corresponding individual; otherwise, the new one is discarded. As the similarity measure, we calculate the structural Hamming distances between the new offspring and all individuals in the population by using the binary matrices.
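A sketch of this restricted tournament replacement (window size equal to the population size, Hamming distance on the binary structure matrices) follows; the dictionary-based individual representation is a simplifying assumption.

```python
import numpy as np

def hamming(a, b):
    """Structural distance between two individuals' binary (g, h) matrices."""
    return int(np.sum(a["g"] != b["g"]) + np.sum(a["h"] != b["h"]))

def rts_insert(population, offspring, fitness_key="error"):
    """The offspring competes only with the structurally most similar individual
    and replaces it if it has a lower error (minimization)."""
    idx = min(range(len(population)), key=lambda i: hamming(population[i], offspring))
    if offspring[fitness_key] < population[idx][fitness_key]:
        population[idx] = offspring
    return population

# Toy population of 3 individuals for a 4-gene network (random structures/errors).
rng = np.random.default_rng(1)
pop = [{"g": rng.integers(0, 2, (4, 4)), "h": rng.integers(0, 2, (4, 4)),
        "error": float(rng.random())} for _ in range(3)]
child = {"g": rng.integers(0, 2, (4, 4)), "h": rng.integers(0, 2, (4, 4)), "error": 0.05}
pop = rts_insert(pop, child)
print([round(ind["error"], 3) for ind in pop])
```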
3 Experimental Results
3.1 Data
We use Frayn's GPLib library [18] for the symbolic regression with GP. To evolve the mathematical models for the given data, we use the function set F = {+, −, ×, /, ^, sin, exp, log, sqrt} and a terminal set consisting of the time point t and real-valued constants. The population size is 3,000 and the maximum number of generations is 2,000. Tournament selection is used with size 4. The crossover rate is 0.35 and the mutation rate is 0.5. We set the length penalty of the genetic program to zero for accurate curve fitting. We generate 100 points and derivatives from the obtained models as the input of the next stage, that is, the hybrid evolutionary algorithms.
f1(t)=(sqrt((((sqrt(((sqrt(2.957153))-((sin(sqrt(t)))+((sin((1.854912)-(((sqrt((3)*(t)))
*(sqrt(t)))+(4.435898))))+(-2.355442))))*(t)))+((40.675830)/(exp((t)*((sqrt(t))-
((sin(((sin((((sqrt((3.756391)*(t)))*(sqrt(t)))+(log(86)))+(7)))+(3))+(sin(sin((1.654737
)*(t))))))+(-2.355442)))))))/(sqrt(54.598150)))/(sqrt(7.931547)))),
f2(t)=(sqrt(((3.777992)-((((((4.190957)-((t)-((sin((((t)*((t)^(t)))-((((t)*((2.883554)
/((4)-(log(t)))))+(2.791190))-((exp((t)-(2.226704)))^(sqrt(9.642561)))))/((t)^(t))))
*(2.678347))))/(3.462360))+(3.792098))-(4.796861))/(4)))+(((((3.792098)-((exp(t))
/(3.462360)))-((t)*((t)^(t)))) /((t)^(t)))/((t)^(t))))),
f3(t)=((log(log(exp(log(log(exp(exp(((log(exp()))-((sin((9)+(((t)+(8.000000))
^(sqrt(1.420245)))))/(((exp(t))*(379.000000))/(84.000000))))-((sin((8)+(((t)
+(((log(109))-(1.258803))/(6.620476)))^(sqrt(log(4.059461))))))
/(((exp(((8.337047)*((log(log(sqrt(3.021610))))+(2.000000)))-(5.912041)))*(exp(t)))
/(85.000000)))))))))))^(5.933873)),
f4(t)=((((log(6.249382))^((sqrt(6))*(((sqrt(10.000000))^((1.172486)-(t)))
/(6.981835))))^((1.161464)-((1.161464)/(((((sqrt(6.249382))*((log(7.008566))
*((((((exp((6.980522)/((sqrt(6.288201))^(1.344547))))^((1.735082)-(t)))
/(0.290257))*(sqrt(6.000000)))^(((9.704760)^((-0.050358)-(t)))-(t)))/(0))))
^((1.634223)+((7.277223)^((0.290257)-(t)))))^(0.161464))/(t)))))/(6.980522)).
Fig. 4 shows the true profiles, the estimated curves and the sampled data points from the genetic programming, and we confirm that GP recovers the true dynamics of the S-system.
Using the 100 points obtained from the GP step, we search for the S-system with the hybrid evolutionary algorithms. This step is composed of a steady-state evolutionary algorithm with a local optimization procedure, the (1+1) evolution strategy. For sparse network structures, we adopt a structural constraint which the evolved networks should satisfy: each gene is assumed to be related to one or two other genes in the system. Hence, the randomly generated initial individuals and the offspring produced by the crossover and mutation operators should have one or two non-zero elements in g and h. The crossover rate is 0.8 and the mutation rate is 0.3. As a local optimization of the parameter values, the (1+1) evolution strategy is performed for 80 fitness evaluations. The search ranges of the parameters are [0.0, 15.0] for α_i and β_i, and [−1.0, 1.0] for g_ij and h_ij. With a population size of 104, the proposed algorithm successfully identified the true structure after 106 generations. As shown in Fig. 5, we can also recover the dynamic profiles with the estimated parameters.
4 Conclusion
We propose multi-stage evolutionary algorithms to identify gene regulatory networks efficiently with the S-system representation. We adopt a pre-processing symbolic regression step by genetic programming to avoid the time-consuming numerical integration. We also develop hybrid genetic algorithms and modify the fitness function to identify the structure of gene regulatory networks and to estimate the corresponding parameters simultaneously, without threshold values for enforcing sparse network structures. By applying the proposed method to the identification of an artificial genetic network, we verify its capability of finding the true S-system.
One important piece of future work is to demonstrate the usefulness of the proposed algorithm on real experimental biological data such as gene expression profiles from microarrays and NMR measurements of some metabolites. As a by-product of the population diversity maintenance of our evolutionary algorithms, we will be able to attain different plausible topologies for the network very efficiently. These can
Acknowledgement
This work was supported by the Korea Ministry of Science and Technology through
National Research Lab (NRL) project and the Ministry of Education and Human
Resources Development under the BK21-IT Program. The ICT at the Seoul National
University provided research facilities for this study.
References
1. Covert, M.W., Schilling, C.H., Famili, I., Edwards, J.S., Goryanin, I.I., Selkov, E., Palsson, B.O.: Metabolic modeling of microbial strains in silico. Trends in Biochemical Sciences 26 (2001) 179-186
2. De Jong, H.: Modeling and simulation of genetic regulatory system: a literature review.
Journal of Computational Biology 9 (2002) 67-103
3. Stelling, J.: Mathematical models in microbial systems biology. Current Opinion in Mi-
crobiology 7 (2004) 513-518
4. Barabasi, A.-L., Oltvai, Z.N.: Network biology: Understanding the cell's functional or-
ganization. Nature Reviews Genetics 5 (2004) 101-113
5. De Jong, H., Gouze, J.-L., Hernandez, C., Page, M., Sari, T., Geiselmann, J.: Qualitative
simulation of genetic regulatory networks using piecewise-linear models. Bulletin of
Mathematical Biology 66 (2004) 301-340
6. Rao, C.V., Wolf, D.M., Arkin, A.P.: Control, exploitation and tolerance of intracellular
noise. Nature 402 (2002) 231-237
7. Voit, E.O.: Computational Analysis of Biochemical Systems. Cambridge University Press
(2000)
8. Fogel, D.B.: System Identification Through Simulated Evolution: A Machine Learning Ap-
proach to Modeling. Ginn Press (1991)
9. Tominaga, D., Koga, N., Okamoto, M.: Efficient numerical optimization algorithm based
on genetic algorithm for inverse problem. In: Whitley, D. et al. (Eds.), Proceedings of the
2000 Genetic and Evolutionary Computation Conference (2000) 251-258
10. Ando, S., Sakamoto, E., Iba, H.: Evolutionary modeling and inference of gene network.
Information Science 145 (2002) 237-259
11. Kikuchi, S., Tominaga, D., Arita, M., Takahashi, K., Tomita, M.: Dynamic modeling of
genetic networks using genetic algorithm and S-system. Bioinformatics 19 (2003) 643-650
12. Spieth, C., Streichert, F., Speer, N., Zell, A.: Optimizing topology and parameters of gene regulatory network models from time-series experiments. In: Deb, K. et al. (eds.): Proceedings of the 2004 Genetic and Evolutionary Computation Conference. Lecture Notes in Computer Science, Vol. 3102. Springer-Verlag (2004) 461-470
13. Almeida J.S., Voit E.O.: Neural network-based parameter estimation in S-system models
of biological networks, Genome Informatics 14 (2003) 114-123
14. Tsai, K.-Y., Wang, F.-S.: Evolutionary optimization with data collocation for reverse engi-
neering of biological networks. Bioinformatics 21 (2005) 1180-1188
15. Savageau, M.A.: Biochemical Systems Analysis: A Study of Function and Design in Molecular Biology, Addison-Wesley (1976)
16. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press (1992)
17. Bäck, T.: Evolutionary Algorithms in Theory and Practice, Oxford University Press (1996)
18. http://www.cs.bham.ac.uk/~cmf/GPLib/GPLib.html
19. Harik, G.R.: Finding multimodal solutions using restricted tournament selection. In:
Eshelman, L.J. (ed.): Proceedings of the Sixth International Conference on Genetic Algo-
rithms (1995) 24-31
Human Papillomavirus Risk Type Classification
from Protein Sequences Using
Support Vector Machines
Biointelligence Laboratory,
School of Computer Science and Engineering,
Seoul National University, Seoul 151-744, South Korea
{skim, btzhang}@bi.snu.ac.kr
1 Introduction
Cervical cancer is a leading cause of cancer death among women worldwide. Epidemiologic studies have shown that the association of genital human papillomavirus (HPV) with cervical cancer is strong and independent of other risk factors [1]. HPV infection causes virtually all cases of cervical cancer: certain high-risk HPVs can lead to cancer, even though most HPV infections are low-risk and rarely develop into cancer. In particular, high-risk HPV types can induce more than 95% of cervical cancers in women.
HPV is a relatively small, double-strand DNA tumor virus that belongs to the papovavirus family (papilloma, polyoma, and simian vacuolating viruses). More than 100 human HPV types are specific for epithelial cells including skin, respiratory mucosa, or the genital tract. The genital tract HPV types are classified into two or three groups, such as low-, intermediate-, and high-risk types, by their relative malignant potential [2]. The common, unifying oncogenic feature of the vast majority of cervical cancers is the presence of HPV, especially high-risk type HPV [3]. Thus the risk type detection of HPVs has become one of the most essential procedures in cervical cancer remedy. Currently, the HPV risk types are still manually classified by experts, and there is no deterministic method to predict the risk type of unknown or new HPVs.
Since HPV classification is important in medical judgments, there have been many epidemiological and experimental studies to identify HPV risk types
[3]. Polymerase chain reaction (PCR) is a sensitive technique for the detection of very small amounts of HPV nucleic acids in clinical specimens. It has been used in most epidemiological studies that have evaluated the role of these viruses in cervical cancer causation [4]. Bosch et al. [1] investigated the epidemiological question of whether the association between HPV infection and cervical cancer is consistent worldwide in the context of geographic variation. Burk et al. [5] inspected the risk factors for HPV infection in 604 young college women by examining social relationships, and detected various factors of HPV infection with L1 consensus primer PCR and Southern blot hybridization. Munoz et al. [6] classified the HPV risk types with epidemiological experiments based on risk factor analysis. They pooled real data from 1,918 cervical cancer patients and analyzed it by PCR based assays.
Detection of HPV risk types can be seen as protein function prediction, even though functions are described at many levels, ranging from biochemical function to biological processes and pathways, all the way up to the organ or organism level [7]. Many approaches for protein function prediction are based on similarity search between proteins with known function. The similarity among proteins can be defined in a multitude of ways [8]: sequence alignment, structure match by common surface clefts or binding sites, common chemical features, or comparison of certain motifs. However, none of the existing prediction systems can guarantee generally good performance. Thus it is required to develop classification methods for HPV risk types. Eom et al. [9] presented a sequence comparison method for HPV classification. They use DNA sequences to discriminate risk types based on genetic algorithms. Joung et al. [10] combined several methods for risk type prediction from protein sequences. Protein sequences are first aligned, and the subsequences in high-risk HPVs against low-risk HPVs are selected by hidden Markov models. Then a support vector machine is used to determine the risk types. The main drawback of this approach is that the method is biased by one sequence pattern. Alternatively, biomedical literature can be used to predict HPV risk types [11]. But text mining approaches have limited prediction capability because they only depend on texts to capture the classification evidence, and obvious keywords such as "high" tend to appear in the literature explicitly.
In this paper, we propose a method to classify HPV risk types using protein sequences. Our approach is based on support vector machines (SVM) to discriminate low- and high-risk types, and a string kernel is introduced to deal with protein sequences. The string kernel first maps a sequence to the space consisting of all subsequences of amino acid pairs. An RBF kernel is then used for nonlinear mapping into a higher dimensional space and for similarity calculation. In particular, the proposed kernel only uses the amino acids at both ends of k-length subsequences to improve the classification performance. It is motivated by the assumption that amino acid pairs at a certain distance affect the HPV's biological function, i.e. risk type, more than consecutive amino acids. The experimental results show that our approach provides better performance than previous approaches in accuracy and F1-score.
Our work addresses how to classify HPV risk types from protein sequences with SVM approaches, which can provide a guide to determining unknown or new HPVs. The paper is organized as follows. In Section 2, we explain the SVM method for classification. The string kernel for HPV protein sequences is presented in Section 3. In Section 4, we present the experimental results, and we draw conclusions in Section 5.
We use support vector machines to classify HPV risk types. A string kernel-based SVM is trained on HPV protein sequences and tested on unknown sequences. Support vector machines have been developed by Vapnik to give robust performance for classification and regression problems in noisy, complex data [12]. They have recently been widely used, from text categorization to bioinformatics. When used for a classification problem, a kernel and a set of labeled vectors, marked as positive or negative class, are given. The kernel functions introduce nonlinear features into the hypothesis space without explicitly requiring nonlinear algorithms. SVMs learn a linear decision boundary in the feature space mapped by the kernel in order to separate the data into two classes.
For a feature mapping Φ, the training data S = {x_i, y_i}_{i=1}^n is mapped into the feature space Φ(S) = {Φ(x_i), y_i}_{i=1}^n. In the feature space, SVMs learn the hyperplane f = ⟨w, Φ(x)⟩ + b, w ∈ R^N, b ∈ R, and the decision is made by sgn(⟨w, Φ(x)⟩ + b). The decision boundary is the hyperplane f = 0 and its margin is 1/‖w‖. SVMs find the hyperplane that has the maximal margin from each class among normalized hyperplanes.
To find the optimal hyperplane, the problem can be formulated as:
$$\text{minimize} \quad \frac{1}{2}\|w\|^2 \qquad (1)$$
$$\text{subject to} \quad y_i(\langle w, \Phi(x_i)\rangle + b) \ge 1, \quad i = 1, \ldots, n. \qquad (2)$$
Introducing Lagrange multipliers α_i, the corresponding dual problem is:
$$\text{maximize} \quad \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \langle \Phi(x_i), \Phi(x_j)\rangle \qquad (3)$$
$$\text{subject to} \quad \alpha_i \ge 0, \quad i = 1, \ldots, n, \qquad (4)$$
$$\sum_{i=1}^{n} \alpha_i y_i = 0. \qquad (5)$$
We can work in the feature space by using kernel functions, and any kernel function K satisfying Mercer's condition can be used.
3 Kernel Function
For HPV protein classification, we introduce a string kernel based on the spectrum kernel method. The spectrum kernel was used for remote homology detection [13][14]. The input space X consists of all finite length sequences of characters from an alphabet A of size |A| = l (l = 20 for amino acids). Given a number k ≥ 1, the k-spectrum of a protein sequence is the set of all possible k-length subsequences (k-mers) that it contains. The feature map is indexed by all possible subsequences a of length k from A; the k-spectrum feature map Φ_k(x) maps X to R^{l^k}. An RBF kernel with a positive width parameter is then applied to the resulting feature vectors. This string kernel is used in combination with the SVM explained in Section 2.
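A rough sketch of the feature idea described above follows: each k-length window contributes only the amino-acid pair at its two ends, and an RBF kernel is applied to the resulting count vectors. The exact feature normalization and kernel composition of the paper's gap-spectrum kernel may differ.

```python
import numpy as np
from itertools import product

AMINO = "ACDEFGHIKLMNPQRSTVWY"
PAIR_INDEX = {p: i for i, p in enumerate(product(AMINO, repeat=2))}

def end_pair_features(seq, k):
    """Count, for every k-length window, the pair of amino acids at its two ends."""
    v = np.zeros(len(PAIR_INDEX))
    for i in range(len(seq) - k + 1):
        pair = (seq[i], seq[i + k - 1])
        if pair in PAIR_INDEX:
            v[PAIR_INDEX[pair]] += 1
    return v

def rbf_string_kernel(s1, s2, k=3, gamma=0.1):
    """RBF kernel applied to the end-pair feature vectors of two protein sequences."""
    d = end_pair_features(s1, k) - end_pair_features(s2, k)
    return float(np.exp(-gamma * np.dot(d, d)))

print(rbf_string_kernel("MHQKRTAMFQDPQERPRKLPQL", "MHGDTPTLHEYMLDLQPETTDL", k=3))
```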
4 Experimental Results
4.1 Data Set
In this paper, we use the HPV sequence database of the Los Alamos National Laboratory (LANL) [16], and a total of 72 HPV types are used for the experiments. The risk types of the HPVs were determined based on the HPV compendium (1997). If an HPV belongs to the skin-related or cutaneous groups, it is classified as low-risk type. On the other hand, an HPV is classified as high-risk if it is known to be a high-risk type for cervical cancer. The comments in the LANL database are used to decide the risk types of some HPVs which are difficult to classify. Seventeen sequences out of the 72 HPVs were classified as high-risk types (16, 18,
31, 33, 35, 39, 45, 51, 52, 56, 58, 59, 61, 66, 67, 68, and 72), and the others were classified as low-risk types. Table 1 shows the HPV types and their classified risk types. The symbol '?' in the table denotes an unknown risk type that cannot be determined.
Since several proteins can be used to discriminate HPVs, we have evaluated the classification accuracy using the SVM with an RBF kernel to determine the gene products to be used for the experiments. The input data are the normalized frequency vectors obtained by the sliding window method. This was done to decide the most informative protein among the gene products for HPV risk type classification. Figure 1 depicts the accuracy changes by window size for the E6, E7, and L1 proteins. The accuracy is the result of leave-one-out cross-validation. It indicates that the accuracy using the E6 protein is mostly higher than using the E7 and L1 proteins. However, the overall accuracy becomes high with increasing window size for all proteins, because the HPV sequences are relatively short and more unique patterns are generated when the window size is long. That is, the learners overfit the protein sequences for long window sizes. The viral early proteins E6 and E7 are known for inducing immortalization and transformation in rodent and human cell types. E6 proteins produced by the high-risk HPV types can bind to and inactivate the tumor suppressor protein, thus facilitating tumor progression [16][17]. This process plays an important role in the development of cervical cancer. For these reasons, we have chosen the E6 protein sequences corresponding to the 72 HPVs.
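A small sketch of the normalized sliding-window frequency representation mentioned above is given below; how windows are counted and normalized here is an assumption about details the text does not spell out.

```python
from collections import Counter

def window_frequency_vector(seq, window):
    """Normalized frequencies of all length-`window` substrings of a protein sequence."""
    counts = Counter(seq[i:i + window] for i in range(len(seq) - window + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}

# Toy E6-like fragment with window size 2.
print(window_frequency_vector("MHQKRTAMFQDPQERPRKLPQLCTELQTTIHDIILECVYCK", window=2))
```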
When binary scales are used for both the answer and the prediction, a contingency table can be established showing how the data set is divided by these two measures (Table 2). From the table, the classification performance is measured as follows:
$$\text{precision} = \frac{a}{a+b} \times 100\%$$
$$\text{recall} = \frac{a}{a+c} \times 100\%$$
$$F1\text{-score} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}.$$
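For completeness, a tiny sketch computing the three measures from the contingency counts a (true positives), b (false positives) and c (false negatives); the example counts are illustrative only.

```python
def prf(a, b, c):
    """a: true positives, b: false positives, c: false negatives (Table 2 layout assumed)."""
    precision = a / (a + b) * 100
    recall = a / (a + c) * 100
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

print(prf(a=16, b=1, c=1))   # illustrative counts, not the paper's results
```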
(Figure: classification accuracy as a function of the subsequence length k, for k = 1 to 7.)
(Figure: accuracy and F1-score comparison of the gap-spectrum SVM, standard SVM, AdaCost, and naive Bayes classifiers.)
Text mining approaches only depend on the clues in text sentences. If text documents are unavailable for unknown HPVs, there is no way to classify them, whereas the sequence-based classification does not need any additional information except the sequence itself.
Table 3 shows the risk type prediction for the HPVs marked as unknown in Table 1. HPV26, HPV54, HPV57, and HPV70 are predicted as high-, low-, low-, and high-risk, respectively. The prediction results for HPV26 and HPV54 are identical to those in Munoz et al. [6], and we assume that their results are correct because they are based on an epidemiologic classification from over 1,900 patients. For HPV70, previous research gives different decisions on the risk type [6][18][19], and the risk type of HPV57 cannot be decided yet because of insufficient previous work. From the prediction results, we can conclude that our approach gives a reasonable indication of whether unknown HPVs are high-risk or not.
5 Conclusion
Acknowledgement
This work was supported by the Korea Ministry of Science and Technology
through National Research Lab (NRL) project and the Ministry of Education
and Human Resources Development under the BK21-IT Program. The ICT at
the Seoul National University provided research facilities for this study.
References
[1] Bosch, F. X., Manos, M. M., et al.: Prevalence of Human Papillomavirus in Cer-
vical Cancer: a Worldwide Perspective. Journal of the National Cancer Institute
87 (1995) 796802.
[2] Janicek, M. F. and Averette, H. E.: Cervical Cancer: Prevention, Diagnosis, and
Therapeutics. Cancer Journals for Clinicians 51 (2001) 92114.
[3] Furumoto, H. and Irahara, M.: Human Papillomavirus (HPV) and Cervical Can-
cer. Journal of Medical Investigation 49 (2002) 124133.
[4] Centurioni, M. G., Puppo, A., et al.: Prevalence of Human Papillomavirus Cervical
Infection in an Italian Asymptomatic Population. BMC Infectious Diseases 5(77)
(2005).
[5] Burk, R. D., Ho, G. Y., et al.: Sexual Behavior and Partner Characteristics Are the
Predominant Risk Factors for Genital Human Papillomavirus Infection in Young
Women. The Journal of Infectious Diseases 174 (1996) 679689.
[6] Munoz, N., Bosch, F.X., et al.: Epidemiologic Classication of Human Papil-
lomavirus Types Associated with Cervical Cancer. New England Journal of
Medicine 348 (2003) 518527.
[7] Watson, J. D., Laskowski, R. A., and Thornton, J. M.: Predicting Protein Function
from Sequence and Structural Data. Current Opinion in Structural Biology 15
(2005) 275284.
[8] Borgwardt, K. M., Ong, C. S., et al.: Protein Function Prediction via Graph Kernels.
In Proceedings of the Thirteenth International Conference on Intelligent Systems
for Molecular Biology (2005) 47–56.
[9] Eom, J.-H., Park, S.-B., and Zhang, B.-T.: Genetic Mining of DNA Sequence
Structures for Eective Classication of the Risk Types of Human Papillomavirus
(HPV). In Proceedings of the 11th International Conference on Neural Information
Processing (2004) 13341343.
[10] Joung, J.-G., O, S.-J, and Zhang, B.-T.: Prediction of the Risk Types of Human
Papillomaviruses by Support Vector Machines. In Proceedings of the 8th Pacific
Rim International Conference on Artificial Intelligence (2004) 723731.
[11] Park, S.-B., Hwang, S., and Zhang, B.-T.: Mining the Risk Types of Human Papil-
lomavirus (HPV) by AdaCost. In Proceedings of the 14th International Conference
on Database and Expert Systems Applications (2003) 403412.
[12] Vapnik, V. N.: Statistical Learning Theory. Springer (1998).
[13] Leslie, C., Eskin, E., and Noble, W. S.: The Spectrum Kernel: A String Ker-
nel for SVM Protein Classication. In Proceedings of the Pacific Symposium on
Biocomputing (2002) 564575.
[14] Leslie, C., Eskin, E., et al.: Mismatch String Kernels for Discriminative Protein
Classication. Bioinformatics 20(4) (2004) 467476.
[15] Shawe-Taylor, J. and Cristianini, N.: Kernel Methods for Pattern Analysis. Cam-
bridge University Press (2004).
[16] The HPV sequence database in Los Alamos laboratory, http://hpv-web.lanl.gov/
stdgen/virus/hpv/index.html.
[17] Pillai, M., Lakshmi, S., et al.: High-Risk Human Papillomavirus Infection and
E6 Protein Expression in Lesions of the Uterine Cervix. Pathobiology 66 (1998)
240246.
[18] Longuet, M., Beaudenon, S., and Orth, G.: Two Novel Genital Human Papillo-
mavirus (HPV) Types, HPV68 and HPV70, Related to the Potentially Oncogenic
HPV39. Journal of Clinical Microbiology 34 (1996) 738744.
[19] Meyer, T., Arndt, R., et al.: Association of Rare Human Papillomavirus Types
with Genital Premalignant and Malignant Lesions. The Journal of Infectious Dis-
eases 178 (1998) 252255.
Hierarchical Clustering, Languages and Cancer
1 Introduction
In addition to the clustering solutions available in the literature for the datasets
considered, we have used two unsupervised techniques for computing alternative
solutions. The first one is based on arithmetic-harmonic cuts, and the second
one relies on the use of ultrametric trees. These will be described below.
The memetic algorithm uses (a) a differential greedy algorithm (similar to that in [3])
for the initialization of a set of solutions for the problem, (b) a differential greedy
crossover (a modification of the algorithm in [2]) for the evolution of the population,
and (c) a variable neighborhood local search (see [4]) to improve the newly generated
solutions. Whenever the population stagnates, we keep the best solution
and re-initialize the rest of the solutions in the set. We use this memetic algorithm
if the graph contains more than 25 vertices, and a backtracking enumeration
algorithm otherwise. Notice that even though backtracking gives us an optimal
solution, a memetic algorithm may not. However, on the considered datasets, the
memetic algorithm consistently generated the same solution in all runs (so it
is presumably optimal). By applying this method (backtracking or memetic algorithm,
depending on the number of vertices) recursively, we have at each step a
graph as input, and the two subgraphs induced by the two sets of the vertex
partition as output; stopping when we arrive at a graph with just one vertex,
we generate a hierarchical clustering in a top-down fashion.
The rationale for our objective function becomes clear if we rearrange
its terms. We can write

    F = (|E| − |E_out|) |E_out| (A_out / H_in)    (2)

where A_out is the arithmetic mean of the weights of the edges that connect vertices of S
with V \ S (the cut), H_in is the harmonic mean of the weights of the edges not
in the cut, and |E_out| is the cardinality of the cut. Informally, maximizing F
amounts to finding a cut that discriminates well between the two groups, normalized
by the harmonic mean of the intra-cluster dissimilarity, and multiplied by a
factor that is maximal when the two groups have a similar number of elements.
Normalizing by the harmonic mean makes the denominator more stable
to the presence of outlier samples associated to either S or V \ S. For this
reason, we denote this partition as the arithmetic-harmonic cut.
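For illustration, a minimal Python sketch of this objective is given below; it assumes a dissimilarity-weighted graph stored as a dictionary mapping vertex pairs to positive weights, and the function and variable names are illustrative rather than part of the original formulation.

    # Minimal sketch of the arithmetic-harmonic cut objective (Eq. 2).
    from statistics import harmonic_mean, mean

    def ah_cut_objective(edges, S):
        """edges: {(i, j): w} with positive dissimilarities; S: set of vertices."""
        cut = [w for (i, j), w in edges.items() if (i in S) != (j in S)]
        within = [w for (i, j), w in edges.items() if (i in S) == (j in S)]
        if not cut or not within:
            return float('-inf')            # degenerate partition
        a_out = mean(cut)                    # arithmetic mean of the cut weights
        h_in = harmonic_mean(within)         # harmonic mean of the non-cut weights
        e, e_out = len(edges), len(cut)
        return (e - e_out) * e_out * a_out / h_in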
Notice that maximizing the first part of the objective function, i.e.,
Σ_{e ∈ E_out} w(e) (the total weight of the edges across the two sets), is the same as
solving the Max-Cut problem for graph G, which is an NP-hard problem. How-
ever, it turns out that the hierarchy generated by partitions using Max-Cut
does not corroborate the previous knowledge about the datasets. This is prob-
ably due to the fact that no importance is given in Max-Cut to the similarity
of vertices within the sets. We also considered the objective function
    F = Σ_{e ∈ E_out} w(e) − Σ_{e ∈ E_in} w(e)    (3)
A weighted tree T is ultrametric if the distance D_ij between any two leaves i and j (measured as the sum of the weights
of the edges that have to be traversed to reach i from j inside T) verifies
D_ij ≤ max{D_ik, D_jk} for all 1 ≤ i, j, k ≤ n, where n is the number of leaves. This
condition implies that, given any internal node h in T, it holds that D_hi = D_hj
for any leaves i, j having h as an ancestor.
The use of ultrametric trees has several advantages in hierarchical classifica-
tion. First of all, edge weights are very easy to compute: given a distance matrix
M containing dissimilarity values for a collection of objects, and a candidate
tree T, the minimum weights such that D_ij ≥ M_ij and T is ultrametric can be
computed in O(n^2) [5]. Secondly, they adapt very well to dynamical processes
evolving at a more or less constant rate. Finally, even if the latter is not the case,
they provide a very good approximation to more relaxed criteria, such as mere
additivity, that would be much more computationally expensive to calculate.
Notice also that finding the optimal topology T for a given distance matrix M
under the ultrametric assumption is NP-hard [5].
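For illustration, the three-point condition above can be checked directly on a leaf-to-leaf distance matrix; the following minimal sketch (with illustrative names) uses the equivalent formulation that, in an ultrametric, the two largest distances of any triple of leaves are equal.

    # Minimal check of the ultrametric condition D_ij <= max(D_ik, D_jk)
    # on a symmetric n x n leaf-to-leaf distance matrix D.
    from itertools import combinations

    def is_ultrametric(D, tol=1e-9):
        n = len(D)
        for i, j, k in combinations(range(n), 3):
            trio = sorted([D[i][j], D[i][k], D[j][k]])
            # The two largest of the three distances must be (numerically) equal.
            if trio[2] - trio[1] > tol:
                return False
        return True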
Ultrametric trees have been computed using an evolutionary two-phase pro-
cedure: first, a collection of high-quality tentative trees is generated; subse-
quently, a consensus method is used to summarize this collection into a single
tree. Beginning with the former, the generation of high-quality (i.e., minimum-
weight) ultrametric trees has been approached using an evolutionary algorithm
based on the scatter search template. Starting from the solution provided by the
complete-link agglomerative algorithm, an initial population of trees is produced
by perturbation (internal exchanges of branches). Then, an evolutionary cycle
is performed using tree-based path relinking for recombination [6] and internal
rotations for local search (no mutation is used). Whenever the system stagnates,
the population is restarted by keeping the best solution and generating new trees
by exchanging branches among existing trees.
Once a collection of high-quality trees has been found, the consensus method
is used to amalgamate them. This is done using the TreeRank measure [7] as the
similarity metric among trees. This measure is based on counting the number of
times an edge has to be traversed upwards or downwards in order to go from a
certain leaf to another one. By computing how different these figures are for two
trees, we obtain a dissimilarity value. The TreeRank measure is currently used
in TreeBASE (http://www.treebase.org), one of the most widely used phylogenetic databases, for
the purpose of handling queries for similar trees.
The consensus algorithm we have used is an evolutionary metaheuristic that
evolves tentative trees following [8]. Given the collection of trees we want to
summarize, the sum of dissimilarities to the tentative tree is used as the fitness
function (to be minimized). Evolution is performed using the prune-delete-graft
operator [9, 10] for recombination, no mutation, binary tournament selection,
and elitist replacement. In our experiments, we considered all distinct
trees generated by the scatter search method in one hundred runs, and then
ran the consensus algorithm one hundred times on this collection. The best
solution out of these latter 100 runs is kept as the final consensus tree.
Fig. 1. Three proposed language trees: (a) tree using arithmetic-harmonic cuts, (b)
Gray and Atkinson's tree [16], (c) consensus ultrametric tree
the Middle East between 10,000 and 6,000 years ago, which also correlates well
with the first principal component of the genetic variation of 95 polymorphisms
[19], which alone accounts for 28% of the total variation.
In addition, there are certain strange relations between languages
at the leaves of Gray's tree. After checking the distance matrix, we find several
cases in which our tree seems to produce a more reasonable branching than Gray
and Atkinson's. First of all, the closest neighbours of the Czech and Slovak languages
are the Lusatian languages. It is therefore natural to have Czech, CzechE and Slo-
vak placed in a subtree close to the Lusatian languages. In the trees generated by
both the arithmetic-harmonic cut (Fig. 1(a)) and the ultrametric approach (Fig. 1(c)),
these languages are placed next to each other. However, in Fig. 1(b),
generated by [16], Czech and Slovak are placed closer to Ukrainian, Byelorussian
and Russian. Secondly, Catalan is a language evolved from Latin, with strong
influences from French and Spanish. As a consequence of its Latin origin, Italian
is the closest language to Catalan in the dataset. The position of Catalan with
Italian and Ladin in our tree seems very natural, as the hybridizations with French
and Spanish occurred later (note that the Bayesian posterior probability is 0.83
for its link with the Spanish-Portuguese group). See [16] for the details of the proba-
bilities in Figure 1(b). Although Italian's closest language is Ladin, the latter
was placed closer to RomanianList and Vlach with a posterior probability of
0.88. Also, notice the position of Italian, with a posterior probability of 0.59. Fi-
nally, there are also small differences in the topology of small subtrees between
our hierarchy and Gray's, namely those regarding Dutchlist-Afrikaans-Flemish,
the Greek languages, the Albanian languages and the position of Bengali among the Aryan
languages, among others. The differences seem to occur mainly where the poste-
rior probability of one or several branchings is low.
An important difference is that in our classification the Celtic languages are
considered closer to the Baltic-Slavic languages. This goes against the current be-
lief of Celtic's closeness to the Romance and Germanic languages. Note that in
Gray and Atkinson's classification, the branchings of (Germanic, Romance) and
(Celtic, (Germanic, Romance)) have low posterior probabilities (0.46 and 0.67, re-
spectively). The minimum-weight ultrametric tree (see Fig. 1(c)) for this dataset
also considers the Celtic and Slavic languages to be the closest groups. How-
ever, this tree disagrees with our tree in the primary branches. For instance, it
first separates the Indo-Afghan languages as outliers, and then successively separates the Albanian and
Greco-Armenian languages. In the tree obtained by the
arithmetic-harmonic cut, all these outliers are grouped together. Notice that
even at the successive branchings, the consensus ultrametric tree often produces
a large number of outliers (see, e.g., the Indic and Iranian branches of Figure 1(c)).
[Fig. 2: hierarchical classification of the NCI60 dataset obtained with the arithmetic-harmonic cut (a), and the genetic signatures of the first three partitions (b)-(d).]
The analysis of this dataset was done by Ross et al. in 2000 [20], where a result
of hierarchical clustering for this dataset was first discussed. Their result shows
that the cell lines from the same origin were grouped together in the case of leukaemia,
melanoma, colon, renal and ovarian cancers, with a few exceptions. However, cell
lines derived from non-small-cell lung carcinoma and breast tumors were distributed
over multiple places, suggesting a heterogeneous nature.
Fig. 2(a) shows the result of applying the arithmetic-harmonic cut to this dataset.
In Fig. 2(b), (c) and (d), we show the genetic signatures (the most differentially
expressed genes on the two sides of the partition) of the first three partitions
obtained with the cut, with 1,101, 696 and 695 genes respectively. In the genetic signatures
(computed with the method described in [21] and [22]), each row corresponds to
a gene and each column corresponds to a tumor sample.
Fig. 3. Classification of the NCI60 dataset from (a) Ross et al. and (b) the ultrametric tree
5 Conclusions
We proposed two new approaches for hierarchical clustering and showed the
results of applying these methods to two very diverse datasets. The hierarchies
we produce for both languages and cancer samples agree very well
with existing knowledge about these datasets, and they also raise some interesting questions.
The arithmetic-harmonic cut seems to correlate well with the results of the
first component of the genetic variation provided by Cavalli-Sforza and his co-
authors [19]. It indicates a branching into two major groups, with an earlier group
moving towards Europe (in effect, the advanced-farming hypothesis at work),
later followed by another group moving in the same direction (evolving into the Greek,
Albanian and Armenian languages), while another group moved south-
east and later differentiated into the Iranian and Indic languages. It also suggests a
commonality of Celtic, Baltic and Slavic (a hypothesis also raised in the past,
and also supported by the consensus of the ultrametric trees). These differences,
as well as other small ones, with respect to the solution provided by Gray and Atkinson
seem to lie in branchings where the Bayesian posterior probability is low, and
our methods agree where the posterior probability is high. The consensus of
the ultrametric trees seems to suggest a single wave towards Europe, but a first
branching into an Albanian group, followed by a second branching with Greek
and Armenian in one subgroup, seems less plausible to us.
Overall, our results seem to indicate that it is important to use several hier-
archical clustering algorithms and to analyze common subgroupings. In the case
of tumor samples, this is indeed the most relevant outcome, as
we do not have any guarantee that the samples share a common ancestor.
The question "Which tree is the best one?" might actually be largely irrele-
vant to the real problem at hand, as the consensus of these
trees seems to be the most important outcome. Results from a number of other clustering al-
gorithms on these datasets (which we were unable to show here for reasons of
space) indicate that more research on robust algorithmic methods is needed
for molecular subtype classification in cancer, and that validating the
methodology in different problem settings is highly beneficial to its development.
References
1. Cotta, C., Moscato, P.: A memetic-aided approach to hierarchical clustering from
distance matrices: Application to phylogeny and gene expression clustering. Biosys-
tems 71 (2003) 7597
2. Merz, P., Freisleben, B.: Fitness landscapes, memetic algorithms, and greedy op-
erators for graph bipartitioning. Evolutionary Computation 8 (2000) 6191
3. Battiti, R., Bertossi, A.: Dierential greedy for the 0-1 equicut problem. In: Proc.
of DIMACS Workshop on Network Design: Connectivity and Facilities. (1997)
4. Festa, P., Pardalos, P., Resende, M.G.C., Ribeiro, C.C.: Randomized heuristics for
the MAX-CUT problem. Optimization Methods and Software 7 (2002) 10331058
5. Wu, B., Chao, K.M., Tang, C.: Approximation and exact algorithms for construct-
ing minimum ultrametric trees from distance matrices. Journal of Combinatorial
Optimization 3 (1999) 199211
6. Cotta, C.: Scatter search with path relinking for phylogenetic inference. European
Journal of Operational Research 169 (2006) 520532
7. Wang, J., Shan, H., Shasha, D., Piel, W.: Treerank: A similarity measure for nearest
neighbor searching in phylogenetic databases. In: Proceedings of the 15th Interna-
tional Conference on Scientic and Statistical Database Management, Cambridge
MA, IEEE Press (2003) 171180
8. Cotta, C.: On the application of evolutionary algorithms to the consensus tree
problem. In Gottlieb, J., Raidl, G., eds.: Evolutionary Computation in Combina-
torial Optimization. Volume 3248 of Lecture Notes in Computer Science., Berlin,
Springer-Verlag (2005) 5867
9. Moilanen, A.: Searching for the most parsimonious trees with simulated evolution.
Cladistics 15 (1999) 3950
10. Cotta, C., Moscato, P.: Inferring phylogenetic trees using evolutionary algorithms.
In Merelo, J., et al., eds.: Parallel Problem Solving From Nature VII. Volume 2439
of Lecture Notes in Computer Science. Springer-Verlag, Berlin (2002) 720729
11. Mallory, J.P.: Search of the Indo-European languages. Archaelogy and Myth (1989)
12. Renfrew, C.: Time-depth in historical linguistics. The McDonald Institute for
Archaeological Research (2000) 413439
13. Richards, M.: Tracing european founder lineage in the near easter mtDNA pool.
Am. K. Hum. Genet 67 (2000) 12511276
14. Semoni: The genetic legacy of Paleolithic Homo Sapiens in extant europeans: a Y
chromosome perspective. Science 290 (2000) 11551159
15. Chikhi, L., Nichols, R., Barbujani, G., Beaumont, M.: Y genetic data support the
Neolithic demic diusion model. Prod. Natl. Acad., Sci. 67 (2002) 1100811013
16. Gray, R.D., Atkinson, Q.D.: Language-tree divergence times support the anatolian
theory of indo-european origin. Nature 426 (2003) 435439
17. Bryant, D., Filimon, F., Gray, R.: Untangling our past: Languages, trees, splits and
networks. In Mace, R., Holden, C., Shennan, S., eds.: The Evolution of Cultural
Diversity: Phylogenetic Approaches. UCL Press (2005) 6985
18. Dyen, I., Kruskal, J.B., Black, P.: An Indo-European classication: A lexicosta-
tistical experiment. Transactions of the American Philosophical Society, New Ser.
82 (1992) 1132
19. Cavalli-Sforza, L.: Genes, peoples, and languages. Proceedings of the National
Academy of Sciences of the United States of America 94 (1997) 77197724
20. Ross, D.T., Scherf, U., Eisen, M., Perou, C., Rees, C., Spellman, P., Iyer, V., Jerey,
S., Rijn, M., Waltham, M., Pergamenschikov, A., Lee, J.C., Lashkari, D., Shalon,
D., Myers, T., Weinstein, J.N., Botstein, D., Brown, P.: Systematic variation in
gene expression patterns in human cancer cell lines. Nature Genetics 24 (2000)
227235
21. Cotta, C., Langston, M., Moscato, P.: Combinatorial and algorithmic issues for
microarray data analysis. In: Handbook of Approximation Algorithms and Meta-
heuristics. Chapman and Hall (2005)
22. Hourani, M., Mendes, A., Berretta, R., Moscato, P.: A genetic signature for parkin-
sons disease using rodent brain gene expression. In Keith, J., ed.: Bioinformatics.
Humana Press (2006)
23. Ferraresi, V., Ciccarese, M., Zeuli, M., Cognetti, F.: Central system as exclusive
site disease in patients with melanoma: treatment and clinical outcome of two
cases. Melanoma Res. 2005 15 (2005) 467469
24. Marchetti, D., Denkins, Y., Reiland, J., Greiter-Wilke, A., Galjour, J., Murry,
B., Blust, J., Roy, M.: Brain-metastatic melanoma: a neurotrophic perspective.
Pathology Oncology Research 9 (2003) 147158
25. Buell, J., Gross, T., Alloway, R., Trofe, J., Woodle, E.: Central nervous system
tumors in donors: Misdiagnosis carries a high morbidity and mortality. Transplan-
tation Proceedings 37 (2005) 583584
26. Perou, C.M., Jerey, S.S., Rijn, M., Rees, C.A., Eisen, M.B., Ross, D.T., Perga-
menschikov, A., Williams, C.F., Zhu, S.X., Lee, J.C.F., Lashkari, D., Shalon, D.,
Brown, P.O., Botstein, D.: Distinctive gene expression patterns in human mam-
mary epithelial cells and breast cancers. Genetics 96 (1999) 92129217
Robust SVM-Based Biomarker Selection with
Noisy Mass Spectrometric Proteomic Data
1 Introduction
Feature selection (FS) for classification can be formulated as a combinatorial
optimization problem: finding the feature set that maximizes the predictive perfor-
mance of the classifier trained from these features. FS is a major research topic
in supervised learning and data mining [10, 16, 12]. For the sake of learning
performance, it is highly desirable to discard irrelevant features prior to learn-
ing, especially when the number of available features significantly outnumbers
the number of samples, as in biomedical studies. Because of its computational
intractability, the FS problem has been tackled by means of heuristic algorithms
based on statistics and machine learning [10, 20, 22]. Biological experiments with
laboratory technologies such as microarrays and mass spectrometry gen-
erate data with a very high number of variables (features), in general much
larger than the number of samples. Therefore, FS provides a fundamental step in
the analysis of this type of data [27]. Ideally, one would like to detect potential
biomarkers and biomarker patterns that both highly discriminate diseased from
healthy samples and are biologically interpretable. However, as substantiated in
recent publications such as [19, 3, 21], the reliability and reproducibility of results de-
pend on the particular way samples are handled [26], on the instability of the
laboratory technology, as well as on the specific techniques employed in the
computational analysis.
In this paper we consider FS for classification with MS proteomic data from
sera. Various machine learning and statistical techniques for feature selection
have been applied to proteomic data, e.g. [15, 17, 8, 13, 5, 6, 7], in order to detect
potential tumor biomarkers for (early) cancer diagnosis (clinical proteomics). A
summary of current challenges and a critical assessment of clinical proteomics can
be found, e.g., in [21].
Here we propose a new method for FS with MS proteomic data. The goal is to
identify potential biomarker patterns that not only highly discriminate diseased
from healthy samples but are also robust with respect to perturbation of the data.
The method consists of three main steps. First, a popular filter feature selection
algorithm, RELIEF, is used as pre-processing in order to reduce the number of
considered features. Next, multiple runs of a linear SVM are performed, where at
each run a perturbed training set is used, obtained by changing the class label
of one support vector. Each run generates a large subset of selected features.
The frequency (over the runs) with which features are selected is used to choose the
most robust ones, namely those with the highest frequency. Finally, the resulting
features are transformed into feature intervals, by considering the ordering of
the features, where neighbouring features refer to peptides of similar masses.
The method generates a subset of feature intervals, where both the number
of intervals and the number of features are automatically selected. These intervals describe
potential biomarker patterns.
We analyze experimentally the performance of the method on a real-life
dataset with controlled insertion of noisy samples (long storage time samples)
and relevant features (spiked molecules) [26]. The results indicate that the
method performs robust feature selection, selecting features corresponding
to m/z measurements near to (the average of the m/z values of the peaks of) the
spiked molecules, and misclassifying only one noisy sample (with long storage
time).
2 Background
This section describes in brief the Machine Learning techniques we use in the
proposed feature selection method.
with one constraint for each training sample x_i. Usually the dual form of the
optimization problem is solved:

    min_α  (1/2) Σ_{i=1}^{m} Σ_{j=1}^{m} α_i α_j y_i y_j x_i · x_j  −  Σ_{i=1}^{m} α_i

such that 0 ≤ α_i ≤ C and Σ_{i=1}^{m} α_i y_i = 0. SVM requires O(m^2) storage and O(m^3)
time to solve. The resulting decision function f(x) = w · x + b has weight vector
w = Σ_{k=1}^{m} α_k y_k x_k. Samples x_i for which α_i > 0 are called support vectors, since
they uniquely define the maximum-margin hyperplane. Samples with α_i = C are
misclassified.
Maximizing the margin allows one to minimize bounds on the generalization error.
Because the size of the margin does not depend on the input dimension, SVMs
are robust with respect to data with a high number of features. However, SVMs
are sensitive to the presence of (potential) outliers (cf. [11] for an illustrative
example) due to the regularization term penalizing misclassification (which
depends on the choice of C).
SVMFS
%input: training set X, number of features
to be selected M
%output: subset Selected of M features
train linear classifier with SVM on X;
score features using the squared value of
the weights of the classifier;
Selected = M features with highest score;
return Selected;
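For illustration, the SVMFS step above can be realized with any linear SVM implementation; the following minimal sketch uses scikit-learn's LinearSVC (an illustrative choice, not prescribed by the method) and scores features by their squared weights.

    # Minimal sketch of SVMFS: score features by squared linear-SVM weights.
    import numpy as np
    from sklearn.svm import LinearSVC

    def svmfs(X, y, n_features):
        clf = LinearSVC(C=1.0).fit(X, y)                 # train linear classifier
        scores = np.square(clf.coef_.ravel())             # squared weight per feature
        return np.argsort(scores)[::-1][:n_features]      # indices of top-scoring features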
RELIEF
RELIEF [23, 14] is a filter-based feature ranking algorithm that assigns a score
to features based on how well the features separate training samples from their
nearest neighbours from the same and from the opposite class.
The algorithm iteratively constructs a weight vector, which is initially equal
to zero. At each iteration, RELIEF selects one sample, adds to the weight the
difference between that sample and its nearest sample from the opposite class
(called the nearest miss), and subtracts the difference between that sample and its
nearest neighbour from the same class (called the nearest hit). The iterative process
terminates when all training samples have been considered. The resulting weight
of a feature is divided by its range of values (computed using only the training
set). Subsampling can be used to improve efficiency in the case of a large training
set. The pseudo-code of the RELIEF algorithm used in our experiments is given
below.
RELIEF
%input: training set X
%output: Ranking of features
nr_feat = total number of features;
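The pseudo-code above is truncated at the page break in this reproduction. For illustration, a minimal runnable sketch of the weighting scheme described in the text is given below (dense feature matrix X, labels y as NumPy arrays; the names are illustrative).

    # Minimal sketch of RELIEF feature weighting.
    import numpy as np

    def relief(X, y):
        """Return a RELIEF weight per feature (higher = more relevant)."""
        n, d = X.shape
        w = np.zeros(d)
        for i in range(n):
            diff = np.abs(X - X[i])                       # per-feature distances to sample i
            dist = diff.sum(axis=1)
            dist[i] = np.inf                              # exclude the sample itself
            same, opp = (y == y[i]), (y != y[i])
            hit = np.argmin(np.where(same, dist, np.inf))   # nearest hit
            miss = np.argmin(np.where(opp, dist, np.inf))   # nearest miss
            w += diff[miss] - diff[hit]                   # add miss difference, subtract hit difference
        rng = X.max(axis=0) - X.min(axis=0)
        return w / np.where(rng == 0, 1, rng)             # normalize by feature range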
Fig. 1. A MALDI-TOF MS spiked sample for one person at storage duration time t=0
(top) and t=48 (bottom): the x-axis contains (identifiers of) the m/z values of peptides
and the y-axis their concentration
4 The Method
FWI
%input: training set X
%number M of features to be selected by RELIEF
%FILTER step:
F = M features selected with RELIEF;
%WRAPPER step:
SV = set of support vectors obtained by training
SVM on X;
for x in SV
T = X with label of x changed;
F(x) = N features selected by SVMFS applied to T;
end;
count = maximum number of times that a feature
occurs in the sequence of F(x), x in SV;
W = features occurring count times in the sequence
of F(x), x in SV;
%INTERVAL step:
Cl = {C1, .., Cn} clustering of W, with
Ci = (w(1),.., w(ni))
w(1)<..< w(ni)
s.t. idx(w(j+1))-idx(w(j)) <= 2
for all j in [1..ni];
Int = {Int_1, .., Int_n} intervals from Cl, with
Int_i= {w in Features s.t. w >= min(Ci)
and w<= max(Ci)} for i in [1,n];
return Int;
Let us explain the steps performed by FWI in a bit more detail.
FWI starts by skimming the number of features in the Filter
(F) step. Here RELIEF is employed to select M features. In the F
step one typically retains about M = 5000 m/z measurements from the initial
22,572.
In the Wrapper (W) step, robust wrapper-based feature selection is per-
formed using the features that passed the Filter selection. The support vectors of an SVM trained on all the features are used for
perturbing the data. More precisely, multiple runs of SVMFS are performed,
where at each run the class label of one support vector is changed. Each run
generates a set of N features (typical value N = 1000). The resulting sequence
of feature sets is then considered. The maximum number count of times a
feature occurs in the sequence is computed, and all features occurring count
times in the sequence are selected.
Finally, in the Interval (I) step, the selected m/z features are segmented as
follows. The sequence of features in W, ordered by m/z values, is segmented
into clusters of features whose indices differ by at most two positions; each cluster is then
expanded into the full interval of consecutive m/z features it spans (see the sketch below).
5 Numerical Experiments
1. Wrapper feature selection (W), obtained by applying the W step of the FWI.
2. Wrapper Interval (WI), obtained by applying steps W followed by I.
3. Feature Wrapper (FW), obtained by applying steps F followed by W.
4. The complete Feature Wrapper Interval algorithm FWI.
Because of the small size of the data, LOOCV is used for comparing the per-
formance of the four algorithms (cf., e.g., [9]). At each leave-one-out run, all but
one element of the data is used as the training set, and the left-out element is used
for testing the predictive performance of the resulting classifier. Observe that the
96 samples of the considered dataset are not independent of one another, as
required for a correct application of LOOCV, because they are generated from 8
persons, and neither the 6 different storage times nor the spiking guarantees the
production of independent samples. Nevertheless, the corresponding bias intro-
duced in the LOOCV procedure affects the results of each algorithm, hence the
results can be used for comparing the performance of the algorithms. However,
such bias possibly affects the estimation of the generalization error.
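For illustration, the LOOCV protocol described above can be sketched as follows; the use of scikit-learn and of a linear SVM as the base classifier are illustrative choices only (X and y denote the feature matrix and labels as NumPy arrays).

    # Minimal LOOCV accuracy estimate for an arbitrary classifier factory.
    from sklearn.model_selection import LeaveOneOut
    from sklearn.svm import LinearSVC

    def loocv_accuracy(X, y, make_clf=lambda: LinearSVC()):
        hits = 0
        for train, test in LeaveOneOut().split(X):
            clf = make_clf().fit(X[train], y[train])      # train on all but one sample
            hits += int(clf.predict(X[test])[0] == y[test][0])
        return hits / len(y)                              # fraction of left-out samples correct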
Table 1 summarizes the LOOCV performance results of the experiments. We use
accuracy, sensitivity and specificity as quality measures for comparing the algo-
rithms. Other measures, such as AUC (Area Under the ROC Curve), could also be used;
as illustrated, e.g., in [1], there is good agreement between accuracy and AUC
as to the ranking of the performance of classification algorithms.
The results indicate an improvement in the predictive performance
across the four algorithms, with the best accuracy achieved by FWI.
The misclassified samples over all the LOOCV runs have storage times equal
to 24 or 48 hours, indicating that longer storage times negatively affect the classifi-
cation of proteomic samples. Algorithm W misclassifies a total of 5 samples, of
Table 1. Results: LOOCV sensitivity, specificity and accuracy (with standard deviation in brackets)
[Fig. 2: number of LOOCV runs in which each m/z measurement is selected, with the positions of the spiked molecules marked.]
Fig. 3. A typical m/z selection generated by FWI, the corresponding values of the
mean spiked and normal profile at the selected m/z values, and the spiked molecules
Figure 2 shows that m/z measurements in proximity of the spiked molecules are more often selected over the
LOOCV runs, except for m/z measurements in the neighbourhood of 4000 and
5000, which do not correspond to m/z measurements of spiked molecules. In the
absence of additional information (e.g. tandem mass spectra yielding sequence
tags) it is difficult to know what these peak values represent. One possibility
is that the higher-molecular-weight spiked molecules are partially degraded in
serum, and these peaks are proteolytically cleaved peptides from larger proteins
(due to long storage time at room temperature) in the sample itself. However,
this possibility has not yet been examined in depth. Figure 3 shows a typical set
of m/z measurements generated by FWI, and the mean value of the intensities
of spiked and normal samples for the selected m/z measurements.
In conclusion, the results indicate that FWI performs robust m/z selection, where
the selected features are close to the spiked molecules, and the misclassification
error is close to zero, with misclassification of only noisy (that is, long storage
time) samples.
6 Conclusion
forming robust feature selection in the presence of noisy samples that perturb
the data and negatively affect sample classification. The W and I steps of the
proposed FWI method provide heuristics for tackling this problem.
This issue is related to broader questions about the reproducibility and validity
of results in discovery-based omics research [21, 24]. In a special session on
genomics in a recent issue of Science, an essay entitled "Getting the noise out of
gene arrays" noted that "[t]housands of papers have reported results obtained
using gene array ... But are these results reproducible?" [19]. A controversy about the
reproducibility and validity of results from MS proteomic data is ongoing [3, 21],
and the path towards achieving such ambitious goals still appears long.
Acknowledgments
We would like to thank the anonymous referees for their constructive comments,
and Jan Rutten for stimulating discussions.
References
1. A. P. Bradley. The use of the area under the ROC curve in the evaluation of
machine learning algorithms. Pattern Recognition, 30(6):11451159, 1997.
2. N. Cristianini and J. Shawe-Taylor. Support Vector machines. Cambridge Press,
2000.
3. E.P. Diamandis. Analysis of serum proteomic patterns for early cancer diagnosis:
Drawing attention to potential problems. Journal of the National Cancer Institute,
96(5):353356, 2004.
4. H.J. Issaq et al. SELDI-TOF MS for diagnostic proteomics. Anal. Chem.,
75(7):148A155A, 2003.
5. Petricoin E.F. et al. Serum proteomic patterns for detection of prostate cancer.
Journal of the National Cancer Institute, 94(20):15761578, 2002.
6. Petricoin E.F. et al. Use of proteomic patterns in serum to identify ovarian cancer.
The Lancet, 359(9306):5727, 2002.
7. Qu Y. et al. Boosted decision tree analysis of surface-enhanced laser desorp-
tion/ionization mass spectral serum proles discriminates prostate cancer from
noncancer patients. Clin. Chem, 48(10):183543, 2002.
8. Zhu W. et al. Detection of cancer-specic markers amid massive mass spectral
data. PNAS, 100(25):1466614671, 2003.
9. T. Evgeniou, M. Pontil, and A. Elissee. Leave one out error, stability, and gen-
eralization of voting combinations of classiers. Mach. Learn., 55(1):7197, 2004.
10. I. Guyon and A. Elissee. An introduction to variable and feature selection. Ma-
chine Learning, 3:11571182, 2003. Special Issue on variable and feature selection.
11. I. Guyon, J. Weston, S. Barnhill, and V. Vapnik. Gene selection for cancer classi-
cation using support vector machines. Mach. Learn., 46(1-3):389422, 2002.
12. George H. John, Ron Kohavi, and Karl Peger. Irrelevant features and the subset
selection problem. In International Conference on Machine Learning, pages 121
129, 1994.
13. K. Jong, E. Marchiori, M. Sebag, and A. van der Vaart. Feature selection in
proteomic pattern data with support vector machines. In IEEE Symposium on
Computational Intelligence in Bioinformatics and Computational Biology, 2004.
14. K. Kira and L. A. Rendell. The feature selection problem: Traditional methods
and a new algorithm. In Tenth National Conference on artificial intelligence, pages
129134, 1992.
15. J. Li, Z. Zhang, J. Rosenzweig, Y.Y. Wang, and D.W. Chan. Proteomics and
bioinformatics approaches for identication of serum biomarkers to detect breast
cancer. Clinical Chemistry, 48(8):12961304, 2002.
16. H. Lie and editors H. Motoda. Feature Extraction, Construction and Selection:
a Data Mining Perspective. International Series in Engineering and Computer
Science. Kluwer, 1998.
17. H. Liu, J. Li, and L. Wong. A comparative study on feature selection and classi-
cation methods using gene expression proles and proteomic patterns. Genome
Informatics, 13:5160, 2002.
18. E. Marchiori, N.H.H. Heegaard, M. West-Nielsen, and C.R. Jimenez. Feature se-
lection for classication with proteomic data of mixed quality. In Proceedings of
the 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and
Computational Biology, pages 385391, 2005.
19. E. Marshall. Getting the noise out of gene arrays. Science, 306:630631, 2004.
Issue 5696.
20. Il-Seok Oh, Jin-Seon Lee, and Byung-Ro Moon. Local search-embedded genetic
algorithms for feature selection. In 16 th International Conference on Pattern
Recognition (ICPR02). IEEE Press, 2002.
21. D.F. Ransoho. Lessons from controversy: Ovarian cancer screening and serum
proteomics. Journal of the National Cancer Institute, 97:315319, 2005.
22. M.L. Raymer, W.F. Punch, E.D. Goodman, L.A. Kuhn, and A.K. Jain. Dimen-
sionality reduction using genetic algorithms. IEEE Transactions on Evolutionary
Computation, 4(2):164171, 2000.
23. L. A. Rendell and K. Kira. A practical approach to feature selection. In Interna-
tional Conference on machine learning, pages 249256, 1992.
24. Michiels S., Koscielny S., and Hill C. Prediction of cancer outcome with microar-
rays: a multiple random validation strategy. The Lancet, 365(9458):48892, 2005.
25. V.N. Vapnik. Statistical Learning Theory. John Wiley & Sons, 1998.
26. M. West-Nielsen, E.V. Hogdall, E. Marchiori, C.K. Hogdall, C. Schou, and N.H.H.
Heegaard. Sample handling for mass spectrometric proteomic investigations of
human sera. Analytical Chemistry, 11(16):51145123, 2005.
27. E.P. Xing. Feature selection in microarray analysis. In A Practical Approach to
Microarray Data Analysis. Kluwer Academic, 2003.
28. L. Yu and H. Liu. Feature selection for high-dimensional data: A fast correlation-
based lter solution. In ICML, pages 856863, 2003.
On the Use of Variable Complementarity for
Feature Selection in Cancer Classification
1 Introduction
Statisticians and data miners routinely build predictive models and infer de-
pendencies between variables on the basis of observed data. However, in a number
of emerging domains, like bioinformatics, they face datasets characterized
by a very large number of features (up to several thousands), a large amount of
noise, non-linear dependencies and, often, only a few hundred samples. In
this context, the detection of functional relationships as well as the design of ef-
fective classifiers appears to be a major challenge. Recent technological advances,
like microarray technology, have made it possible to simultaneously interrogate
thousands of genes in a biological specimen. It follows that two classification
problems commonly encountered in bioinformatics are how to distinguish be-
tween tumor classes and how to predict the effects of medical treatments on
the basis of microarray gene expression profiles. If we formalize this prediction
task as a supervised classification problem, we realize that we are facing a prob-
lem where the number of input variables, represented by the number of genes,
is huge (several thousands) and the number of samples, represented by
the clinical trials, is very limited (several tens). Because of well-known
numerical and statistical accuracy issues, it is typically necessary to reduce the
number of variables before starting a learning procedure. Furthermore, select-
ing features (i.e. genes) can increase the intelligibility of a model while at the
This paper deals with supervised multi-class classification. We will assume ei-
ther that all the variables are discrete or that they can be made discrete by a
quantization step. Hereafter, we will denote by Y the discrete output random
variable representing the class and by X the multi-dimensional discrete input
random variable.
In qualitative terms, feature selection boils down to selecting, among a set of po-
tential variables, the most relevant ones. At the same time, it is appealing
that these selected variables not be redundant. The notions of relevance and re-
dundancy can be made more formal thanks to the use of dependency measures
[2, 9].
Let us first introduce some concepts of information theory:
Definition 3. Relevance.
Consider three random variables X, Y and Z and their joint probability distri-
bution p_{X,Y,Z}(x, y, z). If H(Y|Z) = 0, then the variable relevance of X to Y
given Z, denoted by r(X; Y|Z), is zero. Else, if H(Y|Z) ≠ 0, then the variable
relevance of X to Y given Z is defined as:

    r(X; Y|Z) = I(X; Y|Z) / H(Y|Z)    (3)
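For illustration, a minimal plug-in estimate of this relevance measure for discrete variables can be sketched as follows; the function names are illustrative.

    # Minimal sketch of r(X;Y|Z) = I(X;Y|Z) / H(Y|Z) from co-occurrence counts.
    import numpy as np
    from collections import Counter

    def entropy(counts):
        p = np.array(list(counts.values()), dtype=float)
        p /= p.sum()
        return -np.sum(p * np.log2(p))

    def relevance(x, y, z):
        """x, y, z: equal-length sequences of discrete values."""
        h_yz = entropy(Counter(zip(y, z))) - entropy(Counter(z))      # H(Y|Z)
        if h_yz == 0:
            return 0.0
        # I(X;Y|Z) = H(X|Z) + H(Y|Z) - H(X,Y|Z)
        h_xz = entropy(Counter(zip(x, z))) - entropy(Counter(z))
        h_xyz = entropy(Counter(zip(x, y, z))) - entropy(Counter(z))
        return (h_xz + h_yz - h_xyz) / h_yz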
two variables are independent. We remark also that a negative value of the
complementarity can be taken as a measure of the redundancy of a pair of
variables for the task of predicting Y.
Example 2 is an illustration of complementarity between X_i and X_S, since
in that case:

    I(X_{i,S}; Y) > I(X_i; Y) + I(X_S; Y)    (5)
without any gain in terms of bias reduction. On the contrary, two variables can
be complementary to the output (i.e. highly relevant together) while each of
them appears to be poorly relevant when taken individually (see Example 2 or
Example 3). As a consequence, these variables could be badly ranked, or worse,
eliminated, by the ranking filter.
At each step, this method selects the variable with the best relevance-redundancy
trade-off. This selection criterion is fast and efficient. At step d of
the forward search, the search algorithm performs n − d evaluations, where each
evaluation requires the estimation of (d + 1) bi-variate densities (one for each
already selected variable plus one with the output). It has been shown in [7] that
the MRMR criterion is an optimal first-order approximation of the conditional
relevance criterion. Furthermore, MRMR avoids the estimation of multivariate
densities by using multiple bivariate densities.
Note that, although the method aims to address the issue of redundancy
between variables through the term z_i, it is not able to take into account the
complementarities between variables. This can be ineffective in situations like
the one of Example 2 where, although the set {X_i, X_S} has a large relevance to
Y, we observe that
1. the redundancy term z_i is large, due to the redundancy of X_i and X_S;
2. the relevance term u_i is small, since X_i is not relevant to Y.
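For illustration, a minimal sketch of one forward-selection step under this criterion is given below; it assumes, as described above, that u_i is the mutual information of the candidate with the output and z_i the average mutual information with the already selected variables (the exact formulation used in [7] may differ in details).

    # Minimal sketch of an MRMR-style forward step on discrete data.
    import numpy as np
    from collections import Counter

    def _H(counts):
        p = np.array(list(counts.values()), float); p /= p.sum()
        return -np.sum(p * np.log2(p))

    def mi(x, y):
        return _H(Counter(x)) + _H(Counter(y)) - _H(Counter(zip(x, y)))

    def mrmr_step(candidates, selected, data, y):
        """Pick the candidate maximizing u_i - z_i given the already selected set."""
        def score(i):
            u = mi(data[i], y)                                        # relevance term u_i
            z = (np.mean([mi(data[i], data[s]) for s in selected])    # redundancy term z_i
                 if selected else 0.0)
            return u - z
        return max(candidates, key=score)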
it will not necessarily select a variable complementary to the already selected
variables. Indeed, a variable that has a high complementarity with an already
selected variable will be characterized by a high conditional mutual information
with that variable, but not necessarily by a high minimal conditional information
(see Example 3).
In terms of complexity, note that at the d-th step of the forward search, the
algorithm performs n − d evaluations, where each evaluation following CMIM
requires the estimation of d tri-variate densities (one for each previously selected
variable).
In the following section, we propose a new criterion that deals more explicitly
with complementary variables.
The theorem expresses that the mutual information of a subset X_S and a target
variable Y is lower bounded by the quantity L(I(X_S; Y)), that is, the average of
the same quantity computed for all the sub-subsets X_{S−i} of X_S.
In the following, we will use this theorem as theoretical support for the
following heuristic: without any additional knowledge on how subsets of d vari-
ables should combine, the most promising subset is a combination of the best
performing subsets of (d − 1) variables.
Replacing again the right-hand term by its lower bound, and recursively until we
reach subsets of two variables:

    arg max_S Σ_{i∈S} Σ_{j∈S} I(X_{S−(i,j)}; Y)  ≈  arg max_S Σ_{i∈S} Σ_{j∈S} I(X_{i,j}; Y)    (14)
    SR(X; Y) = I(X; Y) / H(X, Y)    (15)

The main advantage of using this criterion for selecting variables is that a
variable complementary to an already selected one has a much higher probability
of being selected than with the other criteria. As this criterion measures symmetrical
relevance on all combinations of two variables (double input) of a subset, we
have called it the double input symmetrical relevance (DISR) criterion. At the
d-th step of the forward search, the search algorithm performs n − d evaluations,
where each evaluation requires the estimation of d tri-variate densities (one for
each previously selected variable). In the next section, the DISR criterion is
assessed and compared with the other heuristic search filters discussed in
Section 3.
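For illustration, a minimal sketch of the symmetrical relevance and of a DISR-style score for a candidate variable is given below; the summation of pairwise terms over the already selected set is an assumption made for illustration, and the names are not part of the original method.

    # Minimal sketch of symmetrical relevance SR(X;Y) = I(X;Y)/H(X,Y) and a
    # DISR-style candidate score for discrete data.
    import numpy as np
    from collections import Counter

    def _H(counts):
        p = np.array(list(counts.values()), float); p /= p.sum()
        return -np.sum(p * np.log2(p))

    def symmetrical_relevance(x, y):
        hx, hy, hxy = _H(Counter(x)), _H(Counter(y)), _H(Counter(zip(x, y)))
        return (hx + hy - hxy) / hxy              # I(X;Y) / H(X,Y)

    def disr_score(candidate, selected, data, y):
        """Score a candidate by summing SR of the (candidate, s) joint variable with Y."""
        pair_sr = lambda i, j: symmetrical_relevance(list(zip(data[i], data[j])), y)
        return sum(pair_sr(candidate, s) for s in selected) if selected \
            else symmetrical_relevance(data[candidate], y)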
Table 1 summarizes the methods discussed so far in terms of some peculiar
aspects: the capacity of selecting relevant variables, of avoiding redundancy, of
selecting complementary features and of avoiding the computation of multivari-
ate densities.
5 Experiments
A major topic in bioinformatics is how to build accurate classifiers for cancer di-
agnostic and prognostic purposes on the basis of microarray genomic signatures.
This task can be considered a challenging benchmark for feature selection
algorithms [7], given the high feature-to-sample ratio.
We use eleven public-domain multi-class datasets from [14] (Table 2) in order
to assess and compare our technique with the state-of-the-art approaches.
In our experimental framework, each continuous variable has been discretized
into equally sized intervals. The number of intervals for each input is based on the Scott
criterion, see [15]. All the datasets are partitioned into two parts: a selection set
and a test set (each of size N/2). We compare the filter based
on DISR with the four state-of-the-art approaches discussed above: a ranking
algorithm and three filters based on the Relevance criterion, the Minimum Re-
dundancy Maximum Relevance criterion and the Conditional Mutual Information
Maximization criterion.
Table 3. Statistically significant (at the 0.1 and 0.2 levels of a paired two-tailed t-test)
wins, ties or losses over best-first search combined with the DISR criterion

    W/T/L vs DISR   Rank    REL     CMIM    MRMR   |  Rank    REL     CMIM    MRMR
    3-NN            0/9/2   1/8/2   1/7/3   2/7/2  |  1/5/5   1/7/3   1/7/3   2/7/2
    Naive Bayes     3/8/0   2/7/2   1/8/2   1/9/1  |  4/7/0   3/6/2   1/8/2   1/9/1
    SVM             2/9/0   2/6/3   2/6/3   2/8/1  |  2/6/3   3/4/4   2/5/4   3/5/3
the criterion in a wider range of domains, and (iii) the impact of the discretization
method on the efficiency of the feature selection algorithms.
References
1. Guyon, I., Elissee, A.: An introduction to variable and feature selection. Journal
of Machine Learning Research 3 (2003) 11571182
2. Kohavi, R., John, G.H.: Wrappers for feature subset selection. Articial Intelligence
97 (1997) 273324
3. Blum, A., Langley, P.: Selection of relevant features and examples in machine
learning. Articial Intelligence 97 (1997) 245271
4. Provan, G., Singh, M.: Learning bayesian networks using feature selection. In:
in Fifth International Workshop on Articial Intelligence and Statistics. (1995)
450456
5. Duch, W., Winiarski, T., Biesiada, J., Kachel, A.: Feature selection and ranking
lters. In: International Conference on Articial Neural Networks (ICANN) and
International Conference on Neural Information Processing (ICONIP). (2003) 251
254
6. Bell, D.A., Wang, H.: A formalism for relevance and its application in feature
subset selection. Machine Learning 41 (2000) 175195
7. Peng, H., Long, F.: An ecient max-dependency algorithm for gene selection.
In: 36th Symposium on the Interface: Computational Biology and Bioinformatics.
(2004)
8. Fleuret, F.: Fast binary feature selection with conditional mutual information.
Journal of Machine Learning Research 5 (2004) 15311555
9. Yu, L., Liu, H.: Ecient feature selection via analysis of relevance and redundancy.
Journal of Machine Learning Research 5 (2004) 12051224
10. Cover, T.M., Thomas, J.A.: Elements of Information Theory. John Wiley, New
York (1990)
11. Yang, H., Moody, J.: Feature selection based on joint mutual information. In: In
Advances in Intelligent Data Analysis (AIDA), Computational Intelligence Meth-
ods and Applications (CIMA), Rochester New York, ICSC (1999)
12. Kojadinovic, I.: Relevance measures for subset variable selection in regression
problems based on k-additive mutual information. Computational Statistics and
Data Analysis 49 (2005) 12051227
13. Meyer, P.: Information theoretic lters for feature selection. Technical report,
Universite Libre de Bruxelles ((548) 2005)
14. web: (http://www.tech.plym.ac.uk/spmc/bioinformatics/microarray cancers.html)
15. Scott, D.W.: Multivariate Density Estimation. Theory,. Wiley (1992)
16. R-project: (www.r-project.org)
Comparison of Neural Network Optimization
Approaches for Studies of Human Genetics
Center for Human Genetics Research, Department of Molecular Physiology & Biophysics,
Vanderbilt University, Nashville, TN, 37232, USA
{motsinger, dudek, hahn, ritchie}@chgr.mc.vanderbilt.edu
http://chgr.mc.vanderbilt.edu/ritchielab
1 Introduction
architecture should be for a given dataset and, as a result, a cumbersome trial and
error approach is often taken.
Previously, we implemented a neural network optimized via genetic programming
(GPNN)[22]. Optimizing neural network architecture with genetic programming was
first proposed by Koza and Rice[23]. We implemented and extended the GPNN
approach for use in association studies of human disease. The goal of GPNN was to
improve upon the trial-and-error process of choosing an optimal architecture for a
pure feed-forward back propagation neural network[22]. GPNN optimizes the inputs
from a large pool of variables, the weights, and the connectivity of the network -
including the number of hidden layers and the number of nodes in the hidden layer.
Thus, the algorithm automatically generates optimal neural network architecture for a
given dataset. This gives it an advantage over the traditional back propagation NN, in
which the inputs and architecture are pre-specified and only the weights are
optimized.
GPNN was a successful endeavor; it has shown high power to detect gene-gene
interactions in both simulated and real data[24]. Still, there are limitations to evolving
NN using this type of machine learning algorithm. First, the GP implementation that
was used for GPNN involves building binary expression trees. Therefore, each node
is connected to exactly two nodes at the level below it in the network. This did not
seem to hinder the power of GPNN in smaller datasets[22,24-26]; however, we
hypothesize that for more complex data, more complicated NN will be required, and
two connections per node may not be sufficient. Second, changes to GPNN require
altering and recompiling source code, which hinders flexibility and increases
development time. For example, GPNN is limited in the depth of the network. This
means there is a limit to the number of levels the network can contain. Again, this
was not a hindrance for GPNN in the previous power studies[22,24-26], but this may
not scale well for more complex datasets.
In response to these concerns, we developed a NN approach for detecting gene-
gene interactions that uses grammatical evolution (GE) as a strategy for the
optimization of the NN architecture. Grammatical evolution (GE) is a variation on
genetic programming that addresses some of the drawbacks of GP[27,28]. GE has
been shown to be effective in evolving Petri Nets, which are discrete dynamical
systems that look structurally similar to neural networks, used to model biochemical
systems[29]. By using a grammar, substantial changes can be made to the way that
NN are constructed through simple manipulations to the text file where the grammar
is specified. No changes in source code are required and thus, there is no
recompiling. The end result is a decrease in development time and an increase in
flexibility. These two features are important improvements over GPNN.
Preliminary studies with GPNN show that an evolutionary optimization is more
powerful than traditional approaches for detecting gene-gene interactions. We have
shown that the GPNN strategy is able to model and detect gene-gene interactions in
the absence of main effects in many epistasis models with higher power than back
propagation NN[22], stepwise logistic regression[26], and a stand alone GP[25].
GPNN has also detected interactions in a real data analysis of Parkinson's
disease[24]. Similarly to GPNN, the grammatical evolution optimized neural
network (GENN) optimizes the inputs from a pool of variables, the synaptic weights,
and the architecture of the network. The algorithm automatically selects the
2 Methods
Fig. 1. An overview of the GENN method. The steps correspond to the description of the
method in Section 2.1.
Classification error is calculated on the set of training data as the fitness metric.
As mentioned earlier, the dataset is divided into cross-validation subsets. GENN is
optimized using a training set of data, and a subset of the data is left out as a test set
to evaluate the final solution and prevent over-fitting. Classification error refers to
the number of samples in the training dataset that are incorrectly classified by the
network. Prediction error, which refers to the number of samples in the test dataset
that are incorrectly classified using the GENN model generated during training, is
used for final model selection. The overall goal of the learning process is to find
genetic models that accurately classify the data. Cross-validation is used in
conjunction with this learning process to produce a model that not only can
accurately classify the data at hand, but can predict on future, unseen data.
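To make the training/test error bookkeeping concrete, the following Python sketch shows one way such a cross-validation loop could be organized; `fit_model` is a hypothetical stand-in for the evolved-NN trainer and is not part of the method described here.

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validated_errors(X, y, fit_model, n_splits=10):
    """Per-fold classification (training) and prediction (test) errors.

    fit_model(X_train, y_train) is a hypothetical stand-in for the evolved
    NN trainer and is assumed to return an object with a .predict(X) method.
    """
    errors = []
    for train_idx, test_idx in KFold(n_splits=n_splits, shuffle=True).split(X):
        model = fit_model(X[train_idx], y[train_idx])
        train_err = np.mean(model.predict(X[train_idx]) != y[train_idx])
        test_err = np.mean(model.predict(X[test_idx]) != y[test_idx])
        errors.append((train_err, test_err))
    return errors  # final model selection would pick the model with lowest test error
```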
GPNN uses genetic programming to determine the optimal architecture for neural
networks. Both the method and the software have previously been described[22].
GPNN was applied as presented in the references. Like GENN, models are trained on
classification error, and a cross validation consistency and prediction error are
determined for the final model. Unlike GENN, previous studies of GPNN have shown
cross-validation consistency to be the best criterion for final model selection. For this
study, however, the results are identical whether cross-validation consistency or
prediction error is used for final model selection. All configuration parameters are identical to
those in GENN, which allows for a direct comparison of GPNN and GENN.
Epistasis, or gene-gene interaction, occurs when the phenotype under study cannot be
predicted from the independent effects of any single gene, but is the result of
combined effects of two or more genes[34]. It is increasingly accepted that epistasis
plays an important role in the genetic architecture of common genetic diseases[35].
Penetrance functions are used to represent epistatic genetic models in this simulation
study. Penetrance defines the probability of disease given a particular genotype
combination by modeling the relationship between genetic variations and disease risk.
For our power studies, we simulated case-control data using two different epistasis
models exhibiting interaction effects in the absence of main effects. Models that lack
main effects are desirable because they challenge the method to find gene-gene
interactions in a complex dataset. Also, a method able to detect purely interactive
terms will be likely to identify main effects as well.
Table 1. Multilocus penetrance functions used to simulate case-control data exhibiting gene-
gene interactions in the absence of main effects. Penetrance is calculated as
p(disease|genotype). Marginal penetrance values (not shown) are all equal for each model.
a. Model 1                          b. Model 2
        BB     Bb     bb                    BB     Bb     bb
  AA    0      .10    0              AA     0      0      .10
  Aa    .10    0      .10            Aa     0      .50    0
  aa    0      .10    0              aa     .10    0      0
To evaluate the power of the above methods for detecting gene-gene interactions,
we simulated case-control data using two different two-locus epistasis models in which
the functional loci are single nucleotide polymorphisms (SNPs). The first model was
initially described by Li and Reich[36], and later by Moore [37]. This model is based
on the nonlinear XOR function[38] that generates an interaction effect in which high
risk of disease is dependent on inheriting a heterozygous genotype (Aa) from one locus
or a heterozygous genotype (Bb) from a second locus, but not both. The high risk
genotypes are AaBB, Aabb, AABb, and aaBb, all with penetrance of 0.1 (Table 1a).
The proportion of the trait variance that is due to genetics, or heritability, of this model
is low. Specifically, as calculated according to Culverhouse et al[39], the heritability is
0.053. The second model was initially described by Frankel and Schork[40], and
later by Moore[37]. In this second model, high risk of disease is dependent on
inheriting exactly two high risk alleles (A and/or B) from two different loci. In this
model, the high risk genotypes were AAbb, AaBb, and aaBB, with penetrance of 0.1,
0.5, and 0.1 respectively (Table 1b). The heritability of this model is 0.051.
These models were selected because they exhibit interaction effects in the absence
of any main effects when genotypes were generated according to Hardy-Weinberg
proportions (in both models, p=q=0.5). For both models, we simulated 100 datasets
consisting of 200 cases and 200 controls, each with 10 SNPs, 2 of which were
functional. The data were generated using the software package described by Moore
et al[37]. Dummy variable encoding was used for each dataset, where n-1 dummy
variables were used for n levels[19]. Data were formatted with rows representing
individuals and columns representing dummy-encoded genotypes with the final
column representing disease status. Though biological relevance of these models is
uncertain, they do represent a worst case scenario in the detection of epistasis. If a
method performs well under such minimal effects, it is predicted it will also perform
well in identifying gene-gene interactions in models with greater effect sizes.
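For readers who wish to generate data of this general kind, the sketch below draws a balanced case-control sample from a two-locus penetrance table under Hardy-Weinberg proportions with p = q = 0.5. It is a simplified stand-in for the simulation software of Moore et al.[37] (and omits the dummy-variable encoding step), not that package itself.

```python
import numpy as np

# Penetrance table for Model 1 (rows: AA, Aa, aa; columns: BB, Bb, bb), taken from Table 1a.
PENETRANCE = np.array([[0.0, 0.1, 0.0],
                       [0.1, 0.0, 0.1],
                       [0.0, 0.1, 0.0]])

def simulate_dataset(n_cases=200, n_controls=200, n_snps=10, p=0.5, rng=None):
    """Rejection-sample individuals until the case and control quotas are filled.
    SNPs 0 and 1 are the functional loci; the rest are noise."""
    rng = rng or np.random.default_rng()
    hwe = np.array([p * p, 2 * p * (1 - p), (1 - p) * (1 - p)])  # genotype frequencies under HWE
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        g = rng.choice(3, size=n_snps, p=hwe)          # 0=AA/BB, 1=Aa/Bb, 2=aa/bb per SNP
        affected = rng.random() < PENETRANCE[g[0], g[1]]
        if affected and len(cases) < n_cases:
            cases.append(np.append(g, 1))              # last column: disease status
        elif not affected and len(controls) < n_controls:
            controls.append(np.append(g, 0))
    return np.array(cases + controls)
```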
We used all four methods (GENN, GPNN, BPNN and random search) to analyze both
epistasis models. The configuration parameter settings were identical for GENN,
GPNN and the random search (without evolutionary operators for random search): 10
demes, migration every 25 generations, population size of 200 per deme, 50
generations, crossover rate of 0.9, and a reproduction rate of 0.1. For GENN and the
random search, prediction error was used for final model selection as described in
Section 2.1. For GPNN, cross-validation consistency was calculated for each model
and the final model was selected based on this metric (as described in [22]). Sensible
initialization was used in all three algorithms.
For our traditional BPNN analysis, all possible inputs were used and the
significance of each input was calculated from its input relevance R_I, where R_I is
the sum of squared weights for the ith input divided by the sum of squared weights
for all inputs[38]. Next, we performed 1000 permutations of the data to determine
what input relevance was required to consider a SNP significant in the BPNN model
(data not shown). This empirical range of critical relevance values for determining
significance was 10.43% - 11.83% based on the permutation testing experiments.
Cross validation consistency was also calculated and an empirical cutoff for the cross
validation consistency was determined through permutation testing (using 1000
randomized datasets). This cutoff was used to select SNPs that were functional in the
epistasis model for each dataset. A cross validation consistency of greater than 5 was
required to be statistically significant.
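The input-relevance measure and the permutation-based cutoff just described can be written compactly as below; the sketch assumes the first-layer weight matrix of a trained feed-forward network is available as an array, and `train_weights` is a hypothetical trainer, neither of which is specified by the paper.

```python
import numpy as np

def input_relevance(first_layer_weights):
    """R_i for each input: sum of squared weights leaving input i divided by the
    sum of squared weights over all inputs (first_layer_weights: n_inputs x n_hidden)."""
    sq = np.sum(np.asarray(first_layer_weights) ** 2, axis=1)
    return sq / sq.sum()

def relevance_cutoff(X, y, train_weights, n_permutations=1000, quantile=0.95, rng=None):
    """Empirical significance cutoff: retrain on label-permuted data and take a high
    quantile of the null relevance values (train_weights is a hypothetical trainer
    returning the first-layer weight matrix of a fitted BPNN)."""
    rng = rng or np.random.default_rng()
    null_vals = []
    for _ in range(n_permutations):
        w = train_weights(X, rng.permutation(y))
        null_vals.extend(input_relevance(w))
    return float(np.quantile(null_vals, quantile))
```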
Power for all analyses is reported under each epistatic model as the number of
times the algorithm correctly identified the correct functional loci (both with and
without any false positive loci) over 100 datasets. Final model selection was
performed for each method based on optimum performance in previous studies[22].
If either one or both of the dummy variables representing a single SNP was selected,
that locus was considered present in the model.
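As an illustration of this counting rule, the small sketch below computes locus-level power from a list of final models; the dummy-variable naming scheme used ('SNP3_a', 'SNP3_b', ...) is hypothetical.

```python
def locus_power(final_models, functional_snps, n_datasets=100):
    """Percentage of datasets whose final model contains every functional SNP,
    counting a SNP as selected if any of its dummy variables was selected.
    Dummy variables are assumed to be named like 'SNP3_a', 'SNP3_b' (hypothetical)."""
    hits = 0
    for model in final_models:                       # each model: set of dummy-variable names
        loci = {name.split('_')[0] for name in model}
        if all(snp in loci for snp in functional_snps):
            hits += 1
    return 100.0 * hits / n_datasets
```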
3 Results
Table 2 lists the power results from all four algorithms. Because of the small size of
the dataset, all four algorithms performed reasonably well. With a limited number of
SNPs, these learning algorithms can effectively become exhaustive searches. As
hypothesized, GENN and GPNN both out-performed the traditional BPNN and the
random search. The performance of GENN and GPNN were consistent, as expected.
This demonstrates that GENN will work at least as well as GPNN, while allowing for
faster development and more flexible use. Because the number of variables included
in the dataset was small, the random search performed reasonably well, as the trial
and error approach had a limited number of variables to search through. As Table 2
shows, there is a large gap in the performance of the random search between Model 1
and Model 2. This is probably due to the difference in the difficulty inherent in the
two models. The power of BPNN to detect Model 2 was also lower than for Model 1,
indicating a difference in the challenge of modeling the different models.
Additionally, the stochastic nature of a random algorithm can lead to erratic results, as
shown here. These erratic power results further demonstrate the utility of an
evolutionary approach to optimizing NN architecture. The random search even
outperformed BPNN for Model 1, probably because the random search was able to
explore more possible NN architectures than BPNN and so was able to find the
correct model more often in these simulations.
Table 3 summarizes the average classification error (training error) and prediction
error (testing error) for the four algorithms evaluated using the 100 datasets for each
model. Due to the probabilistic nature of the functions used in the data simulation,
Table 2. Power (%) for each method on both gene-gene interaction models (with no false
positive loci)
Epistasis Model GENN GPNN BPNN Random Search
1 100 100 53 87
2 100 100 42 10
Table 3. Results for all four algorithms, demonstrating average classification error (CE) and
prediction error (PE) for each epistasis model. The range of observed values is listed
below the average.
Table 4. Power (%) for each method to detect functional SNPs in both gene-gene interaction
models (with or without false positive loci)
there is some degree of noise present in the data. The average error inherent in the
100 Model 1 datasets is 24%, and the error in the Model 2 datasets is 18%. As the
table shows, GENN, GPNN and the random search all had error rates closely
reflecting the real amount of noise in the data. Those three algorithms had lower
prediction errors than BPNN, while BPNN had lower classification errors for both
models. The lower classification error is due to model over-fitting. The other three
algorithms, including even the random search, are better able to model gene-gene
interaction and develop NN models that can generalize to unseen data. While the
random search did not demonstrate the same degree of over-fitting experienced with
BPNN, the averages reported here disguise the fact that the range of errors across
datasets was very high. While the average errors for the random search look similar
to those for GPNN and GENN, the range of observed values was much larger,
implying that the random search also tends to over-fit. We speculate that GENN and
GPNN are not over-fitting because, while these methods are theoretically able to
build a tree with all variables included, neither method is building a fully connected
NN using all variables.
To further understand the behavior of the algorithms, and in particular the
seemingly inconsistent combination of the random search's average errors and low relative
power, we calculated power for each functional locus as the proportion of times a
SNP was included in the final model (regardless of what other SNPs are present in
the model) for all datasets. Table 4 lists the results of this power calculation for all
four methods. The tendency of the random search algorithm to over-fit models
becomes clear in the comparisons of Tables 2-4. The random search finds the
functional SNPs, but includes many false positive loci, which is highly undesirable
since the end goal of association analysis is variable selection. The same false
positive trend holds true for BPNN.
4 Discussion
We have demonstrated that grammatical evolution is a valid approach for optimizing
the architecture of NN. We have shown that GENN outperforms both a random
search and traditional BPNN for analysis of simulated epistasis genetic models.
Because of the small number of SNPs in this study, both BPNN and the random
search NN had modest power, as one would expect. With a small number of
variables to test, examining low-order combinations is relatively easy. As the number
of variables increases, the resultant combinatorial explosion limits the feasibility of
trial and error approaches. Moody[33] demonstrates that enumeration of all possible
NN architectures is impossible, and there is no way to know if a globally optimal
architecture is selected. The performance gap between the evolutionarily optimized
algorithms and the trial and error approaches is expected to widen as the number of
variables increases.
Additionally, we show that GENN performs at least as well as GPNN. Because of
the limited number of noise variables, and the fact that these two methods reached
the upper limit of power, a more extensive comparison between GENN and GPNN
needs to be performed. Power will need to be studied in a range of datasets,
demonstrating a wide range of heritability values and number of noise variables.
Because of the greater flexibility of GE compared to GP, we predict that GENN will
out-perform GPNN on more complex datasets.
Because the end-goal of these methods is variable selection, performance has been
evaluated according to this metric in this study. In future studies, it would be
interesting to evaluate the architectures of the NN that are constructed by these
different methods to further evaluate the differences in their performance. Other
measures of model fitness, such as sensitivity and specificity could also be dissected
in evaluating the performance of GENN.
Also, while simulated data are necessary in method development, the eventual
purpose of this method is for the analysis of real data. GENN will need to be tested
on real case-control genetic data.
This study introduces a novel computational method and demonstrates that GENN
has the potential to mature into a useful software tool for the analysis of gene-gene
interactions associated with complex clinical endpoints. The flexibility and ease of
development afforded by utilizing a grammar will aid in additional studies with this
method.
Acknowledgements
This work was supported by National Institutes of Health grants HL65962, GM62758,
and AG20135. We would also like to thank David Reif for his helpful comments on
the manuscript.
References
1. Kardia S, Rozek L, Hahn L, Fingerlin T, Moore J: Identifying multilocus genetic risk
profiles: a comparison of the multifactor data reduction method and logistic regression.
Genetic Epidemiology 2000.
2. Moore JH, Williams SM: New strategies for identifying gene-gene interactions in
hypertension. Ann Med 2002, 34: 88-95.
3. Culverhouse R, Klein T, Shannon W: Detecting epistatic interactions contributing to
quantitative traits. Genet Epidemiol 2004, 27: 141-152.
4. Hahn LW, Ritchie MD, Moore JH: Multifactor dimensionality reduction software for
detecting gene-gene and gene-environment interactions. Bioinformatics 2003, 19: 376-382.
5. Kooperberg C, Ruczinski I, LeBlanc ML, Hsu L: Sequence analysis using logic
regression. Genet Epidemiol 2001, 21 Suppl 1: S626-S631.
6. Moore JH: The ubiquitous nature of epistasis in determining susceptibility to common
human diseases. Hum Hered 2003, 56: 73-82.
7. Nelson MR, Kardia SL, Ferrell RE, Sing CF: A combinatorial partitioning method to
identify multilocus genotypic partitions that predict quantitative trait variation. Genome
Res 2001, 11: 458-470.
8. Ritchie MD, Hahn LW, Roodi N, Bailey LR, Dupont WD, Parl FF et al.: Multifactor-
dimensionality reduction reveals high-order interactions among estrogen-metabolism
genes in sporadic breast cancer. Am J Hum Genet 2001, 69: 138-147.
9. Ritchie MD, Hahn LW, Moore JH: Power of multifactor dimensionality reduction for
detecting gene-gene interactions in the presence of genotyping error, missing data,
phenocopy, and genetic heterogeneity. Genet Epidemiol 2003, 24: 150-157.
10. Tahri-Daizadeh N, Tregouet DA, Nicaud V, Manuel N, Cambien F, Tiret L: Automated
detection of informative combined effects in genetic association studies of complex traits.
Genome Res 2003, 13: 1952-1960.
11. Zhu J, Hastie T: Classification of gene microarrays by penalized logistic regression.
Biostatistics 2004, 5: 427-443.
12. Schalkoff R: Artificial Neural Networks. New York: McGraw-Hill Companies Inc.; 1997.
13. Bhat A, Lucek PR, Ott J: Analysis of complex traits using neural networks. Genet
Epidemiol 1999, 17: S503-S507.
14. Curtis D, North BV, Sham PC. Use of an artificial neural network to detect association
between a disease and multiple marker genotypes. Annals of Human Genetics 65, 95-107.
2001.
15. Li W, Haghighi F, Falk C: Design of artificial neural network and its applications to the
analysis of alcoholism data. Genet Epidemiol 1999, 17: S223-S228.
16. Lucek P, Hanke J, Reich J, Solla SA, Ott J: Multi-locus nonparametric linkage analysis of
complex trait loci with neural networks. Hum Hered 1998, 48: 275-284.
17. Lucek PR, Ott J: Neural network analysis of complex traits. Genet Epidemiol 1997, 14:
1101-1106.
18. Marinov M, Weeks D: The complexity of linkage analysis with neural networks. Human
Heredity 2001, 51: 169-176.
19. Ott J. Neural networks and disease association. American Journal of Medical Genetics
(Neuropsychiatric Genetics) 105[60], 61. 2001.
20. Saccone NL, Downey TJ, Jr., Meyer DJ, Neuman RJ, Rice JP: Mapping genotype to
phenotype for linkage analysis. Genet Epidemiol 1999, 17 Suppl 1: S703-S708.
21. Sherriff A, Ott J: Applications of neural networks for gene finding. Advances in Genetics
2001, 42: 287-297.
22. Ritchie MD, White BC, Parker JS, Hahn LW, Moore JH: Optimization of neural network
architecture using genetic programming improves detection and modeling of gene-gene
interactions in studies of human diseases. BMC Bioinformatics 2003, 4: 28.
23. Koza J, Rice J: Genetic generation of both the weights and architecture for a neural
network. IEEE Transactions 1991, II.
24. Motsinger AA, Lee S, Mellick G, Ritchie MD: GPNN: Power studies and applications of
a neural network method for detecting gene-gene interactions in studies of human disease.
BMC Bioinformatics 2005, in press.
25. Bush WS, Motsinger AA, Dudek SM, Ritchie MD: Can neural network constraints in GP
provide power to detect genes associated with human disease? Lecture Notes in Computer
Science 2005, 3449: 44-53.
26. Ritchie MD, Coffey CS, Moore JH: Genetic programming neural networks: A bioinformatics
tool for human genetics. Lecture Notes in Computer Science 2004, 3102: 438-448.
27. O'Neill M, Ryan C. Grammatical Evolution. IEEE Transactions on Evolutionary
Computation 5, 349-357. 2001.
28. O'Neill M, Ryan C. Grammatical evolution: Evolutionary automatic programming in an
arbitrary language. 2003. Boston, Kluwer Academic Publishers.
29. Moore JH, Hahn LW: Petri net modeling of high-order genetic systems using grammatical
evolution. BioSystems 2003, 72: 177-186.
30. Mitchell M. An introduction to genetic algorithms. 1996. Cambridge, MIT Press.
31. Cantu-Paz E. Efficient and accurate parallel genetic algorithms. 2000. Boston, Kluwer
Academic Publishers.
32. Utans J, Moody J. Selecting neural network architectures via the prediction risk
application to corporate bond rating prediction. 1991. Los Alamitos, California, IEEE
Press. Conference Proceedings on the First International Conference on Artificial
Intelligence Applications on Wall Street.
33. Moody J: Prediction risk and architecture selection for neural networks. In From Statistics
to Neural Networks: Theory and Pattern Recognition Applications. Edited by Cherkassky
V, Friedman JH, Wechsler H. NATO ASI Series F, Springer-Verlag; 1994.
34. Fahlman SE, Lebiere C: The Cascade-Correlation Learning Architecture. Carnegie
Mellon University; 1991. Masters from School of Computer Science.
35. Templeton A: Epistasis and complex traits. In: Wade M, Broadie B III, Wolf J (eds): Epistasis
and the Evolutionary Process. Oxford: Oxford University Press; 2000: 41-57.
36. Li W, Reich J. A complete enumeration and classification of two-locus disease models.
Hum.Hered. 50, 334-349. 2000.
37. Moore J, Hahn L, Ritchie M, Thornton T, White B. Application of genetic algorithms to
the discovery of complex models for simulation studies in human genetics. Langdon, WB,
Cantu-Paz, E, Mathias, K, Roy, R, Davis, D, Poli, R, Balakrishnan, K, Honavar, V,
Rudolph, G, Wegener, J, Bull, L, Potter, MA, Schultz, AC, Miller, JF, Burke, E, and
Jonoska, N. Proceedings of the Genetic and Evolutionary Algorithm Conference. 1150-
1155. 2002. San Francisco, Morgan Kaufman Publishers.
38. Anderson J: An Introduction to Neural Networks. Cambridge, Massachusetts: MIT Press;
1995.
39. Culverhouse R, Suarez BK, Lin J, Reich T: A perspective on epistasis: limits of models
displaying no main effect. Am J Hum Genet 2002, 70: 461-471.
40. Frankel W, Schork N. Who's afraid of epistasis? Nat.Genet. 14, 371-373. 1996.
Obtaining Biclusters in Microarrays
with Population-Based Heuristics
1 Introduction
One of the research fields which has aroused the greatest interest towards the
end of the 20th century, and whose future is expected to be equally promising
in the 21st century, is the study of an organism's genome, or genomics.
By way of a brief history, it was Gregor Mendel who defined the gene concept
in his research as the element where information about hereditary characteristics
is to be found. At a later stage, Avery, McCleod and McCarty demonstrated that
an organism's genetic information stems from a macromolecule called deoxyri-
bonucleic acid (DNA); it was later discovered that genetic information located
in specific areas of the DNA (the genes) enabled protein synthesis; this was fol-
lowed by the sequencing of the genome of certain organisms (including humans).
This and future consequences awakened a great deal of interest among scientists.
Since proteins are responsible for carrying out cellular functions, cellular func-
tioning therefore depends on the proteins synthesized by the genes, and is de-
termined by regulation of protein synthesis (gene expression) and control of its
activity.
The process whereby the approximately 30,000 genes in the human genome
are expressed as proteins involves two steps: 1) the DNA sequence is transcribed
into messenger RNA sequences (mRNA); and 2) the mRNA sequences are in turn
translated into amino acid sequences which comprise the proteins.
Measuring the mRNA levels provides a detailed vision of the subset of genes
which are expressed in different types of cells under different conditions. Mea-
suring these levels of gene expression under different conditions helps explore the
following aspects (among others) in greater depth: a) the function of the genes,
b) how several genes interact, and c) how different experimental treatments
affect cell function.
Recent advances in array-based methods enable expression levels of thou-
sands of genes to be measured simultaneously. These measurements are obtained
by quantizing the mRNA hybridization with a cDNA array, or oligonucleotide
probes fixed in a solid substance.
Technological advances in the development of cDNA arrays simultaneously
produce an amazingly large quantity of data relating to the transcription levels of
thousands of genes under specific conditions. For knowledge extraction (function
of the genes, implication of certain genes in specific illnesses, etc.), researchers
use consolidated methodologies, and specific ones are being developed. However,
although the results obtained so far are getting better, there is still room for
improvement.
In a gene expression matrix, the rows represent genes and the columns represent
samples, and each cell contains a number which characterizes the expression level
of a particular gene in a particular sample.
Like most experimental techniques, microarrays measure the final objective
indirectly through another physical quantity, for example the relative abundance
of mRNA through the fluorescence intensity of the spots in an array.
Microarray-based techniques are still a long way from providing the exact
quantity of mRNA in a cell. The measurements are naturally relative: essen-
tially we can compare the expression levels of one gene in different samples or
different genes in one sample, so that it is necessary to apply a suitable nor-
malization to enable comparisons between data. Moreover, as the value of the
microarray-based gene expression can vary considerably according to the
reliability and limitations of a particular microarray technique for certain types
of measurements, data normalization is a key issue to consider.
Once we have constructed the gene expression matrix, the second step is to
analyze it and attempt to obtain information from it.
In this work we shall use the biclustering concept introduced by Hartigan [6]
to capture the degree of similarity between a subset of elements within a subset
of attributes. Church applied this technique on DNA microarrays [3].
The advantage of biclustering as opposed to traditional clustering when ap-
plied to the field of microarrays lies in its ability to identify groups of genes
that show similar activity patterns under a specific subset of the experimental
conditions. Therefore, biclustering approaches are the key technique to use when
one or more of the following situations applies [7]:
when two adjacent windows are similar, they merge and a new one is created.
The algorithm uses the parallelism to reduce the computation time.
Recent years have shown an increasing interest in this field. We refer the
interested reader to the excellent survey by Madeira and Oliveira [7].
Definition: Given a gene expression matrix D_{n×m}, a bicluster is a pair (I, J),
where I ⊆ {1, ..., n} is a subset of the rows of D and J ⊆ {1, ..., m} is a sub-
set of the columns of D, in which the genes G_i with i ∈ I behave in a similar way.
Definition: Given a bicluster (I, J), the residue r_ij of an element d_ij of the
bicluster is calculated according to Equation 1.
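Equation 1 is not reproduced in this excerpt. In the Cheng-Church formulation that reference [3] follows, the residue is usually written r_ij = d_ij - d_iJ - d_Ij + d_IJ, with d_iJ, d_Ij and d_IJ the row, column and bicluster means; the sketch below assumes that form and computes the mean squared residue often used as bicluster fitness.

```python
import numpy as np

def mean_squared_residue(D, rows, cols):
    """Mean squared residue of bicluster (rows, cols) of expression matrix D,
    assuming the Cheng-Church residue r_ij = d_ij - d_iJ - d_Ij + d_IJ."""
    sub = D[np.ix_(rows, cols)]
    r = sub - sub.mean(axis=1, keepdims=True) - sub.mean(axis=0, keepdims=True) + sub.mean()
    return float(np.mean(r ** 2))
```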
them, which in particular would consist in exchanging with each other k bits
of a given solution. If this exchange leads to an improvement, a new k-opt
movement would be undertaken, and this process would be continued until
no improvement is made in a certain number of trials.
We constructed 4 different memetic schemes for values of k ∈ {2, 3, 4, 5}.
Tabu Search (TS): a very basic TS strategy is used. The neighborhood
is sampled using the mutation operator described for the GA.
1. Choose the M best individuals in the current population; these form the
selected set D_{l-1}^{Se}.
2. Estimate the probability distribution of the current population using Eq. 7.
3. Generate a new population of N individuals from the probability distribution;
these are stored in the set D_l^{Se}.
Here, the indicator function in Eq. 7 evaluates to 1 for the j-th individual in M if the value of gene X_i
is equal to x_i, and to 0 in any other case.
The outline of our UMDA-based algorithm can be seen in Algorithm 1.
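Algorithm 1 is likewise not shown here. The sketch below is one plausible reading of a single UMDA generation over binary membership strings (a bit per row/column of the candidate bicluster); it is illustrative only and not the authors' implementation.

```python
import numpy as np

def umda_step(population, fitness, M, N, rng=None):
    """One UMDA generation over binary genomes (rows of `population`):
    keep the M best individuals, estimate per-gene marginals, sample N new ones."""
    rng = rng or np.random.default_rng()
    order = np.argsort([fitness(ind) for ind in population])   # lower fitness (residue) is better
    selected = population[order[:M]]                            # plays the role of D_{l-1}^{Se}
    probs = selected.mean(axis=0)                               # marginal P(gene_i = 1)
    return (rng.random((N, population.shape[1])) < probs).astype(int)
```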
4 Experiments
In order to evaluate and analyze the implemented algorithms, the yeast expres-
sion data set has been used, comprising 17 experiments (columns) on 2900 genes
(rows). This gene expression data set was chosen since it is one of the most used
in the literature by the majority of experts in this field, thereby enabling our results
to be compared.
The results obtained with the proposed tools have been compared using the
algorithm proposed by Church in [3] as a reference algorithm.
Following an empirical study, the following parameters were fixed: a popula-
tion of 200 individuals and 200 generations. The crossover and mutation proba-
bilities were fixed at 0.8 and 0.6, respectively.
Each algorithm was executed 30 times, and the seed of the random number
generator was changed in each execution. At the end of each execution, the best
bicluster found was recorded.
4.1 Results
This section includes the results obtained by the proposed algorithms and the
reference algorithm. We also performed a random sampling of biclusters in order
to check the expected residue value for a random bicluster.
Table 1 shows the corresponding residues for the best and worst biclusters,
the mean and standard deviation over 30 executions, and also an indication of the
time taken for each execution. The average size of the resulting biclusters is
also displayed. Results for the Reference algorithm were taken over 100 biclusters,
while 10^4 random solutions were generated.
Figure 1 shows the histograms of residue (a), rows (b) and columns (c) of
the best biclusters found by every algorithm. Results for k-opt with k ≥ 3 were
omitted for visualization purposes (although they were extremely similar to those
of 2-opt). These figures give us a global view of the whole set of best biclusters
available, enabling us to quickly check the region of the search space covered
by every strategy.
Several aspects can be highlighted from the table and the histograms. The first
one is that the GA achieves very good residue values despite the simplicity of its
component definitions. On average, the biclusters generated are quite big (825
rows and 8 columns).
The joint use of the GA with local search, leading to different memetic schemes,
does not seem to be useful. The simpler local search schemes (k-opt) increase
the residue while the average number of rows in the bicluster is significantly
reduced. In this case, and given the standard deviation values, it is clear that
there is a problem of convergence which is independent of the k value. Moreover,
no statistical differences were detected for different values of k.
As the complexity of the local search is increased, from 2-opt to TS, the
residue values also increase. This becomes clear if we look at the corresponding
histogram. In turn, the sizes of the biclusters obtained are slightly higher than
those obtained by k-opt.
The EDA strategy achieves the lowest average residue value, while the corre-
sponding bicluster sizes are about 200 rows and 8 columns. The average residue
for the reference algorithm is almost three times higher than that of EDA, while
the biclusters are smaller on average (although the number of columns is in-
creased from 8 to 12). The reference algorithm presents the highest variability
in residue, number of rows and columns (this is clearly seen in the histograms).
In order to determine which differences in terms of residue are significant, a
Kruskal-Wallis test was performed. The test reveals significant differences among
the medians of the residues of the algorithms. Then, pairwise Mann-Whitney U
non-parametric tests were performed, and they confirm that the differences among
the algorithms were significant.
Another element to analyze is the computational time used. The fastest al-
gorithm, and the best one in terms of bicluster quality, is EDA, followed by the GA.
The addition of local search to the GA increases the computational time consider-
ably while not having the same counterpart in bicluster quality. Church's
algorithm has quite acceptable running times.
In Fig. 2 we plot the volume of the best solutions (calculated as rows ×
columns) against residue for the algorithms GA, EDA and Reference. This plot re-
veals several things. The first is the existence of many alternative solutions with
similar residue; see for example the vertical range for residue between 5-10. This
fact is most notable for GA and EDA. In second place, we can see that the Refer-
ence algorithm is able to obtain solutions very similar in size while very different
in residue. Both facts clearly encourage the use of population-based techniques
that allow one to simply manage a set of solutions of diverse characteristics.
Table 1. Statistical values of residue and size of the biclusters found by every algo-
rithm. Time is in minutes per run.
Fig. 1. Histograms of residue (a), row (b) and column (c) values of the best biclusters
obtained by every algorithm. Results for k-opt with k ≥ 3 were omitted. TS and 2-opt
stand for the memetic algorithm using the corresponding local search scheme.
Fig. 2. Volume (rows × columns) vs. residue of the best biclusters found by EDA, GA
and the Reference algorithm
Fig. 3. Evolution of the average residue over the iterations (0-200) for EDA, GA, GA + Tabu,
and GA + 2-opt.
Finally, Fig. 3 shows the evolution of the average residue over time for
typical runs of EDA, GA, GA+TS and GA+2-opt. The fastest convergence is
achieved by EDA; given that no restart mechanism is included, EDA becomes
stagnant during the last half of the time available. The curves for GA and
GA+2-opt are quite similar: both algorithms show a continuous but very slow
convergence. Also, GA+TS is the worst method: it seems that the algorithm
cannot make the population converge. We have two hypotheses for these
behaviors: first, there may be a problem in the local search parameters;
second, it may be related to the fact that a small change in the genotype
can give rise to a big change in the phenotype and, when this occurs, the
use of local search is not recommendable. Both situations are under study, but
we suspect the second reason may be more relevant.
Acknowledgments
This work was supported in part by projects TIC2003-09331-C02-01 and
TIN2005-08404-C04-01 from the Spanish Ministry of Science and Technology.
References
1. S. Baluja and R. Caruana. Removing the genetics from the standard genetic algorithm. In A. Prieditis and S. Russel, editors, The Int. Conf. on Machine Learning, pages 38-46. Morgan Kaufmann Publishers, 1995. San Mateo, CA.
2. S. Busygin, G. Jacobsen, and E. Kramer. Double conjugated clustering applied to leukemia microarray data. SIAM ICDM, Workshop on clustering high dimensional data, 2002.
3. Y. Cheng and G. Church. Biclustering of expression data. 8th International Conference on Intelligent Systems for Molecular Biology, pages 93-103, 2001.
4. G. R. Harik, F. G. Lobo, and D. E. Goldberg. The compact genetic algorithm. IEEE-EC, 3(4):287, November 1999.
5. W. Hart, N. Krasnogor, and J. Smith, editors. Recent Advances in Memetic Algorithms. Studies in Fuzziness and Soft Computing. Physica-Verlag, 2004.
6. J. Hartigan. Clustering Algorithms. John Wiley, 1975.
7. S. Madeira and A. Oliveira. Biclustering algorithms for biological data analysis: A survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 1(1):24-45, 2004.
1 Introduction
2 Suffix-Set Trees
$ possessing a function similar to the terminator used in suffix trees. Like the
suffix tree, our new data structure is also a rooted tree; it has edges labeled
by sequences of characters from C and nodes labeled by indices into some of
s_1, ..., s_k to mark suffix beginnings. We call it a suffix-set tree and it has the
following properties:
The first character of the label on an edge connecting a node to one of its
children is a different character of C for each child.
Each nonempty suffix of every one of the k sequences is associated with at
least one leaf of the tree; conversely, each leaf of the tree is associated with
at least one nonempty suffix of some sequence (if more than one, then all
suffixes associated with the leaf have the same length). Thus, each leaf is
labeled by a set like {(i_1, j_1), ..., (i_q, j_q)} for some q ≥ 1, where (i_a, j_a),
for 1 ≤ a ≤ q, 1 ≤ i_a ≤ k, and 1 ≤ j_a ≤ n_{i_a}, indicates that the suffix
s_{i_a}[j_a .. n_{i_a}] of s_{i_a} is associated with the leaf.
Let v be a node of the tree. The label of v is the set {(i_1, j_1), ..., (i_q, j_q)} that
represents the q suffixes collectively associated with the leaves of the subtree
rooted at v. If c_1 ... c_r is the concatenation of the labels on the path from
the root of the tree to v, excluding if necessary the terminal character $,
then c_1 ... c_r is a common representation of all prefixes of length r of the
suffixes associated with the leaves of the subtree rooted at v. If s_{i_a}[j_a .. n_{i_a}]
is one of these suffixes, then for 1 ≤ b ≤ r we have s_{i_a}[j_a + b - 1] ∈ C_{c_b} (that
is, the b-th character of the suffix is a member of C_{c_b}).
Note that, when C is a partition of R into singletons, the suffix-set tree be-
comes the familiar suffix tree of s_1, ..., s_k. In order to see this, it suffices to
identify for each character in each sequence the member C_i of C to which it be-
longs, and then substitute i for that character. We show in Figure 1 a suffix-set
tree for R = {A, C, G, T}, C_1 = {A, G, T}, C_2 = {C, T}, s_1 = AGCTAG, and
s_2 = GGGATCGA.
In the strategy to be described in Section 3 we do not construct the suffix-set
tree to completion, but rather only the portion of the tree that is needed to
represent all suffix prefixes of length at most M (a fixed parameter). Clearly, the
number of nodes in the tree is O(p^M), therefore polynomial in p given the fixed
parameter M. It is relatively simple to see that the tree can be constructed in
O(p^{M+1}(n_1 + ... + n_k) + p^{M+2}|R|^2) time.
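To make the data structure concrete, the following sketch builds a depth-limited trie over strings of cover-set indices, with each node's label collecting the suffixes represented in its subtree; it is an illustrative reading of the definition above (representation choices are assumed), not the authors' construction algorithm.

```python
from itertools import product

def build_suffix_set_trie(sequences, cover, M):
    """Depth-limited trie over strings of cover-set indices.

    sequences: list of strings; cover: list of sets of characters; M: maximum depth.
    Each node is {'label': set of (seq_index, suffix_start), 'children': {set_index: node}};
    a node's label collects the suffixes represented in its subtree."""
    def new_node():
        return {'label': set(), 'children': {}}

    root = new_node()
    for si, s in enumerate(sequences):
        for j in range(len(s)):                          # suffix s[j:]
            prefix = s[j:j + M]
            # all ways of writing the prefix as a string of cover-set indices
            options = [[ci for ci, C in enumerate(cover) if ch in C] for ch in prefix]
            for rep in product(*options):
                node = root
                for ci in rep:
                    node = node['children'].setdefault(ci, new_node())
                    node['label'].add((si, j))
    return root
```

For the example of Figure 1, one could call, for instance, build_suffix_set_trie(["AGCTAG", "GGGATCGA"], [{"A", "G", "T"}, {"C", "T"}], 3).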
where p_a^c is the position in s_{i_a} of the rightmost character of t_a whose column
in A' is no greater than c (p_a^c = 0, if none exists), and similarly for p_b^c. In
(1), Q(p_a^c, p_b^c, A'[a, c], A'[b, c]) is the contribution to T(A') of aligning the two
characters A'[a, c] and A'[b, c] given p_a^c and p_b^c. If both A'[a, c] and A'[b, c] are
gap characters, then we let Q(p_a^c, p_b^c, A'[a, c], A'[b, c]) = 0. Otherwise, the value
of Q(p_a^c, p_b^c, A'[a, c], A'[b, c]) is determined through a sequence of two steps.
The first step is the determination of the combined number of optimal global
and local pairwise alignments of s_{i_a} and s_{i_b} that go through the (p_a^c, p_b^c) cell of the
dynamic-programming matrix. In what follows, we split this number into three
others, and let each of U_1(p_a^c, p_b^c), ..., U_3(p_a^c, p_b^c) be either a linearly normalized
version of the corresponding number within the interval [1, L] for L a parameter,
if that number is strictly positive, or 0, otherwise. We use U_1(p_a^c, p_b^c) in reference
to the case of optimal alignments through the (p_a^c, p_b^c) cell that align s_{i_a}[p_a^c] to
s_{i_b}[p_b^c], U_2(p_a^c, p_b^c) for alignments of s_{i_a}[p_a^c] to a gap character, and U_3(p_a^c, p_b^c) for
alignments of s_{i_b}[p_b^c] to a gap character.
The second step is the determination of Q(p_a^c, p_b^c, A'[a, c], A'[b, c]) itself from
the quantities U_1(p_a^c, p_b^c), ..., U_3(p_a^c, p_b^c). If U_1(p_a^c, p_b^c) = ... = U_3(p_a^c, p_b^c) = 0 (no
optimal alignments through the (p_a^c, p_b^c) cell), then Q(p_a^c, p_b^c, A'[a, c], A'[b, c]) = -L.
Otherwise, we have the following cases to consider, where z ∈ {1, 2, 3} selects
among U_1, U_2, and U_3 depending, as explained above, on A'[a, c] and A'[b, c]:
If U_z(p_a^c, p_b^c) > 0, then Q(p_a^c, p_b^c, A'[a, c], A'[b, c]) = U_z(p_a^c, p_b^c).
If U_z(p_a^c, p_b^c) = 0, then
Q(p_a^c, p_b^c, A'[a, c], A'[b, c]) = -min+{U_1(p_a^c, p_b^c), U_2(p_a^c, p_b^c), U_3(p_a^c, p_b^c)},
where we use min+ to denote the minimum of the strictly positive arguments
only.
What this second step does is favor the alignment of A'[a, c] to A'[b, c] in
proportion to its popularity in optimal pairwise alignments of s_{i_a} and s_{i_b}, and
similarly to penalize it heavily when the cell (p_a^c, p_b^c) is part of no optimal pairwise
alignment, and less so if it is but does not align A'[a, c] to A'[b, c].
Finally, the function that yields S(A') from T(A') is designed to differentiate
two alignments of different blocks for which T might yield the same value. We do
so by subtracting off T(A') a fraction of |T(A')| obtained from the average of
two numbers in [0, 1]. The first number is 1 - k'/k and seeks to privilege (decrease
T by a smaller value) the block with the greater number of subsequences. The
second number is a function of the so-called identity score of an alignment,
that is, the fraction of the number of aligned residue pairs that corresponds to
identical residues. If we denote the identity score of A' by I(A'), then the second
number is 1 - I(A') and aims at privileging alignments whose identity scores are
comparatively greater. We then have

    S(A') = T(A') - (2 - k'/k - I(A')) |T(A')| / 2 .    (2)
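Under this reading of Eq. (2), the block score is a one-liner; T(A'), k' and I(A') are assumed to have been computed already.

```python
def block_score(T, k_prime, k, identity):
    """S(A') = T(A') - (2 - k'/k - I(A')) * |T(A')| / 2, following Eq. (2) as read above."""
    return T - (2.0 - k_prime / k - identity) * abs(T) / 2.0
```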
The remainder of this section is devoted to describing our heuristic to obtain
a k × l alignment A of the sequences s_1, ..., s_k, given the set cover C of the
Fig. 1. An example suffix-set tree, node labels shown only for the leaves. Each node
label is expressed more compactly than in the text; for example, I2,II4 stands for
the label {(1, 2), (2, 4)}.
the container has a lower score). In this new phase of the heuristic we build a
weighted acyclic directed graph D from B. Manipulating this graph appropriately
eventually yields the desired alignment A of s_1, ..., s_k.
The node set of D is B ∪ {s, t}, where s and t are two special nodes. In D,
an edge exists directed from s to each B ∈ B, and also from each B ∈ B to
t. No other edges are incident to s or t, which are then a source and a sink
in D, respectively (i.e., s only has outgoing edges, t only incoming edges). The
additional edges of D are deployed among the members of B in the following
manner. For B, B' ∈ B, an edge exists directed from B to B' if every subsequence
of B starts to the left of the corresponding subsequence of B' in the appropriate
sequence of s_1, ..., s_k. In addition, if B and B' overlap, then A_B and A_{B'} are
also required to be identical in all the overlapping columns. Edges deployed in
this manner lead, clearly, to an acyclic directed graph.
In D, both edges and nodes have weights. Edge weights depend on how the
blocks intersect the sequences s_1, ..., s_k. Specifically, if an edge exists from B
to B' and the two blocks are nonoverlapping, then its weight is -x, where x
is the standard deviation of the intervening sequence-segment lengths. Edges
outgoing from s or incoming to t are weighted in the trivially analogous
manner.
Weights for edges between overlapping blocks and node weights are computed
similarly to each other (except for s and t, whose weights are equal to 0). If x
is the number of residues in node B, then its weight is x/√k. In the case of
an edge between the overlapping B and B', we let x be the number of common
residues and set the edge's weight to x/√k. We remark, finally, that this weight-
assignment methodology is very similar to the one in [4], the main difference
being that we count residues instead of alignment sizes.
Having built D, we are then two further steps away from the final alignment
A. The first step is to find an s-to-t directed path in D whose weighted length
is greatest. Since D is acyclic, this can be achieved efficiently. Every block B
appearing on this optimal path immediately contributes A_B as part of A, but
there still remain unaligned sequence segments.
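The heaviest s-to-t path in an acyclic digraph can be found with a dynamic program over a topological order; the sketch below combines node and edge weights as in the construction above, with an assumed adjacency representation that is not taken from the paper.

```python
from graphlib import TopologicalSorter

def heaviest_path(nodes, successors, node_weight, edge_weight, s, t):
    """Maximum-weight s-to-t path in a DAG, summing node and edge weights.
    successors: dict mapping each node to a list of its successor nodes."""
    preds = {v: [] for v in nodes}
    for u in nodes:
        for v in successors.get(u, []):
            preds[v].append(u)
    best = {v: float('-inf') for v in nodes}
    back = {}
    best[s] = node_weight(s)
    for v in TopologicalSorter({v: set(preds[v]) for v in nodes}).static_order():
        for u in preds[v]:
            if best[u] == float('-inf'):
                continue
            cand = best[u] + edge_weight(u, v) + node_weight(v)
            if cand > best[v]:
                best[v], back[v] = cand, u
    path = [t]
    while path[-1] != s:
        path.append(back[path[-1]])
    return best[t], path[::-1]
```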
The second step and final action of the heuristic is then to complete the miss-
ing positions of A. We describe what is done between nonoverlapping successive
blocks, but clearly the same has to be applied to the left of the first block on
the optimal path and to the right of the last block. Let B and B' be nonover-
lapping blocks appearing in succession on the optimal path. Let t_1, ..., t_k be
the intervening subsequences of s_1, ..., s_k that are still unaligned. We select the
largest of t_1, ..., t_k and use it to initialize a new alignment, along with as many
gap characters as needed for every one of t_1, ..., t_k that is empty. We then visit
each of the remaining subsequences in nonincreasing length order and align it
to the current, partially built new alignment. The method used here is totally
analogous to the one used in Section 3.1 for providing every block with exactly k
subsequences, the only difference being that a global (as opposed to semi-global)
procedure is used.
4 Computational Results
Fig. 2. Average scores for our best choices and the five competing approaches
when first published, were run with their best parameter choices (this applies to
T-COFFEE and to MAFFT).
Comparative results are given in Figure 2, where we show average SPS and
CS values inside each of the BAliBASE reference sets we considered. It is clear
from Figure 2 that no absolute best can be identified throughout all the reference
sets. As we examine the reference sets individually, though, we see that at least
one of the two substitution-matrix/set-cover pairs used with our heuristic is
in general competitive with the best contender. Noteworthy situations are the
superior performance of our heuristic on Reference Set 5, and also its weak
performance on Reference Set 3. As for the corresponding running times, our
current implementation performs competitively as well when compared to the
others: on an Intel Pentium 4 processor running at 1.8 GHz with 1 Gbyte of
main memory, ours has taken from 1843.74 to 2010.81 seconds to complete, so
it is slower than the fastest performer (MAFFT, 95.59 seconds) but faster than
the slowest one (T-COFFEE, 4922.50 seconds).
5 Concluding Remarks
Many of our heuristic's details can still undergo improvements that go beyond
the mere search for a more efficient implementation. One possibility is clearly
the use of potentially better pairwise alignments, both global and local, when
they are needed as described in Section 3. This possibility is already exploited by
T-COFFEE, which not only employs a position-specific score matrix, but also
uses CLUSTAL W to obtain global pairwise alignments, among other things.
We also see improvement possibilities in the block- and alignment-extension
methods described at the ends of Sections 3.1 and 3.2, respectively. In these two
Acknowledgments
The authors acknowledge partial support from CNPq, CAPES, and a FAPERJ
BBP grant.
References
1. Manthey, B.: Non-approximability of weighted multiple sequence alignment. Theoretical Computer Science 296 (2003) 179-192
2. Li, T.P., Fan, K., Wang, J., Wang, W.: Reduction of protein sequence complexity by residue grouping. Protein Engineering 16 (2003) 323-330
3. Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, Cambridge, UK (1997)
4. Zhao, P., Jiang, T.: A heuristic algorithm for multiple sequence alignment based on blocks. Journal of Combinatorial Optimization 5 (2001) 95-115
5. Bahr, A., Thompson, J.D., Thierry, J.C., Poch, O.: BAliBASE (Benchmark Alignment dataBASE): enhancements for repeats, transmembrane sequences and circular permutations. Nucleic Acids Research 29 (2001) 323-326
6. Thompson, J.D., Higgins, D.G., Gibson, T.J.: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties, and weight matrix choice. Nucleic Acids Research 22 (1994) 4673-4680
7. Gotoh, O.: Significant improvement in accuracy of multiple protein sequence alignments by iterative refinement as assessed by reference to structural alignments. Journal of Molecular Biology 264 (1996) 823-838
8. Morgenstern, B., Frech, K., Dress, A., Werner, T.: DIALIGN: finding local similarities by multiple sequence alignment. Bioinformatics 14 (1998) 290-294
9. Morgenstern, B.: DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15 (1999) 211-218
10. Notredame, C., Higgins, D.G., Heringa, J.: T-Coffee: a novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology 302 (2000) 205-217
11. Katoh, K., Misawa, K., Kuma, K., Miyata, T.: MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research 30 (2002) 3059-3066
12. Ioerger, T.R.: The context-dependence of amino acid properties. In: Proceedings of the Fifth International Conference on Intelligent Systems for Molecular Biology. (1997) 157-166
13. Smith, R.F., Smith, T.F.: Automatic generation of primary sequence patterns from sets of related protein sequences. Proceedings of the National Academy of Sciences USA 87 (1990) 118-122
14. Henikoff, S., Henikoff, J.G.: Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences USA 89 (1992) 10915-10919
15. Dayhoff, M.O., Schwartz, R.M., Orcutt, B.C.: A model of evolutionary change in proteins. In Dayhoff, M.O., ed.: Atlas of Protein Sequence and Structure. Volume 5, supplement 3. National Biomedical Research Foundation, Washington, DC (1978) 345-352
16. Muller, T., Spang, R., Vingron, M.: Estimating amino acid substitution models: a comparison of Dayhoff's estimator, the resolvent approach and a maximum likelihood method. Molecular Biology and Evolution 19 (2002) 8-13
17. Vogt, G., Etzold, T., Argos, P.: An assessment of amino acid exchange matrices in aligning protein sequences: the twilight zone revisited. Journal of Molecular Biology 249 (1995) 816-831
18. Green, R.E., Brenner, S.E.: Bootstrapping and normalization for enhanced evaluations of pairwise sequence comparison. Proceedings of the IEEE 90 (2002) 1834-1847
A Methodology for Determining Amino-Acid
Substitution Matrices from Set Covers
1 Introduction
One of the most central problems of computational molecular biology is to align
two sequences of residues, a residue being generically understood as a nucleotide
or an amino acid, depending respectively on whether the sequences under con-
sideration are nucleic acids or proteins. This problem lies at the heart of several
higher-level applications, such as heuristically searching sequence bases or align-
ing a larger number of sequences concomitantly for the identification of special
common substructures (the so-called motifs, cf. [1]) that encode structural or
functional similarities of the sequences, or yet the sequences' promoter regions in
the case of nucleic acids, for example.
Finding the best alignment between two sequences is based on maximizing
a scoring function that quantifies the overall similarity between the sequences.
Normally this similarity function has two main components. The first one is
a symmetric matrix, known as the substitution matrix for the set of residues
under consideration, which gives the contribution the function is to incur when
two residues are aligned to each other. The second component represents the cost
of aligning a residue in a sequence to a gap in the other, and gives the negative
contribution to be incurred by the similarity function when this happens. There is
no consensually accepted, general-purpose criterion for selecting a substitution
matrix or a gap-cost function. Common criteria here include those that stem
from structural or physicochemical characteristics of the residues and those that
somehow seek to reproduce well-known alignments as faithfully as possible [2].
We then see that, even though an optimal alignment between two sequences
is algorithmically well understood and amenable to being computed efficiently,
the inherent difficulty of selecting appropriate scoring parameters suggests that
the problem is still challenging in a number of ways. This is especially true of the
case of protein alignment, owing primarily to the fact that the set of residues is
significantly larger than in the case of nucleic acids, and also to the existence of
a multitude of criteria whereby amino acids can be structurally or functionally
exchanged by one another.
For a given structural or physicochemical property (or set of properties) of
amino acids, this exchangeability may be expressed by a set cover of the set
of all amino acids, that is, by a collection of subsets of that set that includes
every amino acid in at least one subset. Each of these subsets represents the
possibility of exchanging any of its amino acids by any other. Set covers in this
context have been studied extensively [3] and constitute our departing point in
this paper. As we describe in Section 2, we introduce a new methodology for
discovering both an appropriate substitution matrix and gap-cost parameters
that starts by considering an amino-acid set cover. It then builds a graph from
the set cover and sets up an optimization problem whose solution is the desired
substitution matrix and gap costs.
The resulting optimization problem is defined on a set of target sequence
pairs, preferably one that embodies as great a variety of situations as possible.
The target pairs are assumed to have known alignments, so the optimal solution
to the problem of finding parameters comprises the substitution matrix and the
gap costs whose use in a predefined alignment algorithm yields alignments of the
target pairs that in some sense come nearest the known alignments of the same
pairs. Our optimization problem is set up as a problem of combinatorial search,
being therefore highly unstructured and devoid of any facilitating differentiabil-
ity properties. Reasonable ways to approach its solution are then all heuristic in
nature. In Section 3, we present the results of extensive computational experi-
ments that employ an evolutionary algorithm and target the BAliBASE pairs
of amino-acid sequences [4].
Notice, in the context of the methodology categorization we mentioned earlier
in passing, that our new methodology is of a dual character: it both relies on
structural and physicochemical similarities among amino acids and depends on
a given set of aligned sequences in order to arrive at a substitution matrix and
gap costs.
We close in Section 4 with conclusions.
2 The Methodology
where f_S^{h,g}(A(1, j), A(2, j)) gives the contribution of aligning A(1, j) to A(2, j)
as either S(A(1, j), A(2, j)), if neither A(1, j) nor A(2, j) is a gap; or -(h + g),
if either A(1, j) or A(2, j) is the first gap in a contiguous group of gaps; or yet
-g, if either A(1, j) or A(2, j) is the k-th gap in a contiguous group of gaps for
k > 1. An optimal global alignment of X and Y is one that maximizes the sim-
ilarity score of (1) over all possible global alignments of the two sequences. An
optimal local alignment of X and Y, in turn, is the optimal global alignment of
the subsequences of X and Y for which the similarity score is maximum over all
pairs of subsequences of the two sequences. The set of all optimal alignments of
X and Y may be exponentially large in x and y, but it does nonetheless admit
a concise representation as a matrix or directed graph that can be computed ef-
ficiently by well-known dynamic programming techniques, regardless of whether
a global alignment of the two sequences is desired or a local one. We refer to this
representation as A_{X,Y}.
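As a concrete rendering of this scoring scheme, the sketch below evaluates the similarity score of a given two-line global alignment under a substitution matrix S and affine gap parameters h and g; the alignment lines are assumed to be equal-length strings with '-' as the gap character, which is a representation choice, not something the paper prescribes.

```python
def alignment_score(line1, line2, S, h, g, gap='-'):
    """Similarity score of a two-line global alignment: S[a][b] for aligned residues,
    -(h + g) for the first gap of a contiguous run in a line, -g for each further gap.
    S is assumed to be a dict-of-dicts (or similar) indexed by residue characters."""
    score = 0.0
    for j, (a, b) in enumerate(zip(line1, line2)):
        if a == gap and b == gap:
            continue                                   # a double-gap column contributes nothing
        if a == gap or b == gap:
            gapped = line1 if a == gap else line2
            opens = j == 0 or gapped[j - 1] != gap     # first gap of a contiguous group?
            score += -(h + g) if opens else -g
        else:
            score += S[a][b]
    return score
```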
Our strategy for the determination of a suitable substitution matrix starts
with a set cover C = {C_1, ..., C_c} of the residue set R, that is, C is such that
C_1 ∪ ... ∪ C_c = R. Next we define G to be an undirected graph of node set R
having an edge between two nodes (residues) u and v if and only if at least one
of C_1, ..., C_c contains both u and v. Graph G provides a natural association
between how exchangeable a node is by another and the distance between them
in the graph. Intuitively, the closer two nodes are to each other in G the more
exchangeable they are, and we expect an alignment of the two to contribute rel-
atively more positively to the overall similarity score. Quantifying this intuition
involves crucial decisions, so we approach the problem in two careful steps, each
[Footnote 1: For k > 0, we assume the customary affine function p(k) = h + gk with h, g > 0 to
express the cost of aligning the k-th gap of a contiguous group of gaps in a line of A
to a residue in the other line as p(k) - p(k-1), assuming p(0) = 0 [5].]
leaving considerable room for flexibility. The first step consists of turning G into
a weighted graph, that is, assigning nonnegative weights to its edges, and then
computing the weighted distance between all pairs of nodes. The second step
addresses the turning of these weighted distances into elements of a substitution
matrix so that larger distances signify ever more restricted exchangeability.
Let us begin with the first step. For (u, v) an edge of G, let w(u, v) denote its
weight. We define the value of w(u, v) on the premise that, if the exchangeability
of u and v comes from their concomitant membership in a large set of C, then it
should eventually result in a smaller contribution to the overall similarity score
than if they were members of a smaller set. In other words, the former situation
bespeaks an intuitive weakness of the property that makes the two residues
exchangeable. In broad terms, then, we should let w(u, v) be determined by the
smallest of the sets of C to which both u and v belong, and should also let it be
a nondecreasing function of the size of this smallest set.
Let c^- be the size of the smallest set of C and c^+ the size of its largest set. Let
c_{u,v} be the size of the smallest set of C of which both u and v are members. We
consider two functional forms according to which w(u, v) may depend on c_{u,v} as
a nondecreasing function. Both forms force w(u, v) to be constrained within the
interval [w^-, w^+] with w^- ≥ 0. For α ≥ 1, the first form is the convex function

    w_1(u, v) = w^- + (w^+ - w^-) [(c_{u,v} - c^-) / (c^+ - c^-)]^α ,    (2)
Having established weights for all the edges of G, let d_{u,v} denote the weighted
distance between nodes u and v. Clearly, d_{u,u} = 0 and, if no path exists in
G between u and v (i.e., G is not connected and the two nodes belong to two
different connected components), then d_{u,v} = ∞.
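The two steps just outlined can be sketched in Python as follows. The sketch builds G from a set cover of the residues, weights each edge with the convex form of (2), and obtains the weighted distances with the Floyd-Warshall algorithm; the exponent name alpha and all function names are assumptions made for illustration only.

import itertools, math

# Sketch (illustrative names): build G from a set cover of the residue set,
# weight each edge with the convex form of Eq. (2), and compute weighted
# distances between all residue pairs with Floyd-Warshall.
def weighted_distances(cover, w_min, w_max, alpha):
    residues = sorted(set().union(*cover))
    c_min = min(len(c) for c in cover)
    c_max = max(len(c) for c in cover)
    dist = {(u, v): (0.0 if u == v else math.inf)
            for u in residues for v in residues}
    for u, v in itertools.combinations(residues, 2):
        sizes = [len(c) for c in cover if u in c and v in c]
        if not sizes:
            continue                                  # no common cover set: no edge
        c_uv = min(sizes)                             # smallest shared set
        frac = 0.0 if c_max == c_min else (c_uv - c_min) / (c_max - c_min)
        w = w_min + (w_max - w_min) * frac ** alpha   # Eq. (2)
        dist[(u, v)] = dist[(v, u)] = w
    for k in residues:                                # Floyd-Warshall
        for u in residues:
            for v in residues:
                d = dist[(u, k)] + dist[(k, v)]
                if d < dist[(u, v)]:
                    dist[(u, v)] = d
    return dist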
Carrying out the second step, which is obtaining the elements of the substitu-
tion matrix from the weighted distances on G, involves difficult choices as well.
While, intuitively, it is clear that residues separated by larger weighted distances
in G are to be less exchangeable for each other than residues that are closer to
each other (in weighted terms) in G, the functional form that the transforma-
tion of weighted distances into substitution-matrix elements is to take is once
again subject to somewhat arbitrary decisions. What we do is to set S(u, v) = 0 if
d_{u,v} = ∞, and to consider two candidate functional forms for the transformation
in the case of finite distances.
Let us initially set [S^-, S^+] as the interval within which each element of the
substitution matrix S is to be constrained (we assume S^- > 0 for consistency
with the substitution-matrix element that goes with an infinite distance, whose
value we have just set to 0). Let us also denote by d^+ the largest (finite) weighted
distance occurring in G for the choice of weights at hand. We then consider two
Once we decide on one of the two functional forms (2) or (3), and similarly
on one of (4) or (5), and also choose values for w^-, w^+, α, S^-, S^+, and β,
then the substitution matrix S as obtained from C is well-defined and, together
with the gap-cost parameters h and g, can be used to find the representation
A_{X,Y} of the set of all optimal (global or local) alignments between the two
sequences X and Y . The quality of our choices regarding functional forms and
parameters, and hence the quality of the resulting S, h, and g, can be assessed
if a reference alignment, call it A^r_{X,Y}, is available for the two sequences. When
this is the case, we let σ^{h,g}_S(A_{X,Y}, A^r_{X,Y}) be the fraction of the columns of
A^r_{X,Y} that also appear in at least one of the alignments that are represented in
A_{X,Y}. The substitution matrix S, and also h and g, are then taken to be as good
for A^r_{X,Y} as σ^{h,g}_S(A_{X,Y}, A^r_{X,Y}) is close to 1.
Thus, given a residue set cover C and a set A^r of reference alignments (each
alignment on a different pair of sequences over the same residue set R), obtaining
the best possible substitution matrix S and gap-cost parameters h and g can be
formulated as the following optimization problem: find functional forms and pa-
rameters that maximize some (for now unspecified) average of σ^{h,g}_S(A_{X,Y}, A^r_{X,Y})
over all pairs (X, Y) of sequences such that A^r_{X,Y} ∈ A^r. In the next section, we
make this definition precise when residues are amino acids and proceed to the
description of computational results.
3 Computational Results
Let b_w be a two-valued variable indicating which of (2) or (3) is to be taken as the
functional form for the edge weights, and similarly let b_S indicate which of (4)
or (5) is to give the functional form for the elements of S. These new parameters
defined, we begin by establishing bounds on the domains from which each of the
other eight parameters involved in the optimization problem may take values,
and also make those domains discrete inside such bounds by taking equally
spaced delimiters. For the purposes of our study in this section, this results in
what is shown in Table 1.
The parameter domains shown in Table 1 add up to over 3.7 trillion possible
combinations, yielding about 1.6 billion different substitution matrices. The set
of all such combinations seems to be structured in no usable way, so finding the
best combination with respect to some set of reference alignments as discussed in
Section 2 cannot rely on any technique of explicit enumeration, but rather calls
for some heuristic approach.
The approach we use in this section is to employ an evolutionary algorithm
for finding the best possible combination within reasonable time bounds. Each
individual for this algorithm is a 10-tuple indicating one of the possible combina-
tions of parameter values. Our evolutionary algorithm is a standard generational
genetic algorithm. It produces a sequence of 100-individual generations, the first
of which is obtained by randomly choosing a value for each of the 10 parameters
in order to produce each of its individuals. Each of the subsequent generations is
obtained from the current generation by a combination of crossover and muta-
tion operations, following an initial elitist step whereby the 5 fittest individuals
of the current generation are copied to the new one.
While the new generation is not full, either a pair of individuals is selected
from the current generation to undergo crossover (with probability 0.5) or one
individual is selected to undergo a single-locus mutation (with probability 0.5).2
The pair of individuals resulting from the crossover, or the single mutated indi-
vidual, is added to the new generation, unless an individual that is being added
is identical to an individual that already exists in the population. When this
happens, the duplicate individual is replaced by a randomly generated
individual. Selection is performed in proportion to the individuals' linearly nor-
malized fitnesses.3
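A minimal sketch of one generational step as just described is given below; the helpers select, crossover, mutate, and random_individual are assumed to exist and are purely illustrative.

import random

# Minimal sketch of one generation step (illustrative helpers: `select` draws an
# individual in proportion to linearly normalized fitness, `crossover` and
# `mutate` act on 10-tuples of parameter values).
def next_generation(population, fitness, select, crossover, mutate, random_individual):
    ranked = sorted(population, key=fitness, reverse=True)
    new_pop = ranked[:5]                          # elitism: copy the 5 fittest
    while len(new_pop) < 100:
        if random.random() < 0.5:                 # crossover with probability 0.5
            a, b = select(ranked), select(ranked)
            children = crossover(a, b)
        else:                                     # single-locus mutation otherwise
            children = [mutate(select(ranked))]
        for child in children:
            if len(new_pop) >= 100:
                break
            # duplicates are replaced by a randomly generated individual
            new_pop.append(child if child not in new_pop else random_individual())
    return new_pop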
The crux of this genetic algorithm is of course how to assess an individual's
fitness, and this is where an extant set of reference alignments A^r comes in. In our
case, four variants of the fraction defined earlier are used, as listed below.
2 Both the crossover point and the locus for mutation are chosen at random, essentially
with the parameters' domains in mind, so that the probability that such a choice
singles out a parameter whose domain has size a is proportional to log a. Mutating
the parameter's value is achieved straightforwardly, while breaking the 10-tuples for
crossover requires the further step of interpreting the parameter as a binary number.
3 This means that, for 1 ≤ k ≤ 100, the kth fittest individual in the generation is
selected with probability proportional to L - (L - 1)(k - 1)/99, where L is chosen
so that the expression yields a value L times larger for the fittest individual than it
does for the least fit (for which it yields value 1). We use L = 10 throughout.
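The selection weights of footnote 3 can be written down directly; the snippet below is a small illustrative sketch of that linear normalization.

# Sketch of the linearly normalized selection weights of footnote 3: the k-th
# fittest of 100 individuals gets weight L - (L - 1)*(k - 1)/99, so the fittest
# is L times more likely to be selected than the least fit (here k starts at 0).
def ranking_weights(population_size=100, L=10.0):
    return [L - (L - 1.0) * k / (population_size - 1)
            for k in range(population_size)]   # k = 0 is the fittest

weights = ranking_weights()
probabilities = [w / sum(weights) for w in weights]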
σ^{h,g}_{S,1}(A_{X,Y}, A^r_{X,Y}) is based on all the columns of A^r_{X,Y};
σ^{h,g}_{S,2}(A_{X,Y}, A^r_{X,Y}) is based on all the columns of A^r_{X,Y} that contain no gaps;
σ^{h,g}_{S,3}(A_{X,Y}, A^r_{X,Y}) is based on all the columns of A^r_{X,Y} that lie within motifs;
σ^{h,g}_{S,4}(A_{X,Y}, A^r_{X,Y}) is based on all the columns of A^r_{X,Y} that lie within motifs and contain no gaps.
These defined, we first average each one of them over A^r before combining
them into a fitness function. The average that we take is computed in the in-
directly weighted style of [6], which aims at preventing any family with overly
many pairs, or any pair on which S, h, and g are particularly effective, from
influencing the average too strongly.
The weighting takes place on an array having 10 lines, one for each of the
nonoverlapping 0.1-wide intervals within [0, 1], and one column for each of the
BAliBASE families. Initially each pair (X, Y) having a reference alignment A^r_{X,Y}
in A^r is associated with the array cell whose column corresponds to its family
and whose line is given by the interval within which the identity score of the
reference alignment A^r_{X,Y} falls. This score is the ratio of the number of columns
of A^r_{X,Y} whose two amino acids are identical to the number of columns that have
no gaps (when averaging σ^{h,g}_{S,3}(A_{X,Y}, A^r_{X,Y}) or σ^{h,g}_{S,4}(A_{X,Y}, A^r_{X,Y}), only
columns that lie within motifs are taken into account).
For 1 ≤ k ≤ 4, we then let σ^{h,g}_{S,k}(A^r) be the average of σ^{h,g}_{S,k}(A_{X,Y}, A^r_{X,Y})
over A^r obtained as follows. First take the average of σ^{h,g}_{S,k}(A_{X,Y}, A^r_{X,Y}) for each
array cell over the sequence pairs (X, Y) that are associated with it (cells with no
pairs are ignored). Then σ^{h,g}_{S,k}(A^r) is computed by first averaging those averages
that correspond to the same line of the array and finally averaging the resulting
numbers (note that lines whose cells were all ignored for having no sequence
pairs associated with them do not participate in this final average).
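The following sketch reproduces this indirectly weighted average; the input format (a mapping from (family, identity score) pairs to the corresponding fraction) is an assumption made for illustration.

from collections import defaultdict

# Sketch of the indirectly weighted average described above (illustrative input:
# `scores` maps a (family, identity_score) pair's reference alignment to its
# sigma value for the current S, h, g).
def weighted_average(scores):
    cells = defaultdict(list)                     # (interval line, family) -> values
    for (family, identity), value in scores.items():
        line = min(int(identity * 10), 9)         # 0.1-wide intervals of [0, 1]
        cells[(line, family)].append(value)
    lines = defaultdict(list)
    for (line, _family), values in cells.items():
        lines[line].append(sum(values) / len(values))      # average within each cell
    line_avgs = [sum(v) / len(v) for v in lines.values()]  # average cells per line
    return sum(line_avgs) / len(line_avgs)        # final average over occupied lines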
We are then in a position to state the definition of our fitness function. We
denote it by σ^{h,g}_S(A^r) to emphasize its dependency on how well S, h, and g
lead to alignments that are in good accord with the alignments of A^r. It is
given by the standard Euclidean norm of the four-dimensional vector whose kth
component is σ^{h,g}_{S,k}(A^r), that is,

    σ^{h,g}_S(A^r) = sqrt( σ^{h,g}_{S,1}(A^r)^2 + · · · + σ^{h,g}_{S,4}(A^r)^2 ).    (6)

Clearly, 0 ≤ σ^{h,g}_S(A^r) ≤ 2 always.
We found through initial experiments that carrying our algorithm through a
single generation requires roughly 13 to 14 hours on an Intel Pentium 4
processor running at 2.26 GHz and equipped with 512 Mbytes of main memory.
Practically all of this effort is related to computing σ^{h,g}_S(A^r) for each individual
in the current population, and because this is done in a manner that is fully
independent from any other individual, we can speed the overall computation
up nearly optimally by simply bringing more processors into the effort.
The results we describe next were obtained on four processors running in
parallel and under the following simplifications. We concentrated solely on evolving
individuals under global alignments for the set cover of [7], and considered, in
addition, only the subset of A^r, denoted by A^{r,1}, comprising sequence pairs that
are relative to the BAliBASE reference set 1. In this case, the fitness function to
be maximized is σ^{h,g}_S(A^{r,1}), defined as in (6) when A^{r,1} substitutes for A^r. Given
these simplifications, computing through each generation has taken roughly 20
minutes.
The substitution matrices we have used for the sake of comparison are BC0030
[8], BENNER74 [9], BLOSUM62 [10], FENG [11], GONNET [12], MCLACH [13], NWSGAPPEP
[14], PAM250 [15], RAO [16], RUSSELL-RH [17], and VTML160 [18]. The gap-cost
parameters h and g we used with them are the ones from [8] for BC0030 and
RUSSELL-RH, from [19] for VTML160, and from [6] for all others.
Our results are summarized in the four plots of Figure 1, each giving the
evolution of one of σ^{h,g}_{S,1}(A^{r,1}), . . . , σ^{h,g}_{S,4}(A^{r,1}) for the fittest individuals along the
generations. We show this evolution against dashed lines that delimit the inter-
vals within which the corresponding figures for the matrices used for comparison
are located. Clearly, the genetic algorithm very quickly produces a substitution
matrix, with associated gap costs, that surpasses this interval as far as the fitness
components σ^{h,g}_{S,3}(A^{r,1}) and σ^{h,g}_{S,4}(A^{r,1}) are concerned, even though it lags behind
in terms of σ^{h,g}_{S,1}(A^{r,1}) and σ^{h,g}_{S,2}(A^{r,1}). This substitution matrix, it turns out, is
then superior to all those other matrices when it comes to stressing alignment
columns that lie within motifs.
[Figure 1: four panels (a)-(d), each plotting a fitness component (vertical axis, approximately 0.66 to 0.84) along the generations.]
4 Concluding Remarks
We have introduced a new methodology for the determination of amino-acid
substitution matrices. The new methodology starts with a set cover of the residue
alphabet under consideration and builds an undirected graph in which node
vicinity is taken to represent residue exchangeability. The desired substitution
matrix arises as a function of weighted distances in this graph. Determining
the edge weights, and also how to convert the resulting weighted distances into
substitution-matrix elements, constitute the outcome of an optimization process
that runs on a set of reference sequence alignments and also outputs gap costs for
use with the substitution matrix. Our methodology is then of a hybrid nature:
it relies both on the structural and physicochemical properties that underlie the
set cover in use and on an extant set of reference sequence alignments.
The optimization problem to be solved is well-defined: given parameterized
functional forms for turning cover sets into edge weights and weighted distances
into substitution-matrix elements, the problem asks for parameter values and
gap costs that maximize a certain objective function on the reference set of
alignments. We have reported on computational experiments that use a genetic
algorithm as the optimization method and the BAliBASE suite as the source of the
required reference alignments.
Our results so far indicate that the new methodology is capable of producing
substitution matrices whose performance falls within the same range as that of a number
of known matrices even before any optimization is actually performed (i.e., based
on the random parameter instantiation that precedes the genetic algorithm); this
alone, we believe, singles out our methodology as a principled way of determining
substitution matrices that concentrates all the effort related to the structure and
physicochemical properties of amino acids on the discovery of an appropriate set
cover. They also indicate, in a restricted setting, that the methodology can yield
substitution matrices that surpass all the others against which they were tested.
We have also found that strengthening this latter conclusion so that it holds
in a wider variety of scenarios depends on how efficiently we can run the genetic
algorithm. Fortunately, it appears that it is all a matter of how many processors
can be amassed for the effort, since the genetic procedure is inherently amenable
to parallel processing and highly scalable, too. There is, of course, also the issue
of investigating alternative functional forms and parameter ranges to set up
the optimization problem, and in fact the issue of considering other objective
functions as well. Together with the search for faster optimization, these issues
make for a very rich array of possibilities for further study.
Acknowledgments
The authors acknowledge partial support from CNPq, CAPES, and a FAPERJ
BBP grant.
References
1. Sagot, M.F., Wakabayashi, Y.: Pattern inference under many guises. In Reed, B.A., Sales, C.L., eds.: Recent Advances in Algorithms and Combinatorics. Springer-Verlag, New York, NY (2003) 245-287
2. Valdar, W.S.J.: Scoring residue conservation. Proteins: Structure, Function, and Genetics 48 (2002) 227-241
3. Li, T.P., Fan, K., Wang, J., Wang, W.: Reduction of protein sequence complexity by residue grouping. Protein Engineering 16 (2003) 323-330
4. Bahr, A., Thompson, J.D., Thierry, J.C., Poch, O.: BAliBASE (Benchmark Alignment dataBASE): enhancements for repeats, transmembrane sequences and circular permutations. Nucleic Acids Research 29 (2001) 323-326
5. Setubal, J., Meidanis, J.: Introduction to Computational Molecular Biology. PWS Publishing Company, Boston, MA (1997)
6. Vogt, G., Etzold, T., Argos, P.: An assessment of amino acid exchange matrices in aligning protein sequences: the twilight zone revisited. Journal of Molecular Biology 249 (1995) 816-831
7. Smith, R.F., Smith, T.F.: Automatic generation of primary sequence patterns from sets of related protein sequences. Proceedings of the National Academy of Sciences USA 87 (1990) 118-122
8. Blake, J.D., Cohen, F.E.: Pairwise sequence alignment below the twilight zone. Journal of Molecular Biology 307 (2001) 721-735
9. Benner, S.A., Cohen, M.A., Gonnet, G.H.: Amino acid substitution during functionally constrained divergent evolution of protein sequences. Protein Engineering 7 (1994) 1323-1332
10. Henikoff, S., Henikoff, J.G.: Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences USA 89 (1992) 10915-10919
11. Feng, D.F., Johnson, M.S., Doolittle, R.F.: Aligning amino acid sequences: comparison of commonly used methods. Journal of Molecular Evolution 21 (1985) 112-125
12. Gonnet, G.H., Cohen, M.A., Benner, S.A.: Exhaustive matching of the entire protein sequence database. Science 256 (1992) 1443-1445
13. McLachlan, A.D.: Tests for comparing related amino-acid sequences. Cytochrome c and cytochrome c551. Journal of Molecular Biology 61 (1971) 409-424
14. Gribskov, M., Burgess, R.R.: Sigma factors from E. coli, B. subtilis, phage SP01, and phage T4 are homologous proteins. Nucleic Acids Research 14 (1986) 6745-6763
15. Dayhoff, M.O., Schwartz, R.M., Orcutt, B.C.: A model of evolutionary change in proteins. In Dayhoff, M.O., ed.: Atlas of Protein Sequence and Structure. Volume 5, supplement 3. National Biomedical Research Foundation, Washington, DC (1978) 345-352
16. Rao, J.K.M.: New scoring matrix for amino acid residue exchanges based on residue characteristic physical parameters. International Journal of Peptide and Protein Research 29 (1987) 276-281
17. Russell, R.B., Saqi, M.A.S., Sayle, R.A., Bates, P.A., Sternberg, M.J.E.: Recognition of analogous and homologous protein folds: analysis of sequence and structure conservation. Journal of Molecular Biology 269 (1997) 423-439
18. Muller, T., Spang, R., Vingron, M.: Estimating amino acid substitution models: a comparison of Dayhoff's estimator, the resolvent approach and a maximum likelihood method. Molecular Biology and Evolution 19 (2002) 8-13
19. Green, R.E., Brenner, S.E.: Bootstrapping and normalization for enhanced evaluations of pairwise sequence comparison. Proceedings of the IEEE 90 (2002) 1834-1847
Multi-Objective Evolutionary Algorithm for Discovering
Peptide Binding Motifs
1 Introduction
Multi-Objective Evolutionary Algorithms (MOEA) have been effectively used in
various domains to solve real-world complex search problems. Multi-objective prob-
lems require the simultaneous optimization of a number of competing objectives and
result in a set of solutions called the Pareto optimal set. They can also be solved as a
single-objective optimization problem by combining all objectives into a single objective. Solving
2.1 I-Ag7 Datasets
Ten different I-Ag7 datasets were extracted from several independent studies [9-18].
Each experimental dataset provides peptides classified as binders or non-binders,
with labels assigned according to their experimental binding affinities. The available
data can be expressed as D = {Di : i = 1, 2, ..., d}, where d denotes the number of
datasets. Let the ith dataset be Di = {(xij, vij) : j = 1, 2, ..., ni}, where xij is the jth
sequence in the ith dataset and the label vij ∈ {b, nb} indicates whether the sequence
xij is a binder (b) or a non-binder (nb).
Seven different k-mer motifs (MHC binding motifs are of k = 9 amino acids in length)
[6-12] have been reported for predicting peptide binders to the I-Ag7 molecule. The
residues that contribute significantly to peptide binding are called primary anchor
residues, and the positions at which they occur are called anchor positions. Anchor
positions may also be occupied by so-called preferred residues, which are tolerated
but alone contribute little to peptide binding strength. In [6-12] each motif describes
amino acids at primary and secondary anchor positions, as well as forbidden amino
acids at specific positions. We interpret these as anchor, tolerated, and non-tolerated
amino acids. These experimental motifs can be expressed as R = {Ri : i = 1, 2, ..., r},
where r denotes the number of experimentally found motifs. Table 1 below illustrates
an example of an experimentally derived (Reizis) peptide binding motif for I-Ag7.
In this section we give a formal definition of the target model as a quantitative matrix.
A k-mer motif in an amino acid sequence is usually characterized by a binding score
matrix Q = {q_{ia}}_{k×20}, where q_{ia} denotes the binding score of site i of the motif
when it is occupied by the amino acid a ∈ Σ, with Σ denoting the set of 20 amino-
acid residues. A binding score computed by adding the scores assigned for each amino
acid in the respective positions of a k-mer motif not only indicates the likelihood of
the presence of a particular motif but also determines the likelihood that a sequence
containing the motif binds to another sequence. Therefore, a binding score matrix
can be viewed as a quantification of a real biological functioning or binding of the
motif to other peptides. Given a binding score matrix Q of size k×20, we define the
binding score s for a k-mer motif m*, starting at position j* in a sequence
x = (x1, x2, ..., xn) of length n, as:

    s = max_{j ∈ {1, ..., n-k+1}} s_j    (1)

    where s_j = Σ_{i=0,...,k-1} q_{i, x_{i+j}}    (2)
We write s(x_{j*} : m*) to denote the binding score of the motif m* present at posi-
tion j* of the peptide sequence x.
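A minimal Python sketch of Eqs. (1)-(2) is given below; the representation of Q as a list of per-position dictionaries keyed by amino acid is an illustrative assumption.

# Sketch of Eqs. (1)-(2): slide the k-mer score matrix Q over the sequence and
# keep the best window sum. Q is assumed to be a list of dicts, Q[i][a] giving
# the score of amino acid a at motif position i (illustrative representation).
def binding_score(sequence, Q, k):
    best = float('-inf')
    for j in range(len(sequence) - k + 1):                  # j in {0, ..., n-k}
        s_j = sum(Q[i][sequence[j + i]] for i in range(k))  # Eq. (2)
        best = max(best, s_j)                               # Eq. (1)
    return best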
NSGA-II incorporates several mechanisms that facilitate faster and better conver-
gence of a solution population for multi-objective problems. These mechanisms in-
clude non-dominated sorting, elitism, diversity preservation, and constraint han-
dling. A solution is said to dominate another if it is as good as or better with respect
to all objectives and strictly better in at least one. Non-dominated sorting refers to sorting of indi-
viduals that are not dominated by any other individual in the population with respect
to every objective. All the non-dominated individuals are assigned the same fitness
value. The same procedure is then carried out on the remaining population until a new
set of non-dominated solutions is found. The solutions found in the subsequent rounds
are assigned a fitness value lower than in the previous round and the process contin-
ues until the whole population is partitioned into non-dominated fronts with di-
verse fitness values. Elitism prevents losing fit individuals encountered in ear-
lier generations by allowing parents to compete with offspring. In NSGA-II, the di-
versity of Pareto-optimal solutions is maintained by imposing a measure known as
crowding distance measure. More details on these mechanisms can be found in
[1][24].
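To make the dominance-based ranking concrete, here is a minimal sketch of the dominance test and of sorting a population into non-dominated fronts; it assumes minimization of all objectives and omits crowding distance and constraint handling, so it is an illustration rather than a full NSGA-II implementation.

# Dominance test and non-dominated sorting (minimization assumed).
def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated_fronts(objectives):
    remaining = list(range(len(objectives)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objectives[j], objectives[i])
                            for j in remaining if j != i)]
        fronts.append(front)                       # front 0 gets the best fitness rank
        remaining = [i for i in remaining if i not in front]
    return fronts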
Objectives: The objectives are defined to realize a motif that can best represent the
characteristics of the I-Ag7 binding motif. The dataset of each experiment in the litera-
ture gives the information whether a particular sequence is a binder or non-binder.
Using this information, the numbers of true positives (TP) and true negatives (TN)
determined by solutions in the population are computed. By incorporating the TPs and
TNs resulting from the evaluation and by taking into account the cumulative dis-
tance between a putative motif, m, representing a binding score matrix Q and the
score matrix representation of the experimental motifs set, Q(R), we can define two
objective functions f1 and f2 as follows:
    min f_2 = Σ_{i=1}^{r} || Q - Q(R_i) ||    (5)
where FP and FN denote false positives and false negatives, respectively, and
Q(Ri) represents the score matrix representation of an experimental motif Ri.
    c_1 = 1.0 - (λ_1 · FP) / NB ≥ 0    (6)
    c_2 = 1.0 - (λ_2 · FN) / B ≥ 0    (7)
Here λ_1 and λ_2 are two control parameters preventing all binders and non-binders
from being recognized as binders and vice versa. NB and B correspond to the number of
true non-binders and binders in the training dataset.
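A hedged sketch of the two objectives and the constraints follows. The form of f1 is taken from the caption of Fig. 1 (Fitness1 is the sum of FP and FN), f2 follows Eq. (5) with the Frobenius norm as an assumed distance, and the names lam1 and lam2 stand in for the control parameters whose symbols are garbled in the source.

import numpy as np

# Hedged sketch of the objectives and constraints (6)-(7); matrix distance and
# parameter names are illustrative assumptions.
def f1(FP, FN):
    return FP + FN                                # false predictions (see Fig. 1 caption)

def f2(Q, experimental_matrices):
    return sum(np.linalg.norm(Q - Qr) for Qr in experimental_matrices)   # Eq. (5)

def constraints(FP, FN, NB, B, lam1, lam2):
    c1 = 1.0 - (lam1 * FP) / NB                   # must stay >= 0, Eq. (6)
    c2 = 1.0 - (lam2 * FN) / B                    # must stay >= 0, Eq. (7)
    return c1, c2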
for the performance analysis. This percentage was estimated based on the analysis of
I-Ag7 binding data given in [20].
The following scoring scheme is used to represent each experimental motif in [8-12]
as a quantitative matrix. Two experimental motifs [6,7] were excluded as they do not
describe many of the k-mer positions, or have assigned fewer residues for the de-
scribed positions. For all the other motifs, all non-tolerated positions were given a
score of 0. Well-tolerated residues were assigned a maximum score of 127. The pre-
ferred residues at the primary anchor positions were assigned half the maximum
score, and the positions that do not carry any predefined residues were assigned one
third of the maximum score. The distance to each of these score matrices is calculated
and the sum of all the distances is used to optimize the objective function, f2.
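The scoring scheme above can be sketched as follows; the representation of a motif as a list of per-position dictionaries mapping amino acids to the categories 'tolerated', 'preferred', or 'non-tolerated' is an illustrative assumption.

# Sketch of the scoring scheme: convert a described motif into a quantitative matrix.
AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")
MAX_SCORE = 127

def motif_to_matrix(motif):
    matrix = []
    for position in motif:                         # one dict per motif position
        row = {}
        for aa in AMINO_ACIDS:
            if not position:                       # no predefined residues here
                row[aa] = MAX_SCORE // 3
            elif position.get(aa) == 'tolerated':
                row[aa] = MAX_SCORE                # well-tolerated residues
            elif position.get(aa) == 'preferred':
                row[aa] = MAX_SCORE // 2           # preferred at primary anchors
            else:
                row[aa] = 0                        # non-tolerated
        matrix.append(row)
    return matrix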
For the MOEA optimization runs, we used a population of 500, where each individual
represents 180 real numbers bounded by the limits 0 and 127. The crossover probability
for the real variables is set to 1.0 and the mutation probability to 0.006; the distribution
indices for crossover and mutation are set to 15 and 30, respectively. After 300 genera-
tions, the evolutionary process was terminated. The convergence of the algorithm to
the final population is illustrated in Figure 1. Having a number of Pareto solutions
allows the user to choose a best solution. Of the motif solution set, we chose the motif
that gives the highest AROC for the test dataset.
Fig. 1. Final population after 300 iterations, with the mutation probability=0.006 and cross-over
probability=1.0. Fitness1 depicts the sum of FP and FN with respect to the best individuals in
the population whereas Fitness2 indicates the cumulative sum of the distances to the experi-
mental motifs and the best individuals (score matrices) in the population.
Fig. 2. (a) AROC plots for the theoretically derived motifs. (b) Performance comparison
between five experimentally derived motifs and the MOEA motif.
Table 2. The AROC values from predictions using each motif on the independent test dataset.
AROC>0.9 correspond to excellent, 0.8<AROC<0.9 to good, 0.7<AROC<0.8 to marginal prediction
accuracy. AROC=0.5 corresponds to random guessing, and 0.5<AROC<0.7 to poor predictions.
Motif AROC
Reizis 0.81
Harrison 0.81
Gregori 0.78
Latek 0.79
Rammensee 0.78
Reich 0.74
Amor 0.78
MEME 0.72
Gibbs 0.81
Rankpep 0.75
MOEA 0.87
Table 3. The AROC values estimated on the individual datasets used in deriving the experimen-
tal motifs by the best performing motifs in Table 2
Dataset        Reizis   Harrison   Gibbs   MOEA
Reizis 0.95 0.75 0.33 0.77
Harrison 0.68 0.88 0.79 0.76
Gregori 0.74 0.69 0.77 0.84
Latek 0.95 0.64 0.81 0.89
Corper 0.50 0.53 0.39 0.70
MHCPEP 0.59 0.72 0.64 0.85
Yu 0.48 0.33 0.58 0.61
The performance of the multi-objective motif on the individual datasets used in the
derivation of the experimental motifs shows similar or better results (greater than 0.7
AROC value) compared to the performance of other motifs across datasets (Table 3).
The performance of the motifs on their respective datasets indicated their bias towards
their own datasets. Overall, the MOEA-derived matrix performed well across all datasets
except in the case of the Yu dataset, for which it gave AROC = 0.61. Other experimental
motifs on the same dataset performed poorly with AROC < 0.61, except for the Latek
motif (AROC > 0.7). The MOEA-derived matrix reconciles significant variances in the
experimental motifs and minimizes the number of false predictions in the test dataset
as compared to the performance of the previously determined experimental motifs.
4 Conclusions
We have proposed MOEA for deriving a consensus motif from a number of experi-
mentally derived motifs. One of the objectives of our approach is to optimize the
number of true predictions across all I-Ag7 datasets, which was not the case with the
experimentally derived motifs. The experimental motifs formulated from each inde-
pendent study show biases to their own datasets - they perform well on the respective
dataset, but poorly on the other datasets. The other objective is to capture the signifi-
cance of each motif and combine them together so that the resulting motif can act as a
consensus motif characterizing all the experimental motifs. As we can see from the
results, the derived motif performed comparatively well on all the datasets. Further-
more, the MOEA-evolved solution outperformed all the other computationally derived
motifs, demonstrating its suitability for the discovery of highly accurate motifs.
Acknowledgements
This work was funded in part (MR and VB) by the NIAID, NIH, Dept. of Health and
Human Services, Grant 5 U19 AI56541 and Contract HHSN266200400085C.
References
1. Deb, K. et al.: A Fast and Elitist Multiobjective Genetic Algorithm. IEEE Trans. on Evolu-
tionary Computation 6 (2002) 182-197
2. Zitzler, E. et al.: Multiobjective Evolutionary Algorithms: A Comparative Case Study and
the Strength of Pareto Approach. IEEE Trans. on Evolutionary Computation 3 (1999) 257-
271
3. Knowles, J.D. et al.: Approximating the nondominated front using the Pareto archived
evolution strategy. Evolutionary Computation 8 (2000) 149-172
4. Fonseca, C. M. et al.: Genetic Algorithms for Multiobjective Optimization: Formulation,
discussion and generalization. In Proc. of the fifth Intl. conference on Genetic Algorithms,
S. Forrest, Ed. San Mateo, CA: Morgan Kauffman (1993) 416-423
5. Lee, S. et al.: Comparison of Multi-Objective Genetic Algorithms in Optimizing Q-Law-
Thrust Orbit Transfers. GECCO (2005)
6. Amor, S. et al.: Encephalitogenic epitopes of myelin basic protein, proteolipid protein,
and myelin oligodendrocyte glycoprotein for experimental allergic encephalomyelitis in-
duction in Biozzi AB/H (H-2Ag7) mice share an amino acid motif. J. Immunology 156
(1996) 3000-3008
7. Reich,E.P. et al.: Self peptides isolated from MHC glycoproteins of non-obese diabetic
mice. J. Immunology 152 (1994) 2279-2288
8. Rammensee, H. et al.: SYFPEITHI:database for MHC ligands and peptide motifs. Immu-
nogenetics 50 (1999) 213-219
9. Reizis,B., et al.: Molecular characterization of the diabetes mouse MHC class-II protein, I-
Ag7. Int. Immunology 9 (1997) 43-51
10. Harrison, L.C. et al.: A peptide binding motif for I-Ag7, the class II major histocompatibil-
ity complex (MHC) molecule of NOD and Biozzi AB/H mice. J. Exp. Med. 185 (1997)
1013-1021
11. Latek,R.R. et al.: Structural basis of peptide binding and presentation by the type I diabe-
tes-associated MHC class II molecule of NOD mice. Immunity 12 (2000) 699-710
12. Gregori,S. et al.: The motif for peptide binding to the insulin-dependent diabetes mellitus-
associated class II MHC molecule I-Ag7 validated by phage display library. Int. Immunol-
ogy 12(4) (2000) 493-503
13. Corper,A.L. et al.: A structural framework for deciphering the link between I-Ag7 and
autoimmune diabetes. Science 288 (2000) 505-511
14. Yu,B. et al.: Binding of conserved islet peptides by human and murine MHC class II
molecules associated with susceptibility to type I diabetes. J. Immunology 30(9) 2497-506
15. Suri, A. et al.: In APCs, the Autologous Peptides Selected by the Diabetogenic I-Ag7
Molecule Are Unique and Determined by the Amino Acid Changes in the P9 Pocket. J
Immunol 168(3) (2002) 1235-43
16. Stratman,T. et al.: The I-Ag7 MHC class II molecule linked to murine diabetes in a pro-
miscuous peptide binder. J. Immunology 165 (2000) 3214-3225
17. Brusic,V. An unpublished dataset
18. Brusic,V., Rudy,G., Harrison,L.C. MHCPEP, a database of MHC-binding peptides: update
1997. Nucleic Acids Res. 26 (1998) 368-371
19. Bailey,T.L., Elkan,C. The value of prior knowledge in discovering motifs with MEME.
Proc Int Conf Intell Syst Mol Biol. 3 (1995) 21-29
20. Peer, I. et al.: Proteomic Signatures: Amino Acid and Oligopeptide Compositions Differen-
tiate Among Phyla. Proteins 54 (2004) 20-40
21. http://meme.scdc.edu/meme/website/meme.html
22. Neuwald, A. F. et al.: Gibbs motif sampling: detection of bacterial outer membrane protein
repeats. Protein Science 4 (1995) 1618-32
23. Reche, P, A. et al.: Enhancement to the RANKPEP resource for the prediction of peptide
binding to MHC molecules using profiles. Immunogenetics 56 (2004) 405-419
24. Coello, C. A., An Updated Survey of Evolutionary Multiobjective Optimization Tech-
niques: State of the Art and Future Trends. Congress on Evolutionary Computation, Wash-
ington: IEEE Service Center (1999) 3-13
25. Carrasco-Marin, E., Kanagawa, O., Unanue, E. R., The lack of consensus for I-Ag7-peptide
binding motifs: Is there a requirement for anchor amino acid side chain. Proc. Natl. Acad.
Sci. 96 (1999) 8621-8626
26. Rajapakse, M., et al.: Deriving Matrix of Peptide Interactions in Diabetic Mouse by Ge-
netic Algorithm. Proc. of 6th International Conference on Intelligent Data Engineering and
Automated Learning, LNCS, Springer, 3578 (2005) 440-447
Mining Structural Databases: An Evolutionary
Multi-Objective Conceptual Clustering
Methodology
1 Introduction
The increased availability of biological databases containing representations of
complex objects such as microarray time series, regulatory networks or metabolic
pathways permits access to vast amounts of data where these objects may be
found, observed, or developed [1, 2, 3]. In spite of the recent renewed interest
in knowledge-discovery techniques (or data mining), there is a dearth of data
analysis methods intended to facilitate understanding of the represented objects
and related systems by their most representative features and those relationships
derived from these features (i.e., structural data).
Structural data can be viewed as a graph containing nodes representing ob-
jects, which have features linked to other nodes by edges corresponding to their
relationships. Interesting objects in structural data are represented as substruc-
tures, which consists of subgraph partitions of the datasets [4]. Conceptual clus-
tering techniques have been successfully applied to structural data to uncover
objects or concepts that relates objects, by searching through a predened space
of potential hypothesis (i.e., subgraphs that represent associations of features)
for the hypothesis that best ts the training examples [5]. However, the for-
mulation of the search problem in a graph-based structure would result in the
generation of many substructures with small extent as it is easier to explain or
model match smaller data subsets than those that constitute a signicant portion
of the dataset. For this reason, any successful methodology should also consider
additional criteria to extract better dened concepts based on the size of the
substructure being explained, the number of retrieved substructures, and their
diversity [4, 6]. The former are conicting criteria that can be approached as
an optimization problem. Multi-objective optimization techniques can evaluate
concepts or substructures based on the conicting criteria, and thus, to retrieve
meaningful substructures from structural databases.
In this paper we propose a conceptual clustering methodology termed EMO-
CC for Evolutionary Multi-Objective Conceptual Clustering that uses multi-
objective and multi-modal optimization techniques. The EMO-CC methodology
uses an efficient search process based on Evolutionary Algorithms [7, 8, 9], which
inspects large data spaces that otherwise would be intractable. In addition, EMO-CC
provides annotations of the uncovered substructures and, based on them, applies
an unsupervised classification approach to retrieve new members of previously
discovered substructures. We apply EMO-CC to the Gene Ontology database
(i.e., the GO Project [3]) to recover interesting substructures containing genes
sharing a common set of terms, which are defined at different levels of specificity
and correspond to different ontologies, producing novel annotations based on
them. Particularly, we use these substructures to explain immuno-inflammatory
responses measured in terms of gene expression profiles derived from the analy-
sis of longitudinal blood expression profiles of human volunteers treated with
intravenous endotoxin compared to placebo [10].
This work is organized as follows. Section 2 reviews the conceptual clustering
problem. Section 3 describes the EMO-CC methodology. Section 4 shows the
customization and results of applying EMO-CC to the GO database to explain
gene expression profiles from the inflammatory problem. Section 5 introduces
the discussion.
2 Conceptual Clustering
Existing methods for clustering are designed for linear feature-value data. However,
sometimes we need to represent structural data that do not only contain de-
scriptions of individual observations in databases, but also relationships among
these observations. Therefore, mining structural databases entails address-
ing both the uncertainty of which observations should be placed together, and
also which distinct relationships among features best characterize different sets
of observations, bearing in mind that, a priori, we do not know which feature is
meaningful for a given relationship.
Conceptual clustering, in contrast to most typical clustering techniques [12],
has been successfully applied to structural databases to uncover concepts that
are embedded in subsets of structural data or substructures [4]. While most
machine learning techniques applied directly or indirectly to structural data-
bases exhibit methodological differences, they do share the same framework
even though they employ distinct metrics, heuristics or probability interpre-
tations [13, 4]: (1) Database representation. Structural data can be viewed as a
graph containing nodes representing objects, which have features linked to other
nodes by edges corresponding to their relations. A substructure consists of a
subgraph of structural data [4]; (2) Structure Learning. This process consists of
searching through the space for potential substructures, and either returning the
best one found or an optimal sample of them; (3) Cluster evaluation. The sub-
structure quality is measured by optimizing several criteria, including specificity,
where harboring more features always increases the inferential power; sensitivity,
where a large coverage of the dataset produces good generality; and diversity,
where minimal overlapping between clusters generates more distinct clusters
and descriptions from different angles; (4) Database compression. The database
compression provides simpler representations of the objects in a database; and
(5) Inference. New observations can be predicted from previously learned sub-
structures by using classifiers that optimize their matching based on distance
[14] or probabilistic metrics [5].
Massive microarray experiments provide a wide view of the gene regulation prob-
lem; however, most of the biological knowledge extracted from these experiments
includes few relevant genes, some of which are difficult to identify because
of their low expression levels. Moreover, it is also difficult to distinguish among
expressed genes that behave differentially between treatments, time, patients
and other factors that are always hidden in typical microarray protocols (e.g.,
gender or age). Here we focus on the challenge of explaining these profiles and
re-discovering them based on independent biological information.
We therefore apply EMO-CC to discover interesting substructures in the
Gene Ontology database that can explain classes composed of microarray gene
profiles having similar behaviors of their expression over time, treatment, and
patient. The Gene Ontology (GO) network stores one of the most powerful char-
acterizations of genes, containing three structured vocabularies (i.e., ontologies)
that describe gene products in terms of their associated biological processes, cel-
lular components and molecular functions in a species-independent manner [3].
The GO terms are organized as hierarchical networks, where each level corre-
sponds to a different specificity definition of such terms (i.e., higher level terms
are more general than lower level terms). Particularly, from the computational
point of view, these networks are organized as structures called directed acyclic
graphs (DAGs), which are one-way rooted graphs that can be represented as
trees. Therefore, identifying which distinct relationships among features best
characterize different sets of observations does not only have to consider the
process of grouping distinct types of features, but also defining at which level of
specificity they have to be represented.
We used the GO database and made its terms compatible with the descriptions pro-
vided by Affymetrix, where each observation of the database has the following
features: (1) Name: Affymetrix identifier for each gene in the HG-U133A v2.0 set
of arrays; (2) Biological process: List of the biological processes where a gene
product is involved (e.g., mitosis or purine metabolism); (3) Molecular function:
List of the biological functions of the gene product (e.g., carbohydrate binding
and ATPase activity), which is indexed by a list of integer GO codes; and (4)
Cellular component: List of the cellular components indicating location of gene
products (e.g., nucleus, telomere, and origin recognition complex), which are
indexed by a list of integer GO codes.
[Figure 1: example of a chromosome representing a cluster concept, linking GO terms GO:0003673 (level 0), GO:0008150 and GO:0005575 (level 1), GO:0005623 (level 2), and GO:0008152 (level 3) to Affymetrix probes 207929_at, 202442_at, 213475_s_at, 214467_at, 219889_at, 203508_at, 201087_at, 209864_at, and 200625_s_at; Specificity = 0.6769, Sensitivity = 0.0051.]
An instance for the GO domain is redefined as the particular subset of values
that constitutes a prefix tree1 of a database observation, in contrast to a subtree
as in the general case. Then, an instance occurs in a substructure if a subgraph
of the prefix tree that represents that instance matches the substructure
tree, where this tree contains nodes tagged with the type of feature (e.g., bio-
logical process) and the corresponding values (e.g., GO:0007165), and the edges
represent relationships between features (i.e., tagged nodes).
Good substructures are those that result in a trade-off between sensi-
tivity and specificity. Although the sensitivity can be calculated based on the
number of instances in a substructure, the specificity of the substructure is not
linearly dependent on its size, as it was previously defined based on the number
of nodes and edges, because of the level component included in the GO domain.
Thus, we redefine the specificity as the distance among all most specific nodes
of an instance i and the closest leaf-node in the substructure S:
K U dist(nodeu ,nodei )
i u level(nodei )
Specif icity(S) = (1)
K
where the distance is calculated as the number of edges between two nodes, the
level of a node is calculated as the length of the shortest path to the root node, U
is the number of leaf-nodes in substructure S, and K is the number of instances
occurring in substructure S. An example of a chromosome representing a cluster
concept is shown in Figure 1.
The structural database used for the GO domain is composed of 1770 instances
of genes and their GO associated terms. The population of the evolutionary
1 Tree T is a prefix tree of T' if T' can be obtained from T by appending zero or more
subtrees to some of the nodes in T. Notice that any tree T is a prefix of itself.
Parameter Value
Population Size 200
Number of Objective Evaluations 20000
Crossover probability 0.6
Mutation probability 0.2
We compare EMO-CC with two other methods, APRIORI and SUBDUE, all of
which satisfy to some extent those features shared by machine learning meth-
ods introduced in Section 3. Although APRIORI and SUBDUE are not MO
algorithms, we illustrate the obtained Pareto fronts in Figure 2 to perform fair
comparisons with EMO-CC. In addition, we verify the performance of these
methods by applying two multi-objective comparison metrics, namely C and
ND [19, 20]. The metric C(X', X'') measures the dominance relationship of the
set of non-dominated solutions X' over another set of non-dominated solutions
X''. The value C(X', X'') = 1 means that all points in X'' are dominated by
points in X'. The opposite, C(X', X'') = 0, represents the situation where none
of the points in X'' are covered by the set X'. The metric ND(X', X'') compares
two sets of non-dominated solutions and gives the number of solutions of X' not
equal to and not dominated by any member of X''. The values obtained by the
methods are shown in Table 2.
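For concreteness, a small sketch of the two comparison metrics follows; it assumes both objectives (specificity and sensitivity) are to be maximized and that solutions are given as objective vectors, which is an illustrative simplification.

# Sketch of the C and ND comparison metrics (maximization of all objectives assumed).
def dominates(a, b):
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def C(X1, X2):
    # fraction of X2 covered (dominated or equalled) by members of X1
    covered = sum(1 for b in X2 if any(dominates(a, b) or a == b for a in X1))
    return covered / len(X2)

def ND(X1, X2):
    # number of solutions of X1 not equal to and not dominated by any member of X2
    return sum(1 for a in X1
               if not any(dominates(b, a) or b == a for b in X2))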
Fig. 2. Pareto fronts for the GO domain by using two conflicting objectives: specificity
and sensitivity. (a) Non-dominated solutions reported by the APRIORI method. (b)
Solutions recovered by the SUBDUE method. (c) Substructures recovered by the EMO-
CC methodology, where more than one solution for the same specificity level indicates
that they correspond to different neighborhoods.
The obtained results of applying the former metrics reveal that there is no
solution obtained by EMO-CC that is dominated by APRIORI, and only one so-
lution obtained by SUBDUE dominates solutions belonging to the Pareto front
found by EMO-CC (Table 2(a)), as described by metric C, while there is no
solution of the latter method that dominates any solution from the other two
approaches. Moreover, the EMO-CC method discovers more non-dominated so-
lutions, as evaluated by metric ND (Table 2(b)), than both the APRIORI and SUB-
DUE methods. The difference between the values reported by the ND metric
for EMO-CC and those for APRIORI and SUBDUE (i.e., 181.89 and
171.80 vs. 1.20 and 1.60 from Table 2(b)) suggests that EMO-CC retrieves al-
most all solutions identified by the other methods and covers a wide set of the
optimal solutions that can be obtained in the GO domain. This is in contrast to
the few solutions that are identified by the APRIORI and SUBDUE methods,
but remain undetected by the EMO-CC method (i.e., 1.20 and 1.60 on average
from Table 2(b)).
In addition, the EMO-CC method recovers more, and more diverse, solutions
than those found by the APRIORI and SUBDUE methods. Particularly, our
approach retrieves substructures of the Pareto optimal front containing few in-
stances harboring several features (i.e., cohesive substructures), which were un-
detected by the other methods.
Table 2. (a) C metric: C(X', X'') for APRIORI, SUBDUE, and EMO-CC, average (stdev).
(b) ND metric: ND(X', X'') for APRIORI, SUBDUE, and EMO-CC, average (stdev).
#89   GO:0007154 cell communication (level 3);  GO:0016021 integral to membrane (level 3)
#256  GO:0007154 cell communication (level 3);  GO:0016021 integral to membrane (level 3);  GO:0050875 cellular physiological process (level 3)
For example, class #13 is described by several substructures (Table 3). Sig-
nificantly, these descriptions are based on different types of descriptions (e.g.,
process and cellular components) that belong to different levels of the GO struc-
ture (e.g., level 6 or level 4). These diverse substructures are optimal in the sense
that they belong to the Pareto optimal front (Figure 2) between specific and sensitive
descriptions. The effect of the substructures on the explained class #13 can be
visualized in Figure 3.
EMO-CC, as a machine learning method (see Section 3 (4)), compresses those
substructures that explain an expression profile from the same point of view to
provide a summarized explanation of this phenomenon (Table 3). For example,
substructures #89 and #256 are compressed because they are indistinguishable
for the class corresponding to the expression profile #13, while substructure
#179 describes it from a very different point of view and is preserved as a diverse
solution. This compression is dynamic because substructures are re-grouped in a
context-dependent fashion, where the context corresponds to an explained class
and a different classification can produce a distinct substructure association (e.g.,
substructures #89 and #256 are indistinguishable for class #13, while this may
not be the case for other classes of microarray or clinical experiments). Notably, this
[Figure 3: expression plots for the explained class and for substructures #179, #256, #89, #607, #380, #536, and #759.]
Fig. 3. The effects of the explanation of the expression class #13 based on the GO
substructures identified by EMO-CC. The dashed rectangle illustrates the local ap-
plication of the non-dominance relationship within a class, and the summarization of
two indistinguishable substructures for this class. Grey filled graphs correspond to the
compressed substructures of Table 3.
[10]. It is noteworthy that this gene was not identified by its similarity with
the centroid of the expression class #17, but from an independent substructure
provided by EMO-CC.
5 Discussion
References
1. Siripurapu, V., Meth, J., Kobayashi, N., Hamaguchi, M.: Dbc2 significantly influences cell-cycle, apoptosis, cytoskeleton and membrane-trafficking pathways. Journal of Molecular Biology 346 (2005) 83-89
2. Nikitin, A., Egorov, S., Daraselia, N., Mazo, I.: Pathway Studio - the analysis and navigation of molecular networks. Bioinformatics 19 (2003) 2155-2157
3. Consortium, T.G.O.: Gene ontology: tool for the unification of biology. Nature Genet. 25 (2000) 25-29
4. Cook, D., Holder, L., Su, S., Maglothin, R., Jonyer, I.: Structural mining of molecular biology data. IEEE Engineering in Medicine and Biology, special issue on Advances in Genomics 4 (2001) 67-74
5. Mitchell, T.: Machine Learning. McGraw-Hill, New York (1997)
6. Ruspini, E., Zwir, I.: Automated generation of qualitative representations of complex objects by hybrid soft-computing methods. In Pal, S., Pal, A., eds.: Pattern Recognition: From Classical to Modern Approaches, Singapore, World Scientific Company (2001) 453-474
7. Back, T., Fogel, D., Michalewicz, Z., eds.: Handbook of Evolutionary Computation. IOP Publishing Ltd., Bristol, UK (1997)
8. Deb, K.: Multi-Objective Optimization Using Evolutionary Algorithms. John Wiley & Sons, Inc. (2001)
9. Coello-Coello, C., Veldhuizen, D.V., Lamont, G.: Evolutionary Algorithms for Solving Multi-Objective Problems. Kluwer (2002)
10. Romero-Zaliz, R., Cordon, O., Rubio-Escudero, C., Zwir, I., Cobb, J.: A multi-objective evolutionary conceptual clustering methodology for gene annotation from networking databases. Submitted.
11. Duda, R., Hart, P., Stork, D.: Pattern Classification (2nd Edition). Wiley-Interscience (2000)
12. Der, G., Everitt, B.: A handbook of statistical analyses using SAS. Chapman-Hall (1996)
13. Cheeseman, P., Oldford, R.W.: Selecting models from data. Springer-Verlag (1994)
14. Bezdek, J.: Fuzzy clustering. In Ruspini, E., Bonissone, P., Pedrycz, W., eds.: Handbook of Fuzzy Computation, Institute of Physics Press (1998) F6.1:1-F6.6:19
15. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6 (2002) 182-197
16. Koza, J.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, USA (1992)
17. Goldberg, D.: Genetic Algorithms in Search Optimization and Machine Learning. Addison-Wesley (1989)
18. Jaccard, P.: The distribution of flora in the alpine zone. The New Phytologist 11 (1912) 37-50
19. Zitzler, E., Thiele, L.: Multiobjective evolutionary algorithms: A comparative case study and the Strength Pareto Approach. IEEE Transactions on Evolutionary Computation 3 (1999) 257-271
20. Romero-Zaliz, R., Zwir, I., Ruspini, E.: Generalized Analysis of Promoters (GAP): A method for DNA sequence description. In: Applications of Multi-Objective Evolutionary Algorithms. World Scientific (2004) 427-450
21. Gasch, A., Eisen, M.: Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering. Genome Biology 3 (2002)
Optimal Selection of Microarray Analysis Methods
Using a Conceptual Clustering Algorithm
1 Introduction
observable distinct profiles. Moreover, none of them subsumes the results obtained by
the other methods.
In view of these results, we propose a conceptual clustering method [8], [9], [10],
devoted to discovering optimal associations of microarray analysis methods in an effort
to identify differential gene expression profiles.
2 Methods
We propose a conceptual clustering approach [8], [9], [10] devoted to identifying optimal
associations among microarray analysis methods in an effort to identify differential
expression profiles (Fig. 1). This approach consists of six phases: (1) preprocessing of
the dataset; (2) identification of differentially expressed genes by application of sev-
eral statistical methods; (3) arrangement of a lattice structure containing all possible
associations of the statistical methods applied; (4) association of differentially ex-
pressed genes into differential profiles by clustering genes that change their expres-
sion over time, patient and/or treatment; (5) evaluation of the performance of the
method-associations based on their specificity and sensitivity in the identification of
previously detected differential profiles, using multiobjective optimization techniques
[11], [12]. We create a set of method association rules based on the learned mappings
of differential profiles into method-associations [13]; (6) finally, we are able to pre-
dict optimal method-associations to identify differential profiles in new microarray
datasets by use of the method association rules.
Fig. 1. Graphical representation of the methodology. The squared boxes represent the phases of
the methodology, the round cornered boxes correspond to the input/output data at each step,
and the ellipses the operations performed at each phase.
[Figure 2: lattice of method associations, ranging from the individual methods M1, M2, M3, ..., Mn through intermediate combinations such as M1M2M3...Mk up to the full association M1M2M3...Mn.]
The set of genes previously identified in Section 2.2 serves as a means to create dif-
ferential expression profiles (i.e., sets of genes with coordinated changes in RNA
abundance) between treatment PT, control PC, and subject. The applied representation
(Fig. 3) allows us to identify different patterns of behavior among patients inside the
same experimental group, since this information may be missed if patients in the same
experimental group were not plotted individually.
We clustered separately genes in treatment and control groups. Therefore, genes
belonging to a cluster in treatment, PT , can fit in more than one cluster in control, PC ,
and vice versa. We apply the K-means clustering algorithm [16] and identify differen-
tial profiles denoted as (PT, PC), which are pairwise relationships between profiles PT
Fig. 3. The expression profiles have been represented separately for each experimental group
and patients arranged individually
and PC , from treatment and control experiments, respectively. This relationship is de-
fined as the significant intersection of genes between PT and PC , which is constrained
by a threshold based on the typical statistical power of 80%.
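As an illustration of this step, the pairing of treatment and control clusters could be sketched as follows. This is not the authors' code: the gene expression matrices, the number of clusters and the fixed 0.8 cutoff (standing in for the power-based threshold above) are all assumptions.

    from sklearn.cluster import KMeans

    def cluster_genes(expr, n_clusters):
        # Group genes (rows of expr) into expression profiles with K-means [16].
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(expr)
        return {c: {g for g, lab in enumerate(labels) if lab == c}
                for c in range(n_clusters)}

    def pair_profiles(treatment_clusters, control_clusters, threshold=0.8):
        # Keep (PT, PC) pairs whose gene overlap is large; the fixed cutoff is
        # only a stand-in for the power-based threshold mentioned in the text.
        pairs = []
        for pt, genes_t in treatment_clusters.items():
            for pc, genes_c in control_clusters.items():
                overlap = len(genes_t & genes_c) / max(len(genes_t), 1)
                if overlap >= threshold:
                    pairs.append((pt, pc, overlap))
        return pairs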
In the evaluation of the specificity and sensitivity of a method-association, TP stands for
True Positives (i.e., genes exhibiting a profile x_u ∈ X_S which have been successfully
retrieved by the applied method-association M_i), TN stands for True Negatives (i.e.,
genes exhibiting a profile x_u ∉ X_S and not retrieved by M_i), FP stands for False
Positives (i.e., genes exhibiting a profile x_u ∉ X_S and retrieved by M_i) and FN stands
for False Negatives (i.e., genes exhibiting a profile x_u ∈ X_S and not retrieved by M_i).
These four factors are calculated as:

    TP = |Γ_u ∩ Γ_i| / |Γ_u|,   TN = |(D − Γ_u) ∩ (D − Γ_i)| / |D − Γ_u|,
    FP = |(D − Γ_u) ∩ Γ_i| / |D − Γ_u|,   FN = |Γ_u ∩ (D − Γ_i)| / |Γ_u|,    (3)

where Γ_u represents the genes in the microarray set D that exhibit the queried profile
x_u ∈ X_S, and Γ_i = M_i(D), the genes from D retrieved by the method-association M_i.
where w1 and w2 are the weights associated with the objectives O1 and O2, respectively.
These values are provided by the user based on the relevance of each of these objectives
for the particular study. If no values are given, the default weights (0.5, 0.5) are used.
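A minimal sketch of these calculations, assuming the gene sets are available as Python sets (Γ_u as gamma_u, Γ_i as gamma_i) and that the two objectives are simply combined as a weighted sum, which is our reading of the missing formula:

    def confusion_ratios(D, gamma_u, gamma_i):
        # Eq. (3): gamma_u = genes showing the queried profile, gamma_i = genes
        # retrieved by method-association Mi; D is the whole microarray gene set.
        not_u, not_i = D - gamma_u, D - gamma_i
        tp = len(gamma_u & gamma_i) / len(gamma_u)
        tn = len(not_u & not_i) / len(not_u)
        fp = len(not_u & gamma_i) / len(not_u)
        fn = len(gamma_u & not_i) / len(gamma_u)
        return tp, tn, fp, fn

    def weighted_score(o1, o2, w1=0.5, w2=0.5):
        # Assumed aggregation of the two objectives with the user-given weights.
        return w1 * o1 + w2 * o2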
where the distance is the Euclidean distance, and (PT, PC) are the centroids of the profiles.
Therefore, given a set of query profiles X_S, we define the strength of activation of
the if-part of the rule R_f as:
3 Results
We apply our procedure to a data set derived from longitudinal blood expression pro-
files of human volunteers treated with intravenous endotoxin compared to placebo.
We expect to identify molecular pathways that provide insight into the host response
over time to systemic inflammatory insults, as part of a Large-scale Collaborative Re-
search Project sponsored by the National Institute of General Medical Sciences
(www.gluegrant.org) [18].
The data were acquired from blood samples collected from eight normal human
volunteers, four treated with intravenous endotoxin (i.e., patients 1 to 4) and four with
placebo (i.e., patients 5 to 8) [18]. Complementary RNA was generated from circulat-
ing leukocytes at 0, 2, 4, 6, 9 and 24 hours after the i.v. infusion and hybridized with
GeneChips HG-U133A v2.0 from Affymetrix Inc., containing a set of 22,283 genes.
Table 1. Coincidence between methods in the retrieval of genes. The number in each cell
represents a ratio of coincidence between genes retrieved by the statistical method in that col-
umn and the genes retrieved by the statistical method in that row relative to the total number of
genes retrieved by the method in the row (|Row ∩ Column| / |Row|).
%      M1     M2     M3     M4     M5     M6     M7     M8     M9     M10
M1     --     92.20  52.29  75.05  96.48  69.23  85.55  70.06  61.33  50.52
M2     56.06  --     34.07  57.84  85.27  59.54  71.11  62.64  50.57  42.98
M3     82.19  88.07  --     96.24  94.77  57.35  78.75  72.87  56.86  46.73
M4     67.22  85.19  54.84  --     95.16  55.49  73.65  70.20  51.49  42.83
M5     55.20  77.80  33.45  58.94  --     50.28  66.72  66.38  46.42  38.93
M6     59.04  83.51  31.11  52.84  77.30  --     89.63  56.56  60.64  49.38
M7     58.36  79.79  34.18  56.10  82.05  71.70  --     62.34  57.23  49.07
M8     57.36  84.34  37.96  64.17  95.96  54.30  74.80  --     49.62  40.51
M9     62.10  84.21  36.63  58.21  84.74  72.00  84.95  61.36  --     72.31
M10    59.56  83.34  35.05  56.37  82.72  68.26  84.80  58.34  84.19  --
This indicates that none of the methods subsumes the others (Table 1) (e.g., from the genes
retrieved by M3, only 31.11% are also retrieved by M5, and 52.29% by M1).
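The ratios in Table 1 can be reproduced directly from the retrieved gene sets; a sketch, assuming each method's result is available as a set of gene identifiers:

    def coincidence_table(method_results):
        # method_results: dict mapping method name -> set of retrieved gene IDs.
        # For every ordered pair, report the percentage of the row method's genes
        # that are also retrieved by the column method.
        table = {}
        for row, row_genes in method_results.items():
            for col, col_genes in method_results.items():
                if row != col:
                    table[(row, col)] = 100.0 * len(row_genes & col_genes) / len(row_genes)
        return table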
The lattice arranged in this particular work contains all potential combinations of union
and intersection of the ten statistical methods applied. Thus, M is defined as the set
containing the individual methods M1, M2, ..., M10 together with all of their unions and
intersections (e.g., M1 ∪ M2, M2 ∩ M3, ...), up to the combinations involving all ten
methods.
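One straightforward (if exhaustive) way to enumerate such a lattice is sketched below; the enumeration strategy is ours, not necessarily the authors':

    from functools import reduce
    from itertools import combinations

    def build_lattice(method_results):
        # method_results: dict name -> set of gene IDs for each single method.
        # Adds the union and the intersection of every k-way combination.
        lattice = dict(method_results)
        names = sorted(method_results)
        for k in range(2, len(names) + 1):
            for combo in combinations(names, k):
                sets = [method_results[m] for m in combo]
                lattice[" u ".join(combo)] = reduce(set.union, sets)
                lattice[" n ".join(combo)] = reduce(set.intersection, sets)
        return lattice

With ten methods this amounts to 2 × (2^10 − 11) = 2,026 composite associations plus the ten individual methods, which is still small enough to evaluate exhaustively.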
We found that there is a relationship between the statistical methods and the differential
profiles they are able to identify (see Section 2.2), with some differential profiles
identified by certain methods and not by others. For example, the differential profile in
Fig. 4(a) harbors 29 genes in our dataset D and is only retrieved by those statistical
methods that take into account the time factor (e.g., M2, which retrieves more than
90% of these genes). This happens because the statistical methods that consider the
treatment vs. control factor average the expression values from patients 1
and 2 with those of patients 3 and 4 by treating them as replicates. Consequently,
the differential behavior between them is lost.
[Figure panels: (a) Treatment-8 / Control-7; (b) Treatment-21 / Control-9]
Fig. 4. Examples of differential profiles only identified by some of the statistical methods
The expression profiles have been represented separately for each experimental group
(Section 2.3), and patients arranged individually. In our current problem, with eight
patients, four treated with intravenous endotoxin (i.e., patients 1 to 4) and four with
placebo (i.e., patients 5 to 8), and data retrieved over time at hours 0, 2, 4, 6, 9 and 24,
each profile is represented by 24 consecutive time points (see Fig. 5).
The differential profiles extracted from the treatment group show different levels of
expression change. For example, there are sets of genes sharing very high variations
in the levels of expression (e.g., profiles 15, 19, 21, and 22 in Fig. 5). In addition,
some other profiles show differential characteristics for the patients (e.g., profiles 8
and 16 in Fig. 5). In the control group, the profiles are more homogeneous than in the
treatment group.
Typically, testing the coincidence among different data sources and clustering
methods serves as a tool to investigate the validity of the identified groupings [19].
We follow this guideline to increase the confidence in the obtained differential pro-
files. Therefore, we calculate the coincidence between our retrieved differential profiles
[Figure panels: Treatment profiles 1-24 (left) and Control profiles 1-12 (right); y-axis: expression level, 0-25,000]
Fig. 5. Representation of the differential profiles obtained separately for the treatment and con-
trol groups using the statistical methods applied in the current work
and external information provided by the Gene Ontology database [20]. To address
this problem we developed an evolutionary multiobjective conceptual clustering
methodology (R.R.Z., C.R.E., O.C., J.P.C., and I.Z., manuscript in preparation) that
extracts clusters composed of features such as biological processes, molecular functions
and cellular components defined at different specificity levels, and compares
these clusters with our differential profiles by using a coincidence index test based on
the hypergeometric distribution [9], [10], [19].
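The coincidence index itself can be computed with a standard hypergeometric tail probability; a sketch (parameter names are ours):

    from scipy.stats import hypergeom

    def coincidence_pvalue(n_genome, n_profile, n_cluster, n_overlap):
        # Probability of observing at least n_overlap shared genes when a cluster
        # of n_cluster genes is drawn from a genome of n_genome genes that
        # contains n_profile genes of the differential profile.
        return hypergeom.sf(n_overlap - 1, n_genome, n_profile, n_cluster)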
Table 2. Specificity and sensitivity values for the method-associations. The non-dominated so-
lutions are pointed out with a star.
[Figure: Sensitivity (x-axis) vs. Specificity (y-axis); labeled points include (0.066, 0.983), (0.322, 0.859), (0.447, 0.813), (0.587, 0.747) and (0.866, 0.618)]
Fig. 6. Results of the evaluation of the method-associations contained in the lattice M for the
six selected differential profiles
4 Discussion
samples are taken as single data points. Therefore, the methodology presented is also
useful for simpler microarray experiments with single data points.
This approach presents various advantages over the standard analytical methods
usually applied to microarray experiments. First, it permits combining the results of
independent analytical methods for microarray experiments. Our proposal consists of
a conceptual clustering technique that combines the advantages of the methods ap-
plied. The combination of the union and intersection operators also provides the pos-
sibility of querying negative samples (i.e., genes which exhibit a given profile but
not others). Second, it permits interaction with the user in the selection of differen-
tially expressed profiles, where the user provides the differential profiles queried from
the set of microarray data and receives the optimal combination of statistical methods
to retrieve the genes exhibiting those profiles. Third, the representation used for the
profiles is optimal, as plotting the patients sequentially presents advantages over the
traditional one, where all biological replicates (i.e., patients in the same experimental
group) are combined in just one set of values. The main advantage of this representa-
tion is that we can examine the behavior of the genes independently in each patient,
making it possible for us to recognize different behaviors of genes across the patients
in the same experimental group. These differences can help us to discover the influ-
ence of biological conditions not previously considered in the experiment such as
gender or age. Finally, the system provides solutions based on a trade-off of specific-
ity vs. sensitivity, whereas other methods evaluate their solutions over one measure,
usually a ratio of False Positives and the total number of genes retrieved [4], [5]. As a
result of this trade-off, the procedure provides as output all non-dominated solutions
in terms of specificity and sensitivity by application of multiobjective techniques.
The computational procedure we propose solves many of the problems currently
present in the process of analyzing a microarray experiment, such as the choice of
analytical methodology, the extraction of biologically significant results for the
experts, and the proper management of complex experiments involving multiple
experimental conditions, time series and patients. Therefore, it sets up a robust platform for the analysis
of all types of microarray experiments, from the simplest experimental design to the
most complex, providing accurate and reliable results.
References
1. Durbin,R., Eddy,S., Krogh,A. and Mitchison,G. (1998) Biological Sequence Analysis:
Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press.
2. Brown,P. and Botstein,D. (1999) Exploring the new world of the genome with DNA mi-
croarrays. Nature Genet., 21 (Suppl.), 33-37.
3. Pan,W., Lin,J. and Le,C. (2001) A mixture model approach to detecting differentially ex-
pressed genes with microarray data. Funct. Integr. Genomics, 3(3), 117-124.
4. Li,C. and Wong,W.H. (2003) DNA-Chip Analyzer (dChip). In Parmigiani,G., Garrett,E.S.,
Irizarry,R. and Zeger,S.L. (eds), The analysis of gene expression data: methods and soft-
ware. Springer.
5. Tusher,V.G., Tibshirani,R. and Chu,G. (2001) Significance analysis of microarrays ap-
plied to the ionizing radiation response. Proc. Natl. Acad. Sci. USA. 98, 5116-5121.
6. Park,T., Yi,S.G., Lee,S., Lee,S.Y., Yoo,D.H., Ahn, J.I. and Lee, Y.S. (2003) Statistical
tests for identifying differentially expressed genes in time-course microarray experiments.
Bioinformatics, 19(6), 694-703.
7. Der,G. and Everitt,B.S. (2001) Handbook of Statistical Analyses using SAS. Chapman and
Hall/CRC.
8. Cheeseman,P. and Oldford,R.W. (1994) Selecting models from data : artificial intelligence
and statistics IV. Springer-Verlag, New York.
9. Zwir,I., Shin,D., Kato,A., Nishino,K., Latifi,K., Solomon,F., Hare,J.M., Huang,H. and
Groisman,E.A. (2005a) Dissecting the PhoP regulatory network of Escherichia coli and
Salmonella enterica. Proc Natl Acad Sci, 102, 2862-2867.
10. Zwir,I., Huang,H. and Groisman,E.A. (2005b) Analysis of Differentially-Regulated Genes
within a Regulatory Network by GPS Genome Navigation, Bioinformatics (in press).
11. Chankong,V. and Haimes,Y.Y. (1983) Multiobjective decision making theory and meth-
odology. North-Holland.
12. Deb,K. (2001) Multi-objective optimization using evolutionary algorithms. John Wiley &
Sons, Chichester, New York.
13. Agrawal,R., Imielinski,T., Swami,A.N. (1993) Mining association rules between sets of
items in large databases. In Buneman, P., Jajodia, S., eds.: Proceedings of the ACM
SIGMOD International Conference on Management of Data, Washington, D.C., 207-216.
14. Kooperberg,C., Sipione,S., LeBlanc,M., Strand, A.D., Cattaneo,E. and Olson, J.M. (2002)
Evaluating test statistics to select interesting genes in microarray experiments. Hum. Mol.
Genet., 11(19), 2223-2232.
15. Mitchell,T. (1997) Machine Learning. McGraw Hill.
16. Duda, R. O., and Hart, P. E. (1973) Pattern Classification and Scene Analysis. John Wiley
& Sons, New York, USA.
17. Cordón, O., del Jesus, M.J., Herrera, F. (1999) A Proposal on Reasoning Methods in Fuzzy
Rule-Based Classification Systems. International Journal of Approximate Reasoning, 20,
21-45.
18. Calvano,S.E., Xiao,W., Richards,D.R., Feliciano,R.M., Baker, H.V., Cho, R.J., Chen,
R.O., Brownstein,B.H., Cobb,J.P., Tschoeke,S.K., Miller-Graziano,C., Moldawer,L.L.,
Mindrinos, M.N., Davis, R.W., Tompkins,R.G. and Lowry,S.F. (2005) The Inflammation
and Host Response to Injury Large Scale Collaborative Research Program. A Network-
Based Analysis of Systemic Inflammation in Humans. Nature, in press.
19. Tavazoie,S., Hughes,J.D., Campbell,M.J., Cho,R.J. and Church,G.M. (1999) Systematic
determination of genetic network architecture, Nat Genet, 22, 281-285.
20. Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P.,
Dolinski, K., Dwight, S.S., Eppig, J.T., Harris, M.A., Hill, D.P., Issel-Tarver, L., Kasar-
skis, A., Lewis, S., Matese, J.C., Richardson, J.E., Ringwald, M., Rubin, G.M. and Sher-
lock, G. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology
Consortium, Nat Genet, 25, 25-29.
21. Benitez-Bellon,E., Moreno-Hagelsieb,G. and Collado-Vides,J. (2002) Evaluation of
thresholds for the detection of binding sites for regulatory proteins in Escherichia coli K12
DNA. Genome Biol., 3(3):RESEARCH0013.
Microarray Probe Design Using
ε-Multi-Objective Evolutionary Algorithms with
Thermodynamic Criteria
Biointelligence Laboratory,
School of Computer Science and Engineering,
Seoul National University, Seoul 151-742, Korea
{syshin, ihlee, btzhang}@bi.snu.ac.kr
Abstract. As DNA microarrays have been widely used for gene expression
profiling and other fields, the importance of reliable probe design for
microarrays has been highlighted. First, the probe design for DNA microarrays
was formulated as a constrained multi-objective optimization
task by investigating the characteristics of probe design. Then the probe
set for human papillomavirus (HPV) was found using an ε-multi-objective
evolutionary algorithm with thermodynamic fitness calculation. The evolutionary
optimization of the probe set showed better results than the commercial
microarray probe set made by Biomedlab Co., Korea.
1 Introduction
Though they have shown good results, the main algorithm of most previous
systems is a simple generate-and-filter approach. Recently, a method
based on machine learning algorithms such as naïve Bayes, decision trees, and
neural networks has been proposed for aiding probe selection [15]. And in our
previous work [8], we used a multi-objective evolutionary algorithm for probe
selection for DNA microarrays. We designed 19 probes for human papillomaviruses
using the non-dominated sorting genetic algorithm-II (NSGA-II). In this paper, we
improved our previous approach in several ways. First, we reformulated the probe
design problem by investigating the characteristics of the probe design. Second,
we adopted the ε-multi-objective evolutionary algorithm (ε-MOEA) instead of
NSGA-II. In a related field, DNA sequence design for DNA computing, we noticed
that ε-MOEA outperforms NSGA-II for the sequence design problem [12].
Based on these results, we changed the main algorithm to ε-MOEA. Third, we
changed the fitness criteria of probe design by combining thermodynamic data
and sequence similarity search.
In the following sections, we explain the suggested probe design method in
detail. In Section 2, we briefly introduce the multi-objective optimization problem
and formulate the probe design problem as a multi-objective optimization
problem. Sections 3 and 4 describe our probe design method and provide the
experimental results. Section 5 concludes the paper.
Therefore, the optimal solutions for a MOP are those that are not dominated by
any other solutions. Thus, one's goal in a MOP is to find such a non-dominated
set of solutions.
There exist several criteria to evaluate a set of probes [5]. We list the generally
used conditions for good probes:
1. The probe sequence for each gene should not appear in genes other than its
target gene.
2. The probe sequences for the different genes should be as different from each
other as possible.
3. The non-specific interaction between probe and target should be minimized.
4. The probe sequence for each gene should not form secondary structure such
as hairpins.
5. The melting temperatures of the probes should be uniform.
The first three conditions concern the specificity of the probes. The secondary
structure of a probe can disturb the hybridization with its target gene.
Lastly, the probes on an oligonucleotide chip are exposed to the same experimental
conditions. If the melting temperatures of the probes are not uniform, some probes
cannot hybridize with their targets.
We formulated the above conditions for a clear definition of the microarray probe
design problem. The first condition is regarded as a constraint, since it is the basic
requirement for probes. The fifth condition was not considered as one of the
objectives but was used as the final decision criterion to choose the best solution
among the diverse Pareto optimal solutions which result from the MOEA run.
Therefore, we formulated the microarray probe design using three fitness
functions and one constraint. Before going into the formulation of the problem,
let us introduce the basic notation. We denote a set of n probes by
P = {p1, p2, ..., pn}, where pi ∈ {A, C, G, T}^l for i = 1, 2, ..., n, and l is the length
of each probe. We denote the set of target genes by T = {t1, ..., tn}.
The constraint is the basic requirement for probes:

    g(P) = Σ_{i ≠ j} subseq(p_i, t_j),    (3)

    subseq(p_i, t_j) = 1 if p_i occurs in t_j at least once, and 0 otherwise.
Since the probe sequences should not be subsequences of the non-target gene
sequences (condition 1), this constraint is the basic requirement, and from its
definition it should be zero. The other conditions are implemented
as three fitness functions. The first one is to prevent hybridization between a probe
and non-target genes (condition 2). The second is to prevent hybridization between
a probe and improper positions of its target gene (condition 3). Even if a probe
hybridizes to an undesired site of its target gene, this can still give the right
information, so this may seem an unnecessary fitness function; however, for
more specific probe design, we add it to our design criteria.
The last one is to prohibit the formation of unwanted secondary structures which can
disturb the hybridization between probe and target (condition 4). These could be
abstracted as follows:
    f1(P) = Σ_{i ≠ j} hybridize(p_i, t_j),    (4)

    f2(P) = Σ_i hybridize_target(p_i, t_i),    (5)

    f3(P) = Σ_i secondary(p_i).    (6)
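In code, the constraint and the three fitness functions have the following shape; the predicates subseq, hybridize, hybridize_target and secondary are placeholders for the thermodynamic evaluations discussed later in the paper:

    def g(probes, targets, subseq):
        # Constraint (3): probe i must not occur in any non-target gene j (must be 0).
        return sum(subseq(p, t)
                   for i, p in enumerate(probes)
                   for j, t in enumerate(targets) if i != j)

    def f1(probes, targets, hybridize):
        # Fitness (4): cross-hybridization of probe i with non-target genes j.
        return sum(hybridize(p, t)
                   for i, p in enumerate(probes)
                   for j, t in enumerate(targets) if i != j)

    def f2(probes, targets, hybridize_target):
        # Fitness (5): hybridization of probe i at improper positions of its own target.
        return sum(hybridize_target(p, t) for p, t in zip(probes, targets))

    def f3(probes, secondary):
        # Fitness (6): propensity of each probe to form secondary structure.
        return sum(secondary(p) for p in probes)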
[Figure data labels: within target gene, non-target gene, secondary structures]
Fig. 1. The relationship between objectives for probe design. The data are generated
using 20-mer DNA sequences and Watson-Crick complements.
The previous microarray probe design tools can be classified into two groups
by their probe specificity evaluation methods: the thermodynamic approach [10, 9]
and the sequence similarity search approach [18, 4]. In the thermodynamic approach,
the optimum probes are picked based on the free energy with the correct target and
on maximizing the difference in free energy to other mismatched target sequences.
Sequence similarity search methods use BLAST or BLAT [6] to check cross-hybridization.
Since the thermodynamic approach is the more accurate of the two [10, 9],
we calculate the fitness objectives of Section 2.2 using thermodynamic data.
The thermodynamic fitness functions are implemented using the modified Mfold
[19] for OligoArray [10]. We downloaded the stand-alone program source
code and slightly modified it for our fitness functions.
The multi-objective evolutionary algorithm has the advantage that one can obtain
the Pareto optimal solutions in a single run. However, the user usually needs one
promising solution, not the whole set of Pareto optimal solutions. Therefore, we
incorporated decision-making steps to select the most promising solution among the
Pareto solutions. First, the Pareto optimal solutions are found by ε-MOEA.
Then, a BLAT search [6], a hybridization simulation [13], and a melting temperature
calculation choose one candidate solution. BLAT is a BLAST-like sequence alignment
tool, but much faster than BLAST [6]. NACST/Sim [13] is a hybridization
simulation tool to check cross-hybridization based on the nearest-neighbor model of
DNA [11]. The melting temperature is also calculated by the nearest-neighbor model.
Through these steps, the user is recommended the most promising probe
set while maintaining the flexibility to select among various solutions. Using
the characteristics of MOEA, we can improve the reliability of the optimized
probe set by combining diverse criteria such as thermodynamic fitness calculation,
sequence similarity search, and other user-defined criteria. This procedure is
summarized in Fig. 3.
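The decision step could be sketched as a simple filter over the Pareto set; cross_hyb_count and melting_temps below are hypothetical stand-ins for NACST/Sim and the nearest-neighbor Tm calculation:

    import statistics

    def choose_probe_set(pareto_sets, cross_hyb_count, melting_temps):
        # Prefer the probe set with the fewest predicted cross-hybridizations,
        # breaking ties by the most uniform melting temperatures (condition 5).
        def key(probe_set):
            tms = melting_temps(probe_set)
            return (cross_hyb_count(probe_set), statistics.pvariance(tms))
        return min(pareto_sets, key=key)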
4 Experimental Results
[17]. HPV types can be divided into two classes: those that are very likely to cause
cervical cancer and those that are not. The 19 genotypes of HPV belonging to
the first class were selected as target genes. The goal is to discriminate each
of the 19 genotypes among themselves. The selected 19 genes are HPV6, HPV11,
HPV16, HPV18, HPV31, HPV33, HPV34, HPV35, HPV39, HPV40, HPV42,
HPV44, HPV45, HPV51, HPV52, HPV56, HPV58, HPV59, and HPV66. And
to improve the accuracy, the L1 region of each gene sequence was chosen. Each gene
and its L1 region were selected by Biomedlab Co., Korea, with experts' laborious
work.
Table 1. The comparison results for various numbers of generations. As the generations
go on, the probes show fewer cross-hybridizations.
Table 2. In silico hybridization results for the Pareto set at generation 1000
As shown in Table 1, there are various candidate probe sets (4 to 38) as results
of ε-MOEA. To choose the best probe set among the candidates, we first used
BLAT with the HPV gene sequences. However, unfortunately, we could not find any
cross-hybridization using BLAT. Since the L1 region is a very well discriminated
part of the HPV sequences, there are no similar sequences.
Even when we compared the L1 region sequences with the whole HPV sequences using
BLAT, we found only a few similar sequences. Second, we used in silico hybridization
with NACST/Sim. The results are shown in Table 2. We used NACST/Sim
for the Pareto set found at generation 1000. As explained previously, generation 1000
showed the most significant results, and other runs required too much run
time. As a result, we chose set no. 8 as the final probe set, since that set showed
the smallest number of cross-hybridizations.
To verify the reliability of the final probe set, we compared the probe set found by
ε-MOEA with the probes in the commercial chip made by Biomedlab and the probe set
found by NSGA-II [8]. Table 3 shows the comparison results. ε-MOEA found the best
probe set. The probe set from NSGA-II has three times more cross-hybridizations, and
the Biomedlab probes have 2.5 times more cross-hybridizations. We ran BLAT against
the L1 regions and the whole HPV sequences, respectively. BLAT found one similar
sequence against the whole HPV sequences in the NSGA-II probe set, and could not find any more
Table 4. The final set of probes chosen by the proposed method for HPV
similar sequences. The probe set from ε-MOEA also has the lowest melting temperature
among the three probe sets. Though NSGA-II has the smallest melting temperature
variation, the difference is not significant compared to ε-MOEA. The reason
NSGA-II found a probe set with near-uniform melting temperatures is that NSGA-II
used the melting temperature variation as one of its objectives [8]. Even though
we did not use that objective, our approach finds comparable results.
The probes practically used by Biomedlab showed the poorest results in melting
temperature, even though melting temperature variation is important for the
microarray experiment protocols. The final probe set generated by the proposed
approach is shown in Table 4.
5 Conclusion
We formulated the probe design problem as a constrained multi-objective optimization
problem and presented a multi-objective evolutionary method for the
problem. Because our method is based on a multi-objective evolutionary algorithm,
it has the advantage of providing multiple choices to users. And to make
it easy to choose among candidates, we suggested criteria to assist the decision
maker. It is shown that the proposed method can be useful for designing good
probes, by applying it to a real-world problem and comparing the results to
currently used probes.
Though previous works focused on finding a moderate probe set in a short
time, we focused on improving the quality of the probe set. Therefore, our approach
Acknowledgements
This research was supported in part by the Ministry of Education & Human
Resources Development under the BK21-IT Program, the Ministry of Commerce,
Industry and Energy through MEC project, and the NRL Program from Korean
Ministry of Science and Technology. The RIACT at Seoul National University
provided research facilities for this study. The target gene and probe sequences
are supplied by Biomedlab Co., Korea.
References
1. K. Deb. Multi-Objective Optimization using Evolutionary Algorithms. John Wiley
& Sons, Ltd., 2001.
2. K. Deb, M. Mohan, and S. Mishra. A fast multi-objective evolutionary algorithm
for finding well-spread pareto-optimal solutions. KanGAL Report 2003002, Kanpur
Genetic Algorithm Laboratory, Indian Institute of Technology Kanpur, 2003.
3. K. Deb, M. Mohan, and S. Mishra. Towards a quick computation of well-spread
Pareto-optimal solutions. In Proceedings of the Second International Conference
on Evolutionary Multi-Criterion Optimization, pages 222-236, 2003.
4. K. Flikka, F. Yadetie, A. Laegreid, and I. Jonassen. XHM: A system for detection
of potential cross hybridizations in DNA microarrays. BMC Bioinformatics, 5(117),
2004.
5. P. M. K. Gordon and C. W. Sensen. Osprey: a comprehensive tool employing novel
methods for design of oligonucleotides for DNA sequencing and microarrays. Nucleic
Acids Research, 32(17):e133, 2004.
6. W. J. Kent. BLAT - the BLAST-like alignment tool. Genome Research, 12(4):656-664,
2002.
7. M. Laumanns, L. Thiele, K. Deb, and E. Zitzler. Combining convergence and
diversity in evolutionary multi-objective optimization. Evolutionary Computation,
10(3):263-282, 2002.
8. I.-H. Lee, S. Kim, and B.-T. Zhang. Multi-objective evolutionary probe design
based on thermodynamic criteria for HPV detection. Lecture Notes in Computer
Science, 3157:742-750, 2004.
9. F. Li and G. D. Stormo. Selection of optimal DNA oligos for gene expression
arrays. Bioinformatics, 17:1067-1076, 2001.
10. J.-M. Rouillard, M. Zuker, and E. Gulari. OligoArray 2.0: design of oligonucleotide
probes for DNA microarrays using a thermodynamic approach. Nucleic Acids
Research, 31(12):3057-3062, 2003.
11. J. SantaLucia Jr. A unified view of polymer, dumbbell, and oligonucleotide DNA
nearest-neighbor thermodynamics. Proceedings of the National Academy of Sciences
of the United States of America, 95:1460-1465, 1998.
Automated Verification of DNA Supercontig Assemblies
Nikola Stojanovic
1 Introduction
Despite the advances in genome sequencing technology, the quality of the assembled
products remains a major concern. A recent study done by the Genome
Sciences Centre in Vancouver, Canada, in collaboration with several sequencing
centers in the United States, has identified an average of 4.19 to 4.57 assembly
problems per one million bases, including an average of 0.3 to 0.4 wrong and
misassembled clones (Rene Warren, personal communication). While these numbers
are reasonably low for such a complex operation as the assembly of vertebrate
genomes, they are sufficiently high to warrant continued attention.
The first vertebrate genome assembled was that of human. This task was
preceded by the construction of detailed maps and the establishment of a large
number of markers along chromosomes [5]. Only after this task had been substantially
completed could the sequencing of many large insert clones (Yeast Artificial
Chromosomes, YACs, at first, followed by the more stable Bacterial Artificial
Chromosomes, BACs) begin. At first, the Human Genome Project centers
intended to perform the sequencing in a structured way, following the maps and
progressively expanding the tiling path of large insert clones (further referred to
as LICs), finishing them to full accuracy (initially set to less than one error in
every ten thousand bases) in the process. While the groups outside the United
States continued pursuing this strategy, which resulted in the early completion
of chromosomes 21 [2] and 22 [1], the emergence of the whole-genome shotgun
strategy [15] and the subsequent challenge to the public effort by a private company,
Celera Genomics, led to a change of course for the consortium members located
in the US. The human genome was first released as a draft sequence [6]
covering about 90% of its euchromatic part, mostly as a collection of unordered
and unoriented contigs featuring 147,821 gaps. The finished sequence followed
almost three years later [7].
While Celera constructed its scaffolds of the human genome using the whole-genome
shotgun approach and mixing clones of different lengths [14], the public
HGP was completed using LICs, mostly BACs from the CalTech and RPC-11 libraries.
Initially there were about 600,000 of these clones, of an average length of
150Kb, and only a part of them were selected for further breaking into sequencing
clones (M13 or plasmid, each of about 2Kb in length). The remainder has
been end-sequenced, generating paired reads of about 500 to 1,000 bases long,
and used as an aid in the final assembly of chromosomes. BAC clones have been
created by partial digestion of DNA with restriction enzymes, EcoRI and HindIII
at CalTech, and EcoRI and MboI for RPC-11. In order to generate the desired
range of LIC sizes, the enzymes have been appropriately diluted.
After the HGP switched to the rapid generation of draft sequence, the goals
of clone selection changed from the orderly generation of tiling paths of
finished LICs to the identification of non-overlapping clones, in order to assure
the production of the maximal possible amount of new sequence (Ken Dewar,
personal communication). In consequence, after generating the draft the public
consortium was left with the daunting task of organizing thousands of contigs in
which more than 50% of the bases lay in assembled regions of less than 100Kb.
This process was prone to laboratory errors in clone handling, which we have
addressed in an earlier paper [13], and to the incorrect placement of LICs in
tiling paths. In this manuscript we describe a new computational method
originally developed to address the latter issue.
The efforts towards generating the complete human sequence could be broadly
classified into two categories: the finishing of individual LICs and their arrangement
along chromosomes. While the challenges concerning the former mostly lay in laboratory
work, the latter primarily involved computation. The first arrangement was
produced by W. James Kent at the University of California, Santa Cruz [9],
who used the information contained in the initial sequence contigs, linkage and
fingerprint maps, mRNA and Expressed Sequence Tag (EST) data, and BAC
end sequences. The Institute for Genomic Research (TIGR) in Rockville, Maryland,
provided end sequences for about 500,000 BACs from the human libraries,
out of which about 300,000 were sequenced from both ends (generating around
600,000 paired reads) and the remaining 200,000 were unpaired. In addition,
about 750,000 fosmid clones (similar to BACs, but much shorter, about 40Kb
in size, on average) have been created and end-sequenced for the verification of
the assembly. These clones provided another 8× coverage of the human genome,
bringing the total to almost 30-fold redundancy [7].
After the completion of the Human Genome Project, the sequencing community
has been steadily moving towards the whole-genome shotgun assembly
method. The mouse [10] and rat [12] genomes have been assembled using a hybrid approach,
Fig. 1. Coverage of genomic sequence by large insert clones. Those actually sequenced,
represented by solid lines, form a tiling path, while those only end-sequenced (represented
by solid ends with reads pointing towards each other) are shown as dotted lines.
If the paired end-reads were placed on the path in the right orientation and at about the
right distance, this was considered as additional coverage of the enclosed bases. The
thick line at the bottom represents the assembled chromosomal region.
and the subsequent vertebrate genomes were assembled by new
mega-assemblers [8, 11, 4]. However, large insert clones still play a role in the
assembly of genomic scaffolds [3, 14], which use both BAC and fosmid end reads
in addition to shorter fragments.
Fig. 2. Coverage of RefSeq supercontig NT 000765 with BAC clones. Positions within
the supercontig are plotted on the X-axis (0 to 3,173,457). Fold coverage over the sampled
positions is represented along the Y-axis.
greatly repetitive structure and the fact that TIGR had used the draft masked
for known repeats. Using unmasked sequence, the WICGR group succeeded in
placing about 25% more reads, and assured LIC coverage of almost 15× throughout
the part of the genome it was in charge of, not counting the additional 8×
from the fosmid libraries (Nathaniel Strauss, WICGR closure group, personal communication).
The expectation was that every DNA base would be covered by a
relatively stable number of clones, roughly around the mean, and a misassembly
would be indicated by anomalies. However, it was somewhat surprising to see
that the actual coverage could show large variations, as illustrated in Figure 2.
The first suspect for this apparent paradox was an uneven distribution of
the restriction enzyme target sites in parts of the genome. While this is generally
true, in particular for heterochromatic and other satellite regions, in most
chromosomal DNA these sites are distributed in agreement with the Poisson
expectation. However, although random, the particular placement layout in any
sequence dictates a certain coverage pattern, with well-defined positions of unusually
high or unusually low coverage, i.e. the number of LICs selected from
that region. This has been verified by simulating the clone library construction
in silico and selecting the number of clones to provide several hundred-, and even
several thousand-fold virtual coverage of the target regions. Even at numbers
Fig. 3. Coverage curves for a 3Mb genomic region. The top part of the figure plots the
curve achieved at virtual coverage up to 3000× (average 2500×). The bottom part
shows the coverage up to 2000× (average 1600×). At such high coverages the peaks
and dips of the curve tend to stabilize at fixed locations, although minor variations can
still be detected.
as low as 100×, the limiting curve (to which we shall further refer as L) would
converge to a pattern characteristic for that region, with well-defined peaks and
troughs, as illustrated in Figure 3.
This observation led to the idea of comparing the actual coverage of a genomic
region not with an a priori determined mean coverage, but with its own characteristic
limiting curve L. Given a long supercontig, our software would be trained
to learn L, then compare the actual coverage (whose curve will be referred to as
C, or Ck for k× coverage) with it. If C features peaks and dips at the positions
consistent with L, that indicates a correct assembly almost regardless of the
number of mapped clones.
by each restriction enzyme, as well as the enzyme target site, the percent dilution
(for the partial digest) and the permissible clone size range. In addition, this file
can contain information about the other clones used (fosmids or plasmids),
whether they are digested by restriction enzymes or randomly sheared, and what
percentage of each has been included in the libraries. In the training phase, the
software mimics the process done in the laboratory, constructs the clone libraries
covering the segment and outputs the file containing the virtual libraries.
The creation of the virtual libraries is the most time-consuming step of the
analysis. In order to faithfully reproduce the laboratory procedure, the software
must search the sequence for the target sites of the restriction enzymes
(GAATTC for EcoRI, AAGCTT for HindIII and GATC for MboI, for instance).
It does not need to scan both strands, since these enzymes cut at palindromic
sequences, but it needs to make a random decision whether to cut or not every
time a site is found, in accordance with the specified dilution. Thus, for instance,
a 2.5% dilution of a six-cutter enzyme would dictate a cut with only 0.025
probability. Since, under the assumption of equal nucleotide representation in the
genome, the likelihood of finding a target at any position is 4^(-6), the probability
of a cut at any particular position would be about p ≈ 6 × 10^(-6), i.e. a cut
about every 164Kb. Since the occurrence of the cuts is a Poisson process, one can
find the probability of the next cut within the permissible clone size range using
the exponential distribution, giving P{a ≤ X ≤ b} = e^(-pa) - e^(-pb). If a = 120,000
and b = 180,000, this would be P{120,000 ≤ X ≤ 180,000} ≈ 0.15. Virtual clones
whose size falls outside of this range must be discarded. On a Unix workstation this
may take several minutes per megabase for higher coverages, so we have limited it to
500× or less in practical runs. Consequently, the scanning of the entire human
genome would take several days on a single workstation, and several hours on
a supercomputing system such as the UTA Distributed and Parallel Computing
Cluster.
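One possible way to mimic this in silico digestion is sketched below for a single six-cutter under the stated dilution and size range; one pass corresponds to one virtual genome copy, so it would be repeated until the desired virtual coverage is reached. This is an illustration under those assumptions, not the author's implementation.

    import random
    import re

    def virtual_clones(sequence, site="GAATTC", dilution=0.025,
                       min_len=120_000, max_len=180_000):
        # Cut at each occurrence of the target site with probability `dilution`,
        # then keep only the fragments within the permissible clone size range.
        cuts = [0] + [m.start() for m in re.finditer(site, sequence)
                      if random.random() < dilution] + [len(sequence)]
        return [(start, end) for start, end in zip(cuts, cuts[1:])
                if min_len <= end - start <= max_len]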
After the virtual LIC library covering the sequence in the input has been
constructed, another module takes it over to map the ends of these clones to
the right positions. This step is not necessary when constructing L, but it is
essential for testing. The clone ends mapping introduces errors at rates specified
as parameters, including the percentage of clones for which only one end would
be sequenced (thus mimicking the discarding of poor quality reads during the
actual sequencing), and the percentage of clones where one or both ends cannot
be unambiguously mapped to the genome. By manipulating these parameters it
is possible to test the behavior of the software under various scenarios.
The most common error during the construction of large supercontigs, spanning
millions of base pairs, is the collapsing of regions of large segmental
duplications. If the sequences of the two copies are very similar (so that the differences
can be attributed to genetic variation between the individuals whose DNA
has been used for the libraries, or, in a very small number of cases, to sequencing
errors), the assembly may lay the two copies on top of each other, causing the
omission of the DNA between the duplicate loci. As shown in Figure 4, in terms of
the coverage of the region by clones, such a situation may lead to either reduced or
Fig. 4. Two possible interleaving sequence deletion scenarios: (a) Line represents chro-
mosomal DNA, with two narrow boxes indicating a large (> 200Kb) segmental dupli-
cation. LICs covering the area are shown on the top. Bottom line connects spots where
an incorrect link has been established. The area between the duplicates is presumed
to be longer than a single clone; (b) Collapsed segments with the area between them
deleted. Spanning clones from the left copy are shown at the top, and those from the
right copy at the bottom. Clone end sequences whose matching other end now fails
to map in the region are circled; (c) On a line representing chromosomal DNA, two
narrow boxes indicate shorter (< 100Kb) duplicated segment. LICs covering the area
are shown on the top. Bottom line connects spots where an incorrect link has been
established. The area between the duplicated segments is presumed to be longer than
a single clone; (d) Collapsed segments with the area between them deleted. Spanning
clones from the left copy are shown at the top, and those from the right copy at the
bottom. Clone end sequences whose matching other end now fails to map in the region
are circled. Both deletions are characterized by anomalous coverage, with spikes in the
number of unpaired clone ends along the edges of the collapsed duplications.
increased coverage, in conjunction with an increase in the number of errors, i.e.
clone ends mapping to the region at an unlikely distance, or as unpaired matches.
In both cases it is unlikely that the limiting curve L for the region would confirm
such a spike, and indeed for suitable lengths of duplicated sequences (see the results
below) we have not seen a case where a combination of the standard deviation
measure and the comparison of L and C has not identified the problem spot.
The core of our method is the comparison module. It uses the constructed L
and scans the region (supercontig) in sliding windows of pre-set size, which can
be adjusted in accordance with the conditions of the assembly. In our runs we
have used windows of size 100Kb, since we were looking primarily at deletions
due to misassemblies at the LIC level. If the length d of the duplicated area
is less than one clone size L and k is the overall redundancy of coverage, then
the coverage over the collapsed duplicates would be reduced to about kd/L, with
areas up to L in length on each side of the collapsed duplicates whose error rates
are increased above the background to approximately k(L-d)/L. If d > L, then the
collapsed area would feature an increase in coverage of about min(2k, kd/L) over
the midpoint of the collapsed segments, with coverage gradually falling to k over
the min(L, d/2) bases on both flanks of the collapsed region, in addition to an
error rate of k for up to L bases around the flanks. In all cases, the signs of
trouble should be present over at least 100Kb. This number would be different
depending on the mixture of LIC sizes used in different sequencing projects.
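The expected signature over a collapsed duplication follows directly from d, L and k; a small helper illustrating the expressions above:

    def collapsed_coverage(d, L, k):
        # Expected coverage over the collapsed copies of a duplication of length d,
        # for clone size L and overall redundancy k (see the expressions above).
        if d < L:
            return k * d / L               # reduced coverage
        return min(2 * k, k * d / L)       # increased coverage at the midpoint

    # Example: d = 80 Kb, L = 150 Kb, k = 20 gives about 10.7x over the collapse.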
The calculation of the mean expected coverage μL and the standard deviation
σL for L, and of μC and σC for C, is done at the supercontig level, but the
comparisons are done locally in each window. The number of sampling points
n within a window is automatically determined based on the actual coverage k
of the examined region, and set to 1.5 times the expected number of clone starts and
ends. If, for instance, the coverage is 20× in clones of 150Kb, it is expected that
a region of 100Kb would feature about 27 points where the coverage changes (a
new clone starting or an old one ending), so we would choose 40 sampling points.
However, the coverage values for the limiting curve (expressed in hundreds, if
not thousands) and the actual data (10-30×) need to be put on a uniform
scale. If the clone layout were completely random, then the amount of coverage
over any point would be normally distributed, so we convert both the limiting
and actual values at the sampling points xi (we denote them by L(xi) and Ck(xi))
to the standard normal scale as

    z_i^L = (L(xi) - μL) / σL   and   z_i^Ck = (Ck(xi) - μC) / σC.

We then compare z_i^L and z_i^Ck: if both are within two standard deviations of the 0
mean (indicating that a deviation was neither expected nor has happened) for
all i = 1, ..., n, we accept the coverage over the window as correct. However, if this
condition is not satisfied for some point xi, additional constraints are examined:
1. It must be that |z_i^L - z_i^Ck| < 3.5 for all i in [1, n], and
2. The Pearson correlation coefficient r calculated over all z_i^L and z_i^Ck must
reject non-correlation of L and Ck, using r * sqrt(n-2) / sqrt(1-r^2) as a Student-t
variable with n - 2 degrees of freedom, at 95% significance.
Only if both 1 and 2 above are satisfied is the window considered correct; otherwise
it is reported as a potential problem spot.
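The per-window decision can then be written compactly; a sketch assuming the sampled coverage values are numpy arrays, and reading condition 2 as requiring a significant positive correlation between L and Ck (pearsonr's p-value corresponds to the same Student-t test on r):

    import numpy as np
    from scipy import stats

    def window_ok(L_vals, C_vals, mu_L, sd_L, mu_C, sd_C, alpha=0.05):
        zL = (np.asarray(L_vals, dtype=float) - mu_L) / sd_L
        zC = (np.asarray(C_vals, dtype=float) - mu_C) / sd_C
        if np.all(np.abs(zL) < 2) and np.all(np.abs(zC) < 2):
            return True                      # no deviation expected or observed
        if np.any(np.abs(zL - zC) >= 3.5):
            return False                     # condition 1 violated
        r, p = stats.pearsonr(zL, zC)        # condition 2: L and Ck must correlate
        return r > 0 and p < alpha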
4 Algorithm Performance
Even after many adjustments of our software, it kept rejecting the assembly of the
RefSeq supercontig NT 000765, whose coverage is shown in Figure 2, as incorrect
at two loci, around 1.2Mb and around 2.3Mb. Further examination of that
supercontig has indeed established a misassembly, and NT 000765 has been
subsequently withdrawn from GenBank. However, although this software
has been used on several genomic regions, it has not been incorporated into the
Table 1. The results of the analysis of 27 windows within a 3Mb genomic region under
various simulated coverages. Each simulation has been repeated 10 times, with error
rates ranging between 0% and 10%. For each coverage the comparison has been made
with the limiting curve L constructed from a virtual coverage of 500×.
genome closure pipeline (due to the departure from WICGR and HGP of both
the author and the closure group leader¹ who requested this work).
In order to gain a systematic perspective on the performance of our algorithm,
we have done a series of simulations using several supercontigs of 2Mb to 5Mb,
downloaded from the RefSeq division of GenBank. Actual coverages have
been simulated at various levels, and the BAC end mapping error rate has
been varied between 0 and 35%. The results obtained at a < 10% error rate on
a 3Mb long sequence are shown in Table 1. As can be seen from the table,
the number of windows failing the two standard deviations test at coverage 300×
is about 5%, which corresponds well with the number of windows expected to
have coverage outside the 2σ bound by chance. Since at 300× the coverage curve
is mostly stable, this indicates that the number of outliers is not unusual, and
that it is consistent with the Poisson distribution of the recognition sites for
the restriction enzymes. However, the particular locations of these outliers are
characteristic of the genomic region in question, as demonstrated by the excellent
correlation of the C300 curve with L. At C100, and further down to the more practical
coverages of C30, C20 and C10, the number of violations becomes progressively
larger, up to almost 20%. This is partially because the chance outliers are more
dispersed at smaller sample sizes, and for a window to fail the 2σ test it is
enough that one of its sample points falls outside these bounds, in either L or C.
In these cases, the number of outliers is still higher than expected, perhaps due
to the difference between global and local variance.
At low coverages the false positive ratio can be high (up to 12.59% of the windows
for 10× at a 10% error rate, and higher in the presence of more errors, when the
coverage starts behaving as a random sequence; see Table 2), but the test still cuts
the number of windows that need further checking to about half of those failing
the 2σ test. As mentioned above, the two standard deviations threshold is much
better than the 3.5σ threshold used in the HGP, and our software has still successfully
flagged every misassembled region of the right size which we were aware of (and,
in particular, the errors deliberately introduced for testing).
¹ Dr. Ken Dewar, now at McGill University and Genome Quebec Innovation Centre
in Montreal, Canada.
For LIC sizes of 150Kb, when the segmental duplications extend over less than
100Kb, the coverage at the collapsed area would concentrate around the mean
of (2/3)k or less, and when the duplicated areas are longer than 200Kb it would peak
at around (4/3)k or more, up to 2k. In both cases our software was successful in
identifying the introduced problem spots, due to a low probability of correlating
outliers in both L and C. This does not mean that our algorithm has zero probability
of a false negative, only that in these cases its false negative ratio is low
and that it has not happened in our tests. However, when the duplicated area
is between 100Kb and 200Kb the expected coverage over the collapsed part is
about the same as normal, so the algorithm is more vulnerable to errors. In fact,
its incorrect assumption of correlation between L and C is similar to the case when
C is constructed on a sequence different from that used to construct L. The results
of the comparisons of unrelated L and C are shown in Table 2.
Table 2. The results of the analysis of 27 windows within two unrelated 3Mb genomic
regions under various simulated coverages. Each simulation has been repeated 10 times
with no introduced errors. For each coverage the comparison has been made with the
unrelated limiting curve L constructed from a virtual coverage of 500×.
From Table 2 it can be seen that many windows never get into the testing for
correlation with L. Since both L and C are correctly constructed for their
respective regions, there is a large proportion of windows in which no points
violate the 2σ rule. However, since L and C are not related, their outliers
are uncorrelated, and thus a larger percentage of their windows (25.56-35.93%
versus the normal 5.19-19.26%) has a 2σ outlier in either L or C. Because of the
random arrangement of these outliers, the decrease in coverage for C has only a
marginal effect on their number. While more than two thirds of the outliers have
been identified as problematic, there was still a chance of a sufficient correlation
between L and C, leading to a considerable false negative rate.
In consequence, while this algorithm performs reasonably well for the misassemblies
resulting from collapsing duplications shorter than 2L/3 or longer than
4L/3 (although with a substantial false positive ratio), it is not appropriate for
detecting those of about 2L/3 through 4L/3. These should be checked for by other
methods, but the task is now easier, as the approximate size of the duplications
can be targeted.
5 Discussion
No single sequence assembly quality check works best, and a sequencing facility
should apply multiple methods in order to assure the correctness of their data.
In particular, when the duplicated genome sequences are adjacent, the analysis
should rely on the sizes of clones and the orientation of their end reads, in
addition to the techniques described in this paper. However, the algorithm we
have described provides a considerable improvement in sensitivity when compared
with the simple 3.5σ test, and reduces the need for checking the two standard
deviation outliers by about 50%. Most of the regions which are labeled
suspicious by our software can be relatively quickly checked for the presence
of flanking spikes in the error rates, thus reducing the need for more detailed
examination to only a handful of serious suspects.
A new tool, SEMBLANCE, is currently under development at the Washington
University Genome Sequencing Center (David Messina, personal communication).
This software is designed for the comprehensive assessment of the quality
of whole-genome sequence assemblies, the assessment of the impact of physical maps
on the assembly quality, and the comparison of assemblies done at different levels
of coverage redundancy. So far, SEMBLANCE has been applied to the analysis
of whole-genome assemblies of the chimpanzee genome, at very low coverages,
comparing them with chimpanzee BAC clones not used in the assemblies. Since
the whole-genome strategies generally include a significant proportion of LICs
(BACs and fosmids), the algorithm we have described here would be useful as a
part of a quality control toolkit such as SEMBLANCE.
Shearing of the DNA, as opposed to digestion by restriction enzymes, has
been the method of choice for the construction of small sequencing clones for
a long time, and recently the genomic library construction efforts have moved
towards the application of this technology to fosmids as well. Once the assemblies
start being done based exclusively on sheared clones, we expect that the
algorithm described here would lose much of its relevance: since the boundaries
of sheared clones are not associated with any particular sequence motif and
are theoretically uniformly distributed throughout the genome, one can expect
that L would be flat, and that no particular signature limiting curve could be
associated with a supercontig. However, many already assembled genomes still
need to be refined, and some even partially reassembled, and many LIC libraries
exist for the genomes which have not been fully sequenced yet. In consequence,
we expect that in the near future at least some parts of genome assemblies will
be done using clones whose nature lends itself to analysis by this algorithm
[3, 16], and to the extent they are present our approach would prove to be a
valuable addition to any assembly quality assessment software toolkit.
Acknowledgments
The author would like to express gratitude to Ken Dewar, whose innovative
ideas and diligence in leading the human genome closure effort at the Whitehead
Institute Center for Genome Research paved the way for this work, as well as
to other members of the closure team who provided data and discussed relevant
issues, especially Nathaniel Strauss, Christopher Seaman and Jean L. Chang.
The early work on this algorithm has been partially supported by the Human
Genome Project sequencing grant to Eric S. Lander and WICGR. We would also
like to thank Rene Warren from the Genome Sciences Centre in Vancouver and
David Messina from the Washington University School of Medicine for the data
concerning the accuracy of recent assemblies and discussions during the Genome
Informatics meeting at the Cold Spring Harbor Laboratory in November 2005.
References
1. Dunham, I., N. Shimizu, B. Roe et al. (1999) The DNA sequence of human chromosome
22. Nature 402, 489-495.
2. Hattori, M., A. Fujiyama, T. Taylor et al. (2000) The DNA sequence of human
chromosome 21. Nature 405, 311-319.
3. Havlak, P., R. Chen, K.J. Durbin, A. Egan, Y. Ren, X.-Z. Song, G.M. Weinstock
and R.A. Gibbs (2004) The Atlas genome assembly system. Genome Res. 14, 721-732.
4. Huang, X., J. Wang, S. Aluru, S.-P. Yang and L. Hillier (2003) PCAP: A whole-genome
assembly program. Genome Res. 13, 2164-2170.
5. International Human Genome Mapping Consortium (2001) A physical map of the
human genome. Nature 409, 934-941.
6. International Human Genome Sequencing Consortium (2001) Initial sequencing
and analysis of the human genome. Nature 409, 860-921.
7. International Human Genome Sequencing Consortium (2004) Finishing the euchromatic
sequence of the human genome. Nature 431, 931-945.
8. Jaffe, D. B., J. Butler, S. Gnerre, E. Mauceli, K. Lindblad-Toh, J. P. Mesirov, M. C.
Zody, and E. S. Lander (2003) Whole-genome sequence assembly for mammalian
genomes: Arachne 2. Genome Res. 13, 91-96.
9. Kent, W. J. and D. Haussler (2001) Assembly of the working draft of the human
genome with GigAssembler. Genome Res. 11, 1541-1548.
10. Mouse Genome Sequencing Consortium (2002) Initial sequencing and comparative
analysis of the mouse genome. Nature 420, 520-562.
11. Mullikin, J. and Z. Ning (2003) The Phusion assembler. Genome Res. 13, 81-90.
12. Rat Genome Sequencing Consortium (2004) Genome sequence of the brown Norway
rat yields insights into mammalian evolution. Nature 428, 493-521.
13. Stojanovic, N., J. L. Chang, J. Lehoczky, M. C. Zody, and K. Dewar (2002) Identification
of mix-ups among DNA sequencing plates. Bioinformatics 18, 1418-1426.
14. Venter, J., M. Adams, E. Myers et al. (2001) The sequence of the human genome.
Science 291, 1304-1351.
15. Weber, J. L. and E. W. Myers (1997) Human whole-genome shotgun sequencing.
Genome Res. 7, 401-409.
16. Xu, J. and J.I. Gordon (2005) MapLinker: a software tool that aids physical map
linked whole genome shotgun assembly. Bioinformatics 21, 1265-1266.
From HP Lattice Models to Real Proteins:
Coordination Number Prediction Using
Learning Classifier Systems
1 Introduction
The prediction of the 3D structures of proteins is both a fundamental and difficult
problem in computational biology. A popular approach to this problem is
2 Problem Definition
in CN prediction besides the residue type of the residues in the chain, such as
global information about the protein chain [12], data from multiple sequence
alignments [13, 19, 18, 12] (mainly from PSI-BLAST [20]), predicted secondary
structure [13, 19], predicted solvent accessibility [13] or sequence conservation
[19].
There are also two main definitions of the distance used to determine whether
there is a contact between two residues. Some methods use the Euclidean distance
between the Cα atoms of the two residues, while others use the Cβ atom (Cα for
glycine). Also, several methods discard the contacts between consecutive residues
in the chain, define a minimum chain separation, and use many different distance
thresholds. Figure 1 shows a graphical representation of a non-local contact
between two residues of a protein chain.
[Fig. 1: graphical representation of a non-local contact between two residues of a protein chain, showing the native structure, the contact state and the primary structure.]
Finally, there are two approaches to classification. Some methods predict the
absolute CN, assigning a class to each possible value of CN. Other methods group
instances 1 with close CN, for example separating the instances with CNs lower or
higher than the average of the training set, or defining classes in a way that
guarantees a uniform class distribution. We employ the latter approach, as
explained in Section 2.3.
2.1 HP Models
In the HP model (and its variants) the 20 residue types are reduced to two
classes: non-polar or hydrophobic (H) and polar (P) or hydrophilic. An n-residue
protein is represented by a sequence s ∈ {H, P}+ with |s| = n. The sequence s
is mapped to a lattice, where each residue in s occupies a different lattice cell
and the mapping is required to be self-avoiding. The energy potential in the
HP model reflects the propensity of hydrophobic residues to form a hydrophobic
core.
In the HP model, optimal (i.e. native) structures minimize the following energy
potential:

E(s) = \sum_{i<j;\, 1 \le i,j \le n} \epsilon_{i,j} \Delta_{i,j}   (1)
1
For the rest of the paper the machine learning definition of instance is used:
individual independent example of the concept to be learned [21]. That is, a set
of features and the associated output (a class) that is to be predicted.
where

\Delta_{i,j} = \begin{cases} 1 & \text{if residues } i, j \text{ are in contact and } |i-j| > 1 \\ 0 & \text{otherwise} \end{cases}   (2)

In the standard HP model, contacts that are HP and PP are assigned an energy of 0
and an HH contact is assigned an energy of -1.
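To make Equations 1 and 2 concrete, the following minimal Python sketch (ours, not the paper's) evaluates the HP energy of a conformation on the simple cubic lattice, assuming the standard energies of -1 for HH contacts and 0 otherwise; the sequence and the lattice coordinates in the example are made up.

```python
from itertools import combinations

def hp_energy(sequence, coords):
    """Energy of an HP conformation (Equations 1-2): -1 for every
    non-consecutive pair of H residues occupying adjacent lattice cells."""
    assert len(sequence) == len(coords)
    energy = 0
    for i, j in combinations(range(len(sequence)), 2):
        if j - i <= 1:
            continue  # Equation 2 requires |i - j| > 1
        # residues are in contact if their lattice cells are adjacent
        manhattan = sum(abs(a - b) for a, b in zip(coords[i], coords[j]))
        if manhattan == 1 and sequence[i] == "H" and sequence[j] == "H":
            energy -= 1  # epsilon_HH = -1; HP and PP contacts contribute 0
    return energy

# toy conformation: a 4-residue chain folded into a square on the cubic lattice
print(hp_energy("HPPH", [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]))  # -1
```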
2.2 Definition of CN
The distance used to determine contact by Kinjo et al. is defined using the Cβ
atom (Cα for glycine) of the residues. The boundary of the sphere defined by the
distance cutoff d_c is made smooth by using a sigmoid function. Also, a minimum
chain separation of two residues is required. Formally, the CN (O_i^p) of residue
i of protein chain p is computed as:

O_i^p = \sum_{j:\,|j-i|>2} \frac{1}{1 + \exp(w(r_{ij} - d_c))}   (3)

where r_{ij} is the distance between the Cβ atoms of the ith and jth residues. The
constant w determines the sharpness of the boundary of the sphere. A value of
three for w was used for all the experiments.
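For illustration only, Equation 3 can be evaluated directly from a set of coordinates; this hedged Python sketch uses w = 3 as in the text, while the cutoff d_c and the Cβ coordinates are placeholder values.

```python
import math

def contact_number(coords, i, d_c, w=3.0):
    """Smoothed contact number of residue i (Equation 3)."""
    cn = 0.0
    for j, c in enumerate(coords):
        if abs(j - i) <= 2:
            continue  # minimum chain separation of two residues
        r_ij = math.dist(coords[i], c)  # distance between the two C-beta atoms
        cn += 1.0 / (1.0 + math.exp(w * (r_ij - d_c)))
    return cn

# made-up C-beta coordinates (Angstrom) and a placeholder cutoff d_c
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
          (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
print(round(contact_number(coords, 0, d_c=10.0), 3))
```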
4 Experimental Framework
4.1 HP Lattice-Based Datasets
Two datasets were employed in this study: a 3D HP lattice model protein dataset
and a dataset of real proteins. Table 1 summarizes both datasets, which are
available at http://www.cs.nott.ac.uk/~nxk/hppdb.html. For the Lattice-HP study,
a set of structures from Hart's Tortilla Benchmark Collection
(http://www.cs.sandia.gov/tech_reports/compbio/tortilla-hp-benchmarks.html) was
used. This consisted of 15 structures on the simple cubic lattice (CN=6). Windows
were generated for one, two and three residues at each side of a central residue,
and the CN class of the central residue was assigned as the class of the instance.
The instances were divided randomly into ten pairs of training and test sets.
These sets act in a similar way to a ten-fold cross-validation. The process was
repeated ten times to create ten pairs of training and test sets. Each reported
accuracy is, therefore, the average of one hundred values.
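The windowing step can be sketched as follows (our own illustration with hypothetical inputs): each residue yields one instance consisting of the residues within a window of w positions on each side, padded at the chain ends, together with the CN class of the central residue.

```python
def window_instances(sequence, cn_classes, w, end_symbol="x"):
    """One instance per residue: the residues within w positions on each side
    (padded with an end-of-chain symbol) plus the CN class of the centre."""
    padded = [end_symbol] * w + list(sequence) + [end_symbol] * w
    instances = []
    for i, label in enumerate(cn_classes):
        window = tuple(padded[i:i + 2 * w + 1])
        instances.append((window, label))
    return instances

# toy HP chain with made-up two-state CN classes and a window of one residue per side
for instance in window_instances("HPPH", [1, 0, 0, 1], w=1):
    print(instance)
```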
[Figure: class distribution plots — proportion of H and P residues in each CN class for the two-, three- and five-state class definitions (two rows of three panels; y-axis: Proportion of Residues).]
5 Results
The performance of GAssist was compared to two other machine learning systems:
C4.5 [27], a rule induction system, and Naive Bayes [28], a Bayesian learning
algorithm. The WEKA [21] implementation of these algorithms was used. Student
t-tests were applied to the mean prediction accuracies (rather than individual
experimental data points) to determine, for each dataset, those algorithms that
significantly outperformed other methods, using a confidence interval of 95%; the
Bonferroni correction [29] for multiple pair-wise comparisons was used.
Table 4. CN prediction accuracies for the Real-HP and Real-AA datasets. One marker
means that GAssist outperformed the algorithm to the left (5% t-test significance).
A second marker means that the algorithm on the left outperformed GAssist (5%
t-test significance).
Equation 4 shows the effect of reducing the alphabet and the window size: it
creates many copies of the same instances. Equation 5 shows how this reduction
creates inconsistent instances: instances with equal input attributes (antecedent)
but different class. For the sake of clarity this measure has been normalized for
the different number of target states. Table 5 shows these ratios. For two states
and a window size of one, the Real-HP dataset shows the most extreme case: every
possible antecedent appears in the dataset associated with both classes.
Fortunately, the proportions of the two classes for each antecedent are different,
and the system can still learn. We see that the Real-HP dataset is highly
redundant and that the Real-AA dataset with window sizes of two and three presents
low redundancy and inconsistency rates.
States  Window size  HP Redundancy  HP Inconsistency  AA Redundancy  AA Inconsistency
2       1            99.99%         100.000%          93.69%         90.02%
2       2            99.94%         92.50%            6.14%          3.85%
2       3            99.75%         81.71%            0.21%          0.05%
3       1            99.98%         96.88%            90.90%         87.01%
3       2            99.92%         86.25%            4.50%          2.84%
3       3            99.66%         76.00%            0.17%          0.04%
5       1            99.97%         93.75%            85.84%         81.52%
5       2            99.86%         86.25%            2.97%          1.84%
5       3            99.46%         74.36%            0.14%          0.03%
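Since Equations 4 and 5 themselves are not reproduced above, the Python sketch below only illustrates one plausible way to compute such ratios: redundancy as the fraction of instances that duplicate an already-seen (antecedent, class) pair, and inconsistency as the fraction of distinct antecedents that occur with more than one class. The paper's exact normalisation over the number of target states may differ.

```python
from collections import Counter, defaultdict

def redundancy_and_inconsistency(instances):
    """instances: list of (antecedent, class) pairs, with hashable antecedents.
    Redundancy: fraction of instances duplicating an already-seen pair.
    Inconsistency: fraction of antecedents occurring with more than one class."""
    distinct_pairs = Counter(instances)
    redundancy = 1.0 - len(distinct_pairs) / len(instances)
    classes_per_antecedent = defaultdict(set)
    for antecedent, label in instances:
        classes_per_antecedent[antecedent].add(label)
    inconsistent = sum(1 for s in classes_per_antecedent.values() if len(s) > 1)
    inconsistency = inconsistent / len(classes_per_antecedent)
    return redundancy, inconsistency

# toy window instances over the HP alphabet
data = [(("h", "p", "h"), 1), (("h", "p", "h"), 0), (("h", "p", "h"), 1), (("p", "p", "h"), 0)]
print(redundancy_and_inconsistency(data))  # (0.25, 0.5)
```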
6 Discussion
The LCS and other machine learning algorithms performed at similar levels for
these CN prediction tasks. Generally, increasing the number of classes (number
of states) leads to a reduction in prediction accuracy, which can be partly offset
by using a larger window size. Reduction of input information from full residue
type to HP sequence reduces the accuracy of prediction. The algorithms were,
however, all capable of predictions using HP sequence that were within 5% of
the accuracies obtained using full residue type sequences.
For all of the algorithms studied, in the case of the most informative five-state
predictions, moving from HP lattice to real protein HP sequences leads to a
reduction of CN prediction accuracy from levels of around 50% to levels of around
30%. The significant reduction in the spatial degrees of freedom in the Lattice-HP
models leads to an improvement in prediction accuracy of around 20%.
In contrast, when moving from the real protein HP sequences to real protein full
residue type sequences (for the same five-state CN predictions), only a 3-5%
improvement in prediction accuracy results from the inclusion of this additional
residue type information. This seems to indicate that hydrophobicity information
is a key determinant of CN and that algorithmic studies of HP models are relevant.
The rules that result from a reduced two-letter alphabet are simpler and easier to
understand than those from the full residue type studies. For example, for the HP
representation a rule set giving 62.9% accuracy is shown below (an X symbol is
used to represent positions beyond the ends of the chains, that is, beyond the
central residue being studied).
1. If AA-1 ∉ {x} and AA ∈ {h} and AA+1 ∈ {p} then class is 1
2. If AA-1 ∈ {h} and AA ∈ {h} and AA+1 ∉ {x} then class is 1
3. If AA-1 ∈ {p} and AA ∈ {h} and AA+1 ∈ {h} then class is 1
4. Default class is 0
In these rules, a class assignment of high is represented by 1 and low by 0.
For the full residue type representation a rule set giving 67.7% accuracy is:
1. If AA-1 ∉ {D, E, K, N, P, Q, R, S, X} and AA ∉ {D, E, K, N, P, Q, R, S, T} and AA+1 ∉ {D, E, K, Q, X} then class is 1
2. If AA-1 ∉ {X} and AA ∈ {A, C, F, I, L, M, V, W, Y} and AA+1 ∉ {D, E, H, Q, S, X} then class is 1
3. If AA-1 ∉ {P, X, Y} and AA ∈ {A, C, F, I, L, M, V, W, Y} and AA+1 ∉ {K, M, T, W, X, Y} then class is 1
4. If AA-1 ∉ {H, I, K, M, X} and AA ∈ {C, F, I, L, M, V, W, Y} and AA+1 ∉ {M, X} then class is 1
5. Default class is 0
Recently, Kinjo et al. [12] reported two, three and ten state CN prediction at
accuracies of 72.1%, 53.7%, and 18.8% respectively, which is higher than our
results. However, they use a non-standard accuracy measure that usually gives
slightly higher results than the one used in this paper. Also, they use more input
information than was used in the experiments reported in this paper.
The aim of this paper was to compare the performance difference between the
Real-AA and Real-HP representations, not to obtain the best CN results. We have
undertaken more detailed studies on both the HP model dataset for CN and Residue
Burial prediction and the real protein datasets for CN prediction in comparison to
the Kinjo work (papers submitted).
Acknowledgments
We acknowledge the support provided by the UK Engineering and Physical Sci-
ences Research Council (EPSRC) under grant GR/T07534/01 and the Biotech-
nology and Biological Sciences Research Council (BBSRC) under grant BB/
C511764/1.
References
1. Abe, H., Go, N.: Noninteracting local-structure model of folding and unfolding transition in globular proteins. II. Application to two-dimensional lattice proteins. Biopolymers 20 (1981) 1013–1031
2. Hart, W.E., Istrail, S.: Crystallographical universal approximability: A complexity theory of protein folding algorithms on crystal lattices. Technical Report SAND95-1294, Sandia National Labs, Albuquerque, NM (1995)
3. Hinds, D., Levitt, M.: A lattice model for protein structure prediction at low resolution. In: Proceedings of the National Academy of Sciences U.S.A. Volume 89. (1992) 2536–2540
4. Hart, W., Istrail, S.: Robust proofs of NP-hardness for protein folding: General lattices and energy potentials. Journal of Computational Biology (1997) 1–20
5. Yue, K., Fiebig, K.M., Thomas, P.D., Sun, C.H., Shakhnovich, E.I., Dill, K.A.: A test of lattice protein folding algorithms. Proc. Natl. Acad. Sci. USA 92 (1995) 325–329
6. Escuela, G., Ochoa, G., Krasnogor, N.: Evolving L-systems to capture protein structure native conformations. In: Proceedings of the 8th European Conference on Genetic Programming (EuroGP 2005), Lecture Notes in Computer Science 3447, pp. 73–84, Springer-Verlag, Berlin (2005)
7. Krasnogor, N., Pelta, D.: Fuzzy memes in multimeme algorithms: a fuzzy-evolutionary hybrid. In Verdegay, J., ed.: Fuzzy Sets based Heuristics for Optimization, Springer (2002)
8. Krasnogor, N., Hart, W., Smith, J., Pelta, D.: Protein structure prediction with evolutionary algorithms. In Banzhaf, W., Daida, J., Eiben, A., Garzon, M., Honavar, V., Jakaiela, M., Smith, R., eds.: GECCO-99: Proceedings of the Genetic and Evolutionary Computation Conference, Morgan Kaufmann (1999)
9. Krasnogor, N., Blackburne, B., Burke, E., Hirst, J.: Multimeme algorithms for protein structure prediction. In: Proceedings of the Parallel Problem Solving from Nature VII. Lecture Notes in Computer Science. Volume 2439. (2002) 769–778
10. Krasnogor, N., de la Cananl, E., Pelta, D., Marcos, D., Risi, W.: Encoding and crossover mismatch in a molecular design problem. In Bentley, P., ed.: AID98: Proceedings of the Workshop on Artificial Intelligence in Design 1998. (1998)
11. Krasnogor, N., Pelta, D., Marcos, D.H., Risi, W.A.: Protein structure prediction as a complex adaptive system. In: Proceedings of Frontiers in Evolutionary Algorithms 1998. (1998)
12. Kinjo, A.R., Horimoto, K., Nishikawa, K.: Predicting absolute contact numbers of native protein structure from amino acid sequence. Proteins 58 (2005) 158–165
13. Baldi, P., Pollastri, G.: The principled design of large-scale recursive neural network architectures - DAG-RNNs and the protein structure prediction problem. Journal of Machine Learning Research 4 (2003) 575–602
14. Wilson, S.W.: Classifier fitness based on accuracy. Evolutionary Computation 3 (1995) 149–175
15. DeJong, K.A., Spears, W.M., Gordon, D.F.: Using genetic algorithms for concept learning. Machine Learning 13 (1993) 161–188
16. Holland, J.H.: Adaptation in Natural and Artificial Systems. University of Michigan Press (1975)
17. Bacardit, J.: Pittsburgh Genetics-Based Machine Learning in the Data Mining era: Representations, generalization, and run-time. PhD thesis, Ramon Llull University, Barcelona, Catalonia, Spain (2004)
18. MacCallum, R.: Striped sheets and protein contact prediction. Bioinformatics 20 (2004) i224–i231
19. Zhao, Y., Karypis, G.: Prediction of contact maps using support vector machines. In: Proceedings of the IEEE Symposium on BioInformatics and BioEngineering, IEEE Computer Society (2003) 26–36
20. Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25 (1997) 3389–3402
21. Witten, I.H., Frank, E.: Data Mining: Practical machine learning tools and techniques with Java implementations. Morgan Kaufmann (2000)
22. Rissanen, J.: Modeling by shortest data description. Automatica 14 (1978) 465–471
23. Bacardit, J., Goldberg, D., Butz, M., Llora, X., Garrell, J.M.: Speeding-up Pittsburgh learning classifier systems: Modeling time and accuracy. In: Parallel Problem Solving from Nature - PPSN 2004, Springer-Verlag, LNCS 3242 (2004) 1021–1031
24. Noguchi, T., Matsuda, H., Akiyama, Y.: PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB). Nucleic Acids Res 29 (2001) 219–220
25. Sander, C., Schneider, R.: Database of homology-derived protein structures. Proteins 9 (1991) 56–68
26. Broome, B., Hecht, M.: Nature disfavors sequences of alternating polar and non-polar amino acids: implications for amyloidogenesis. J Mol Biol 296 (2000) 961–968
27. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann (1993)
28. John, G.H., Langley, P.: Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers, San Mateo (1995) 338–345
29. Miller, R.G.: Simultaneous Statistical Inference. Springer-Verlag, New York, Heidelberg, Berlin (1981)
30. Miller, S., Janin, J., Lesk, A., Chothia, C.: Interior and surface of monomeric proteins. J Mol Biol 196 (1987) 641–656
31. Li, T., Fan, K., Wang, J., Wang, W.: Reduction of protein sequence complexity by residue grouping. Protein Eng 16 (2003) 323–330
Conditional Random Fields for Predicting and
Analyzing Histone Occupancy, Acetylation and
Methylation Areas in DNA Sequences
Dang Hung Tran1 , Tho Hoan Pham2 , Kenji Satou1,3 , and Tu Bao Ho1,3
1
School of Knowledge Science, Japan Advanced Institute of Science and Technology,
1-1 Asahidai, Nomi, Ishikawa 923-1292, Japan
tran@jaist.ac.jp
2
Faculty of Information Technology, Hanoi University of Pedagogy,
136 Xuan Thuy, Cau Giay, Hanoi, Vietnam
3
Institute for Bioinformatics Research and Development (BIRD),
Japan Science and Technology Agency (JST), Japan
1 Introduction
Eukaryotic genomes are packaged into nucleosomes that consist of 145–147 base
pairs of DNA wrapped around a histone octamer [9]. The histone components of
nucleosomes and their modification state (of which acetylation and methylation are
the most important ones) can profoundly influence many genetic activities,
including transcription [2, 4, 5, 16], DNA repair, and DNA remodeling [13].
There have recently been many studies mapping histone occupancies together with
their modifications in DNA sequences, and of the relationship between
them and various genetic activities concerning DNAs [1, 2, 5, 7, 16, 18, 19].
However, most of these studies were conducted experimentally through the
combination of chromatin immunoprecipitation and whole-genome DNA microarrays,
the ChIP-Chip protocol.
The nucleosome occupancy as well as its modifications such as acetylation and
methylation mainly depend on the DNA sequence area they incorporate. The majority
of acetylation and methylation occurs at specific, highly conserved residues in
the histone components of nucleosomes: acetylation sites include at least nine
lysines in histones H3 and H4 (H3K9, H3K14, H3K18, H3K23, H3K27, H4K5, H4K8,
H4K12, and H4K16); methylation sites include H3K4, H3K9, H3K27, H3K36, H3K79,
H3R17, H4K20, H4K59, and H4R3 [14]. When a nucleosome appears in a specific DNA
sequence area, these potential sites can have a certain acetylation or methylation
level [5, 16].
Recently we have introduced a support vector machine (SVM)-based method to
qualitatively predict histone occupancy, acetylation and methylation areas in DNA
sequences [15]. In this paper, we present a different computational method for
this prediction problem. We employ conditional random fields (CRFs) [6], a novel
machine learning technique, to discriminate between DNA areas with high and low
relative occupancy, acetylation, or methylation. Our experiments showed that the
CRF-based method has performance competitive with the SVM method. Moreover,
similar to SVMs, our CRF method can extract informative k-gram features based on
their weights in the CRF model trained from the datasets of these DNA
modifications. The results from our CRF method on the yeast genome are consistent
with those from the SVM method and reveal genetic area preferences of nucleosome
occupancy, acetylation, and methylation that agree with previous studies.
2.1 Datasets
two kinds of model for solving this problem: generative models and conditional
models. While generative models define a joint probability distribution over the
observation and labelling sequences, p(X, Y), conditional models specify the
probability of a label sequence given an observation sequence, p(Y|X). The main
drawback of generative models is that, in order to define a joint probability
distribution, they must enumerate all possible observation sequences, which may
not be feasible in practice [6, 12, 21]. Our work employs conditional models,
specifically conditional random fields, which can overcome the drawbacks of
generative models.
A CRF [6] is a probabilistic framework for segmenting and labelling sequential
data using a conditional model. It has the form of an undirected graph that
defines a log-linear distribution over label sequences given a particular
observation sequence. CRFs have several advantages over other models (e.g., HMMs
and MEMMs), such as relaxing strong Markov independence assumptions and avoiding a
weakness called the label bias problem [6, 11, 12, 21].
The conditional probability of a label sequence Y given an observation sequence X is

p_\lambda(Y|X) = \frac{1}{Z(X)} \exp\left( \sum_{i=1}^{T} \sum_{k} \lambda_k f_k(y_{i-1}, y_i, X, i) \right)

where Z(X) = \sum_{s \in S} \exp\left( \sum_{i=1}^{T} \sum_{k} \lambda_k f_k(y_{i-1}, y_i, X, i) \right) is a normalization
factor over all state sequences, and the f_k(y_{i-1}, y_i, X, i) are feature
functions, each of which is either a state feature function or a transition
feature function [20, 12, 21]. A state feature captures a particular property of
the observation sequence X at the current state y_i. A transition feature
represents sequential dependencies by combining the label l' of the previous state
y_{i-1} and the label l of the current state y_i. As in [6], we assume that the
feature functions are fixed, and denote by \lambda = \{\lambda_k\} a weight vector
that is to be learned through training.
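For intuition only, the distribution above can be evaluated by brute force on a tiny example, enumerating every label sequence to form Z(X). This hedged Python sketch uses made-up binary feature functions and weights; it is far too slow for real data, where dynamic programming (forward-backward and Viterbi) is used instead.

```python
import math
from itertools import product

def crf_probability(weights, features, labels, X, Y):
    """p(Y|X) for a linear-chain CRF with feature functions f_k(y_prev, y_cur, X, i),
    computing the normalization Z(X) by enumerating every label sequence."""
    def unnormalized(label_seq):
        total = 0.0
        for i in range(len(X)):
            y_prev = label_seq[i - 1] if i > 0 else None
            total += sum(w * f(y_prev, label_seq[i], X, i)
                         for w, f in zip(weights, features))
        return math.exp(total)

    Z = sum(unnormalized(seq) for seq in product(labels, repeat=len(X)))
    return unnormalized(tuple(Y)) / Z

# made-up state feature (observation at position i) and transition feature
features = [
    lambda yp, y, X, i: 1.0 if X[i] == "a" and y == 1 else 0.0,  # state feature
    lambda yp, y, X, i: 1.0 if yp == y else 0.0,                 # transition feature
]
weights = [1.5, 0.5]
print(crf_probability(weights, features, labels=(0, 1), X="aba", Y=(1, 0, 1)))
```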
At the end time (i.e., t = T - 1), we can backtrack through the stored information
to find the most likely sequence y*.
Training CRFs. Let D = \{(x^{(k)}, y^{(k)})\}_{k=1}^{N} be the training data set.
CRFs are trained by finding the weight vector \lambda = \{\lambda_1, \lambda_2, \ldots\}
that maximizes the log-likelihood

L = \sum_{j=1}^{N} \log p_\lambda(y^{(j)} | x^{(j)}) - \sum_{k} \frac{\lambda_k^2}{2\sigma^2}

where the second sum is a Gaussian prior over the parameters (with variance
\sigma^2) that provides smoothing to help cope with sparsity in the training
data [3].
Since the likelihood function in the exponential models of CRFs is convex, the
above optimization problem always has a global optimum solution, which can be
found by an iterative estimation procedure. The traditional training method for
CRFs is iterative scaling [6, 21]. Since those methods are very slow for
classification [20], we use quasi-Newton methods, such as L-BFGS [8], which are
significantly more efficient [10, 20].
L-BFGS is a limited-memory quasi-Newton procedure for unconstrained optimization
that requires the value and gradient vector of the function to be optimized.
Assuming that the training labels on instance j make its state path unambiguous,
and letting y^{(j)} denote that path, the first derivative of the log-likelihood is
\frac{\partial L}{\partial \lambda_k} = \sum_{j=1}^{N} C_k(y^{(j)}, x^{(j)}) - \sum_{j=1}^{N} \sum_{y} p_\lambda(y|x^{(j)}) C_k(y, x^{(j)}) - \frac{\lambda_k}{\sigma^2}
The most important issue in CRF learning is to select a set of features that
capture the relevant relationships among observations and label sequences. CRFs
have two kinds of features, state features and transition features; however, in
this work we focus only on state features. Also, each observation sequence in the
datasets has only one observation (a DNA sequence area of length L), and the label
sequence is a sequence of 0 (negative class) and 1 (positive class).
The feature set input to our CRF systems is built in two steps. First, we use a
sliding window of width k along a DNA sequence to get binary k-grams (patterns of
k consecutive nucleotide symbols). Each DNA sequence is thus represented by a
binary 4^k-dimensional vector of all possible k-grams. Second, we define the
unigram function for each k-gram as follows:

u_t(x) = \begin{cases} 1 & \text{if the } t\text{th } k\text{-gram appears in the sequence } x \\ 0 & \text{otherwise} \end{cases}

Therefore, the relationship between the observation and the two classes, positive
and negative, is described by the following features:

f_t^P(y, x) = \begin{cases} u_t(x) & \text{if } y \text{ belongs to the positive class} \\ 0 & \text{otherwise} \end{cases}

f_t^N(y, x) = \begin{cases} u_t(x) & \text{if } y \text{ belongs to the negative class} \\ 0 & \text{otherwise} \end{cases}
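The k-gram step can be sketched as follows; a minimal Python illustration (names are ours, not the authors') that builds the binary presence vector over all 4^k possible k-grams for one DNA sequence area.

```python
from itertools import product

def binary_kgram_features(sequence, k):
    """Binary presence vector over all 4**k possible k-grams of a DNA sequence."""
    alphabet = "ACGT"
    all_kgrams = ["".join(p) for p in product(alphabet, repeat=k)]
    present = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
    return [1 if kgram in present else 0 for kgram in all_kgrams]

# toy example: 2-grams of a short sequence (the real areas have length L = 500)
vector = binary_kgram_features("ACGTAC", k=2)
print(sum(vector), "of", len(vector), "possible 2-grams are present")  # 4 of 16
```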
(80.85) (80.79) (80.82) (81.51) (81.34) (81.43) (81.20) (81.12) (81.16) (81.45) (81.39) (81.42)
H3K9acvsH3.YPD 70.36 70.22 70.29 71.50 71.27 71.38 69.86 69.65 69.75 71.58 71.44 71.51
(70.92) (70.74) (70.83) (73.98) (73.49) (73.74) (71.12) (70.96) (71.04) (71.76) (70.22) (70.98)
H3K14acvsH3.YPD 66.58 65.99 66.28 68.13 68.06 68.09 66.26 65.69 65.97 68.27 67.78 68.02
(68.55) (67.66) (68.10) (73.12) (71.68) (72.39) (68.80) (67.80) (68.30) (71.95) (67.44) (69.62)
H3K14acvsWCE.YPD 62.86 62.60 62.73 63.76 63.67 63.71 65.88 65.69 65.79 63.74 63.72 63.73
(64.04) (63.95) (64.00) (68.04) (67.83) (67.94) (64.47) (64.37) (64.42) (66.52) (65.81) (66.16)
H3K14acvsH3.H2O2 65.42 65.41 65.41 66.58 66.44 66.51 66.21 66.02 66.12 66.46 66.43 66.44
(67.14) (67.13) (67.14) (69.88) (69.44) (69.66) (67.35) (67.34) (67.34) (69.28) (68.85) (69.06)
H4acvsH3.YPD 66.21 66.02 66.12 67.64 67.43 67.53 65.94 65.75 65.85 67.85 67.66 67.75
(68.01) (67.62) (67.82) (72.22) (71.60) (71.91) (68.31) (67.85) (68.08) (69.98) (68.11) (69.03)
H4acvsH3.H2O2 69.73 69.69 69.71 70.44 70.27 70.35 69.65 69.59 69.62 70.16 70.11 70.13
(69.32) (69.27) (69.29) (71.69) (71.42) (71.55) (69.32) (69.25) (69.29) (69.93) (69.07) (69.50)
H3K4me1vsH3.YPD 65.21 64.79 65.00 66.47 65.83 66.15 64.87 64.40 64.63 67.03 66.34 66.68
(66.16) (65.59) (65.87) (69.55) (68.48) (69.01) (66.43) (65.79) (66.11) (68.30) (65.87) (67.06)
H3K4me2vsH3.YPD 68.58 68.05 68.31 64.78 62.85 63.80 62.95 62.15 62.54 64.41 62.95 63.67
(64.20) (61.50) (62.82) (68.10) (64.15) (66.07) (64.68) (61.70) (63.16) (67.25) (60.18) (63.52)
H3K4me3vsH3.YPD 67.00 66.81 66.91 63.18 62.66 62.92 60.90 60.53 60.71 (64.49) 64.23 64.36
(63.96) (63.55) (63.75) (68.90) (64.32) (66.53) (64.50) (64.06) (64.28) (68.08) (65.76) (66.90)
H3K36me3vsH3.YPD 69.76 69.55 69.65 71.21 71.05 71.13 69.46 69.24 69.35 71.44 70.90 71.17
(70.85) (70.61) (70.73) (72.39) (71.93) (72.16) (71.11) (70.87) (70.99) (72.83) (71.88) (72.36)
H3K79me3vsH3.YPD 75.83 75.56 75.69 78.02 77.87 77.94 75.50 75.37 75.44 77.81 77.73 77.77
(76.24) (76.02) (76.13) (78.95) (78.31) (78.63) (76.44) (76.21) (76.32) (78.03) (77.38) (77.70)
Note: 1. Pre., Rec. and F1 denote precision, recall and F1-measure, respectively.
2. The numbers in brackets are the prediction results obtained using the SVM method [15].
Precision_{positive} = \frac{TP}{TP + FP} \qquad Precision_{negative} = \frac{TN}{TN + FN}

Recall_{positive} = \frac{TP}{TP + FN} \qquad Recall_{negative} = \frac{TN}{TN + FP}

Precision = \frac{Precision_{positive} + Precision_{negative}}{2} \qquad Recall = \frac{Recall_{positive} + Recall_{negative}}{2}

F1\text{-}measure = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}

where TP, TN, FP and FN are the numbers of true positive, true negative, false
positive and false negative examples, respectively.
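These macro-averaged measures are straightforward to compute from a confusion matrix; a small Python sketch of our own follows (the counts in the example are made up).

```python
def macro_metrics(tp, tn, fp, fn):
    """Class-averaged precision, recall and F1-measure as defined above."""
    precision_pos = tp / (tp + fp)
    precision_neg = tn / (tn + fn)
    recall_pos = tp / (tp + fn)
    recall_neg = tn / (tn + fp)
    precision = (precision_pos + precision_neg) / 2
    recall = (recall_pos + recall_neg) / 2
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# toy confusion-matrix counts
print(macro_metrics(tp=70, tn=65, fp=30, fn=35))
```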
Through various experiments we found that our method gave the best results when
predicting nucleosome occupancy, acetylation, and methylation for DNA sequence
areas of length L = 500 (data not shown). Due to the computational complexity, we
have only tried k ≤ 6 and report here the results from sets of k-grams with k=5,
k=6, k=4,5, and k=5,6 (Table 2).
The highest performance of our CRF method (at the 18th L-BFGS iteration) for
relative histone occupancy predictions (H3, H4, H3.H2O2), acetylation predictions
(H3K9acvsH3, H3K14acvsH3, H3K14acvsWCE, H3K14acvsH3.H2O2, H4acvsH3,
H4acvsH3.H2O2), and methylation predictions (H3K4me1vsH3, H3K4me2vsH3,
H3K4me3vsH3, H3K36me3vsH3, H3K79me3vsH3.YPD) was achieved when we used features
of both 5-grams and 6-grams (Table 2). The numbers in brackets are the performance
of the support vector machine (SVM)-based method (which was used in [15] to
address the same problem) when using the same binary k-gram features. As can be
seen, the CRF method is competitive with the SVM-based method. In some cases CRFs
gave better performance, but in others performance was worse. The SVM method can
take into account the number of k-gram occurrences, which represents a DNA
sequence better than binary k-gram features, and hence can achieve better
performance [15]. However, CRFs have some advantages over SVMs, such as the ease
with which knowledge can be incorporated into their predictions; in the future we
will incorporate annotated information concerning the DNA sequence into our CRF
method to improve the prediction results.
Table 3. Most informative features selected from the CRF model for the positive
class, with k-grams of length 4 and 5
negative features (Table 4). In other words, CG-rich DNA sequence areas are often
free of histone occupancy, acetylation, or methylation. It is well known that CpG
islands usually lie near gene starts, so we can infer from our results that
promoter regions are often not occupied by nucleosomes. This is consistent with
previous results obtained by experimental approaches in vivo [16].
4 Conclusion
We have introduced a conditional-model-based method to qualitatively predict
histone occupancy, acetylation, and methylation areas in DNA sequences. We have
selected a basic set of features based on the DNA sequence. Moreover, our model
can evaluate the informative features that discriminate between DNA areas
Table 4. Most informative features selected from the CRF model for the negative
class, with k-grams of length 4 and 5
with high and low occupancy, acetylation, or methylation. In the near future, we
plan to incorporate features related to sequence motifs into our method in order
to capture the constraints on the model more faithfully.
Acknowledgements
The research described in this paper was partially supported by the Institute for
Bioinformatics Research and Development of the Japan Science and Technology
Agency, and by COE project JCP KS1 of the Japan Advanced Institute of
Science and Technology. We would also like to thank Hieu P.X. for providing the
FlexCRFs package and sharing with us his experience in the machine learning area.
References
1. B. E. Bernstein, E. L. Humphrey, R. L. Erlich, R. Schneider, P. Bouman, J. S. Liu, T. Kouzarides, and S. L. Schreiber. Methylation of histone H3 Lys 4 in coding regions of active genes. Proc. Natl. Acad. Sci. USA, 99(13):8695–8700, 2002.
2. B. E. Bernstein, C. L. Liu, E. L. Humphrey, E. O. Perlstein, and S. L. Schreiber. Global nucleosome occupancy in yeast. Genome Biol., 5(9):R62, 2004.
3. S. F. Chen and R. Rosenfeld. A Gaussian prior for smoothing maximum entropy models. Technical Report CMU-CS-99-108, 1999.
4. T. Kouzarides. Histone methylation in transcriptional control. Curr. Opin. Genet. Dev., 12(2):198–209, 2002.
5. S. K. Kurdistani, S. Tavazoie, and M. Grunstein. Mapping global histone acetylation patterns to gene expression. Cell, 117(6):721–733, 2004.
6. J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proc. 18th International Conference on Machine Learning, 2001.
7. C. K. Lee, Y. Shibata, B. Rao, B. D. Strahl, and J. D. Lieb. Evidence for nucleosome depletion at active regulatory regions genome-wide. Nat. Genet., 36(8):900–905, 2004.
8. D. Liu and J. Nocedal. On the limited memory BFGS method for large-scale optimization. Mathematical Programming, 45:503–528, 1989.
9. K. Luger, A. W. Mader, R. K. Richmond, D. F. Sargent, and T. J. Richmond. Crystal structure of the nucleosome core particle at 2.8 A resolution. Nature, 389(6648):251–260, 1997.
10. R. Malouf. A comparison of algorithms for maximum entropy parameter estimation. In Proceedings of CoNLL, 2002.
11. A. McCallum. Maximum entropy Markov models for information extraction and segmentation. In Proc. 15th International Conference on Machine Learning, 2000.
12. A. McCallum. Efficiently inducing features of conditional random fields. In Proc. 19th Conference on Uncertainty in Artificial Intelligence, 2003.
13. G. J. Narlikar, H. Y. Fan, and R. E. Kingston. Cooperation between complexes that regulate chromatin structure and transcription. Cell, 108(4):475–487, 2002.
14. C. L. Peterson and M. A. Laniel. Histones and histone modifications. Curr. Biol., 14(14):R546–R551, 2004.
15. T.H. Pham, D.H. Tran, T.B. Ho, K. Satou, and G. Valiente. Qualitatively predicting acetylation and methylation areas in DNA sequences. In Proc. 16th International Conference on Genome Informatics, 2005.
16. D. K. Pokholok, C. T. Harbison, S. Levine, M. Cole, N. M. Hannett, T. I. Lee, G. W. Bell, K. Walker, P. A. Rolfe, E. Herbolsheimer, J. Zeitlinger, F. Lewitter, D. K. Gifford, and R. A. Young. Genome-wide map of nucleosome acetylation and methylation in yeast. Cell, 122(4):517–527, 2005.
17. L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, pages 257–286, 1989.
18. B. Ren, F. Robert, et al. Genome-wide location and function of DNA binding proteins. Science, 290(5500):2306–2309, 2000.
19. D. Robyr, Y. Suka, I. Xenarios, S. K. Kurdistani, A. Wang, N. Suka, and M. Grunstein. Microarray deacetylation maps determine genome-wide functions for yeast histone deacetylases. Cell, 109(4):437–446, 2002.
20. F. Sha and F. Pereira. Shallow parsing with conditional random fields. In Proceedings of Human Language Technology/NAACL, 2003.
21. H. Wallach. Efficient Training of Conditional Random Fields. Master's thesis, 2002.
DNA Fragment Assembly: An Ant Colony
System Approach
Abstract. This paper presents the use of an ant colony system (ACS)
algorithm in DNA fragment assembly. The assembly problem generally
arises during the sequencing of large strands of DNA, where the strands
need to be shotgun-replicated and broken into fragments that are small
enough for sequencing. The assembly problem can thus be classified as a
combinatorial optimisation problem where the aim is to find the right
order of each fragment in the ordering sequence that leads to the
formation of a consensus sequence that truly reflects the original DNA
strands. The assembly procedure proposed is composed of two stages:
fragment assembly and contiguous sequence (contig) assembly. In the
fragment assembly stage, a possible alignment between fragments is
determined with the use of a Smith-Waterman algorithm, where the
fragment ordering sequence is created using the ACS algorithm. The
resulting contigs are then assembled together using a nearest neighbour
heuristic (NNH) rule. The results indicate that overall the performance
of the combined ACS/NNH technique is superior to that of the NNH search
and a CAP3 program. The results also reveal that the solutions produced
by the CAP3 program contain a higher number of contigs than the
solutions produced by the proposed technique. In addition, the quality
of the combined ACS/NNH solutions is higher than that of the CAP3
solutions when the problem size is large.
1 Introduction
covering over three billion genetic codes, is investigated. To achieve the goal,
the project has to be divided into many components. One of the major components is
DNA fragment assembly. DNA is a double helix comprised of two complementary
strands of polynucleotides. Each strand of DNA can be viewed as a character string
over an alphabet of four letters: A, G, C and T. The four letters represent four
bases, which are adenine (A), guanine (G), cytosine (C) and thymine (T). The two
strands are complementary in the sense that at corresponding positions As are
always paired with Ts and Cs with Gs. These pairs of complementary bases are
referred to as base pairs. At present, strands of DNA that are longer than 600
base pairs cannot routinely be sequenced accurately [2]. The sequencing technique
that involves fragmentation of a DNA strand is called a shotgun sequencing
technique. Basically, DNA is first replicated many times and then individual
strands of the double helix are broken randomly into smaller fragments. This
produces a set of out-of-order fragments short enough for sequencing.
A DNA fragment assembly problem involves finding the right order of each fragment
in the fragment ordering sequence, which leads to the formation of a consensus
sequence that truly reflects the parent DNA strands. In other words, the DNA
fragment assembly problem can be treated as a combinatorial optimisation problem.
A number of deterministic and stochastic search techniques have been used to solve
DNA fragment assembly problems [3]. For instance, Huang and Madan [4] and Green
[5] have used a greedy search algorithm to solve the problem. However, a manual
manipulation of the computer-generated result is required to obtain a biologically
plausible final result. Other deterministic search algorithms that have been
investigated include a branch-and-cut algorithm [6] and graph-theoretic algorithms
where DNA fragments are represented either by graph nodes [7, 8] or by graph edges
[9]. The capability of stochastic search algorithms such as a simulated annealing
algorithm [10], a genetic algorithm [11, 12, 13] and a neural network based
prediction technique [14] has also been investigated. The best DNA fragment
assembly results obtained from stochastic searches have been reported by Parsons
and Johnson [12], and Kim and Mohan [13], where genetic algorithms have proven to
outperform greedy search techniques in relatively small-sized problems. In
addition, the need for manual intervention is also eliminated in this case.
Although a significant improvement over the greedy search result has been
achieved, the search efficiency could be further improved if the redundancy in the
solution representation could be eliminated from the search algorithms [11].
Similar to a number of combinatorial optimisation techniques, the use of a
permutation representation is required to represent a DNA fragment ordering
solution in a genetic algorithm search. With such a representation, different
ordering solutions can produce the same DNA consensus sequence. Due to the nature
of a genetic algorithm as a parallel search technique, the representation
redundancy mentioned would inevitably reduce the algorithm's efficiency. A
stochastic search algorithm that does not suffer from such an effect is an ant
colony system (ACS) algorithm [15].
The natural metaphor on which ant algorithms are based is that of ant colonies.
Real ants are capable of finding the shortest path between a food
source and their nest without using visual clues by exploiting pheromone
information. While walking, ants deposit pheromone on the ground, and
probabilistically follow pheromone previously deposited by other ants. The way
ants exploit pheromone to find the shortest path between two points can be
described as follows. Consider a situation where ants arrive at a decision point
at which they have to decide between two possible paths for both getting to and
returning from their destination. Since they have no clue about which is the best
choice, they have to pick a path randomly. It can be expected that on average half
of the ants will decide to go on one path and the rest choose to travel on the
other path. Suppose that all ants walk at approximately the same speed; the
pheromone deposited will then accumulate faster on the shorter path. After a short
transitory period the difference between the amounts of pheromone on the two paths
is sufficiently large to influence the decision of other ants arriving at the
decision point. New ants will thus prefer to choose the shorter path since at the
decision point they perceive more pheromone. In the end, all ants will use the
shorter path. If the ants have to complete a circular tour covering n different
destinations without a visiting order preference, the emerged shortest path will
be a solution to the n-city travelling salesman problem (TSP). Although the ACS
algorithm also exploits stochastic parallel search mechanisms, the algorithm's
performance does not depend upon the solution representation. This is because the
optimal solution found by the ACS algorithm will emerge as a single shortest path.
In other words, the problem regarding the redundancy in the solution
representation mentioned earlier is completely irrelevant in the context of the
ACS algorithm's search.
The organisation of this paper is as follows. In section 2, the overview of a
DNA fragment assembly problem will be given. In section 3, the background on
the ACS algorithm will be discussed. The application of the ACS algorithm on
the DNA fragment assembly problem will be explained in section 4. Next, the
case studies are explained in section 5. The results obtained after applying the
ACS algorithm to the problem and the result discussions are given in section 6.
Finally, the conclusions are drawn in section 7.
[Figure: parent DNA strands shown as the complementary pair TTAGCACAGGAACTCTA / AATCGTGTCCTTGAGAT.]
sion that leads to the construction of the shortest global tour using pheromone
deposition, an optimal solution to the TSP would be co-operatively created by
all ants in the colony.
Based on the overview of the algorithm, the three main components that make up the
algorithm are a state transition rule, a local pheromone-updating rule and a
global pheromone-updating rule. In addition, Dorigo and Gambardella [15] have also
introduced the use of a data set called a candidate list that limits the cities an
ant may choose in a state transition. An explanation of these three rules and the
candidate list can be found in Dorigo and Gambardella [15]. Although the
explanation of the ACS algorithm is given in the context of a TSP, the ACS
algorithm can be easily applied to a DNA fragment assembly problem. This is
because the overlap score, which provides information regarding how well two
fragments fit together, can be directly viewed as the inverse of the distance
between two cities.
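As a rough illustration only (not taken from this paper), the ACS state transition can be sketched as the pseudo-random proportional rule of Dorigo and Gambardella [15]: with probability q0 an ant greedily picks the city maximising pheromone times heuristic desirability, and otherwise samples proportionally. The pheromone values, heuristic values and parameters below are placeholder choices; for fragment assembly the heuristic desirability would be the overlap score.

```python
import random

def acs_next_city(current, unvisited, pheromone, heuristic, beta=2.0, q0=0.9):
    """ACS pseudo-random proportional rule: exploit the best edge with
    probability q0, otherwise sample proportionally to tau * eta**beta."""
    attractiveness = {j: pheromone[current][j] * heuristic[current][j] ** beta
                      for j in unvisited}
    if random.random() < q0:
        return max(attractiveness, key=attractiveness.get)  # exploitation
    total = sum(attractiveness.values())
    r, cumulative = random.uniform(0, total), 0.0
    for j, a in attractiveness.items():  # biased exploration
        cumulative += a
        if cumulative >= r:
            return j
    return j

# toy 4-city instance: uniform pheromone; heuristic = inverse distance
# (for fragment assembly, the heuristic would instead be the overlap score)
tau = [[1.0] * 4 for _ in range(4)]
eta = [[0, 1 / 2, 1 / 9, 1 / 5], [1 / 2, 0, 1 / 4, 1 / 3],
       [1 / 9, 1 / 4, 0, 1 / 7], [1 / 5, 1 / 3, 1 / 7, 0]]
print(acs_next_city(0, {1, 2, 3}, tau, eta))
```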
ordering sequence are the locations where the overlap score between two
consecutive fragments is lower than a threshold value, which in this investigation
is set to 95. However, more than one generated solution may have the same number
of contigs. If this is the case, the better solution is the one where the
difference between the lengths of the longest and the shortest contigs is minimal.
This part of the objective function is derived from the desire that the ultimate
goal solution is one with either only one contig or the fewest possible number of
contigs, where each contig is reasonably long.
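A hedged sketch of this contig-splitting step follows (our own illustration, with made-up overlap scores): the ordering sequence is cut wherever the overlap score between consecutive fragments falls below the threshold of 95, and ties on contig count are broken by the longest-minus-shortest contig length difference.

```python
def split_into_contigs(ordering, overlap_score, threshold=95):
    """Cut the fragment ordering wherever consecutive fragments overlap poorly."""
    contigs, current = [], [ordering[0]]
    for prev, frag in zip(ordering, ordering[1:]):
        if overlap_score(prev, frag) < threshold:
            contigs.append(current)  # gap: start a new contig
            current = [frag]
        else:
            current.append(frag)
    contigs.append(current)
    return contigs

def solution_key(contigs, fragment_length):
    """Fewer contigs first; ties broken by the longest-shortest length difference
    (contig lengths approximated here by summing fragment lengths)."""
    lengths = [sum(fragment_length(f) for f in c) for c in contigs]
    return (len(contigs), max(lengths) - min(lengths))

# toy example with a hypothetical overlap-score table
scores = {("f1", "f2"): 120, ("f2", "f3"): 80, ("f3", "f4"): 100}
contigs = split_into_contigs(["f1", "f2", "f3", "f4"], lambda a, b: scores[(a, b)])
print(contigs)  # [['f1', 'f2'], ['f3', 'f4']]
```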
5 Case Studies
The capability of the ACS algorithm in DNA fragment assembly will be tested
using data sets obtained from a GenBank database at the National Center
for Biotechnology Information or NCBI (http://www.ncbi.nlm.nih.gov). The
parent DNA strands in this case are extracted from the human chromosome 3
where the strands with the sequence length ranging from 21K to 83K base pairs
are utilised. It is noted that each complementary pair of parent DNA strands can
be referred from the database using its accession number. The fragments used
to construct the consensus sequence are also obtained from the same database
where each fragment is unclipped (low quality base reads are retained) and has
the total number of bases between 700 and 900. This means that the fragments
used in the case studies contain sequencing errors generally found in any ex-
periments. In addition, no base quality information is used during the assembly
investigation. In order for a consensus sequence to accurately represent the par-
ent DNA strands, there must be more than one fragment covering any base pairs
of the parent strands. The average number of fragments covering each base pair
on the parent strands is generally referred to as coverage. In addition, for a con-
sensus sequence to be made up from one contig, there must be no gaps in the
fragment ordering sequence. In this paper, the data set is prepared such that the
consensus sequence contains either one contig or multiple contigs. The summary
of the data set descriptions is given in Table 2.
In order to benchmark the performance of the ACS algorithm, its search performance
will be compared with that of the NNH rule and a CAP3 program [4], which is one of
the most widely used programs in the bioinformatics research community. Since the
base quality information is not used during the assembly, the solutions produced
by the CAP3 program would be similar to the results from a PHRAP program [5],
which is also a standard program [13]. The parameter setting for the ACS algorithm
is the recommended setting for solving symmetric travelling salesman problems
given in Dorigo and Gambardella [15].
Table 3. Number of contigs from the solutions produced by the NNH+NNH approach,
the ACS+NNH approach and the CAP3 program
7 Conclusions
In this paper, a DNA fragment assembly problem, which is a complex combina-
torial optimisation problem that can be treated as a travelling salesman problem
(TSP), has been discussed. In the context of a TSP, a fragment ordering sequence
would represent a tour that covers all cities while the overlap score between two
aligned fragments in the ordering sequence can be viewed as the inverse of the
Table 4. Assembly errors expressed in terms of the sum of substitution and inser-
tion/deletion errors, and the coverage error
distance between two cities. The assembly procedure proposed consists of two
stages: fragment assembly and contig assembly stages. In the fragment assembly
stage, a search for the best alignment between fragments is carried out using an
ant colony system (ACS) algorithm. The resulting contigs are then assembled
together using a nearest neighbour heuristic (NNH) rule in the contig assembly
stage. The assembly procedure proposed has been benchmarked against a CAP3
program [4]. The results suggest that the solutions produced by the CAP3 pro-
gram contain a higher number of contigs than the solutions generated by the
proposed technique. In addition, the quality of the combined ACS/NNH solu-
tions is higher than that of the CAP3 solutions when the problem size is large.
Since the core algorithm of the CAP3 program is a greedy search algorithm,
a replacement of the core algorithm with the ACS algorithm may yield an
improvement on the final assembly solution.
Acknowledgements
This work was supported by the Thailand Toray Science Foundation (TTSF)
and National Science and Technology Development Agency (NSTDA) under the
Thailand Graduate Institute of Science and Technology (TGIST) programme.
The authors acknowledge Prof. Xiaoqiu Huang at Iowa State University for providing
access to the SIM and CAP3 programs.
References
1. International Human Genome Sequencing Consortium: Initial sequencing and analysis of the human genome. Nature 409(6822) (2001) 860–921
2. Applewhite, A.: Mining the genome. IEEE Spectrum 39(4) (2002) 69–71
3. Pop, M., Salzberg, S.L., Shumway, M.: Genome sequence assembly: Algorithms and issues. Computer 35(7) (2002) 47–54
4. Huang, X., Madan, A.: CAP3: A DNA sequence assembly program. Genome Research 9(9) (1999) 868–877
5. Green, P.: Phrap documentation. Phred, Phrap, and Consed, www.phrap.org (2004)
6. Ferreira, C.E., de Souza, C.C., Wakabayashi, Y.: Rearrangement of DNA fragments: A branch-and-cut algorithm. Discrete Applied Mathematics 116(1-2) (2002) 161–177
7. Batzoglou, S., Jaffe, D., Stanley, K., Butler, J., Gnerre, S., Mauceli, E., Berger, B., Mesirov, J.P., Lander, E.S.: ARACHNE: A whole-genome shotgun assembler. Genome Research 12(1) (2002) 177–189
8. Kececioglu, J.D., Myers, E.W.: Combinatorial algorithms for DNA sequence assembly. Algorithmica 13(1-2) (1995) 7–51
9. Pevzner, P.A., Tang, H., Waterman, M.S.: An Eulerian path approach to DNA fragment assembly. Proceedings of the National Academy of Sciences of the United States of America 98(17) (2001) 9748–9753
10. Burks, C., Engle, M., Forrest, S., Parsons, R., Soderlund, C., Stolorz, P.: Stochastic optimization tools for genomic sequence assembly. In: Adams, M.D., Fields, C., Venter, J.C. (eds.): Automated DNA Sequencing and Analysis. Academic Press, London, UK (1994) 249–259
11. Parsons, R.J., Forrest, S., Burks, C.: Genetic algorithms, operators, and DNA fragment assembly. Machine Learning 21(1-2) (1995) 11–33
12. Parsons, R.J., Johnson, M.E.: A case study in experimental design applied to genetic algorithms with applications to DNA sequence assembly. American Journal of Mathematical and Management Sciences 17(3-4) (1997) 369–396
13. Kim, K., Mohan, C.K.: Parallel hierarchical adaptive genetic algorithm for fragment assembly. In: Proceedings of the 2003 Congress on Evolutionary Computation, Canberra, Australia (2003) 600–607
14. Angeleri, E., Apolloni, B., de Falco, D., Grandi, L.: DNA fragment assembly using neural prediction techniques. International Journal of Neural Systems 9(6) (1999) 523–544
15. Dorigo, M., Gambardella, L.M.: Ant colony system: A cooperative learning approach to the traveling salesman problem. IEEE Transactions on Evolutionary Computation 1(1) (1997) 53–66
16. Allex, C.F., Baldwin, S.F., Shavlik, J.W., Blattner, F.R.: Improving the quality of automatic DNA sequence assembly using fluorescent trace-data classifications. In: Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology, St. Louis, MO (1996) 3–14
17. Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. Journal of Molecular Biology 147(1) (1981) 195–197
18. Huang, X., Miller, W.: A time-efficient, linear-space local similarity algorithm. Advances in Applied Mathematics 12(3) (1991) 337–357
BeeHiveGuard: A Step Towards Secure Nature
Inspired Routing Algorithms
Abstract. Nature inspired routing protocols for fixed and mobile networks
are becoming an active area of research. However, analyzing their
security threats and countering them have received little attention. In
this paper we discuss the security threats of a state-of-the-art routing
protocol, BeeHive, and then extend the algorithm with our security
model to counter them. We further conclude from our extensive
experiments that standard cryptography techniques cannot be utilized,
due to their large processing and communication costs, if Nature
inspired routing protocols are to be deployed in real world networks.
1 Introduction
Nature inspired routing protocols are becoming an active area of research because
they do not require an a priori global system model of the network; rather, they
utilize a local system model as observed by the agents. The agents gather the
network state in a decentralized fashion and leave the corresponding information
on visited nodes. This information enables them to make routing decisions in a
decentralized fashion without the need of having access to the complete network
topology. The algorithms can adapt autonomously to changes in the network or in
traffic patterns. AntNet [1], BeeHive [15] and Distributed Genetic Algorithm
(DGA) [7] are state-of-the-art Nature inspired routing algorithms.
In all of the above-mentioned algorithms, the authors always implicitly trusted
the identity of the agents and their routing information. However, this assumption
is not valid for real world networks, where malicious intruders or compromised
nodes can wreak havoc. To our knowledge, little attention has been paid to
analyzing the security threats of Nature inspired routing protocols and
efficiently countering them. Router vendors are not willing to deploy Nature
inspired routing protocols in real networks because their security threats have
not been properly investigated. We believe that a scalable security framework,
which has acceptable processing and communication costs, is an important step
toward the deployment of such protocols in real world routers. This observation
provided the motivation for our current work, in which we take a first step in
this direction by doing a comprehensive analysis of the security threats of the
BeeHive algorithm. We provide the important features of our security framework,
BeeHiveGuard, and measure its processing and communication costs.
Tampering attack. In this attack, a malicious node simply modifies the routing
information carried by an agent to its own benefit.
Identity impersonating or spoofing. In this attack, a router impersonates another
router by launching bogus agents. As a result, the malicious router can force data
packets not to follow a path over another router, or it can divert them towards
itself.
Detour attack. An attacker forces its neighbors to route all their network
traffic over it.
Related Work. In [8, 11, 13, 5, 4], the authors have developed techniques to
counter some of the above-mentioned security threats in classical routing
algorithms. They utilized standard cryptography techniques, i.e. digital
signatures or Hashed Message Authentication Codes (HMAC), to avert fabrication and
tampering attacks. In these techniques, a router verifies that the originator of a
control message is the node that is indicated in the header. In [11], the authors
have secured distance vector routing protocols by incorporating the information
about a node and its predecessor node in the control packet. Sequence numbers are
used to identify an old or obsolete control packet. However, none of these
approaches tries to analyze and counter the security threats related to the
specific features of Nature inspired routing algorithms, with the exception of the
preliminary work of Zhong and Evans [16]. They studied the anomalous behavior of
AntNet [1] under three types of attacks: fabrication, dropping and tampering.
Their experiments clearly demonstrate that malicious nodes can disrupt the normal
routing behavior of AntNet by launching these attacks.
3 BeeHive Algorithm
This algorithm was proposed by Wedde, Farooq and Zhang in [15]. The algorithm is
inspired by the communication language of honey bees. Each node periodically sends
a bee agent by broadcasting replicas of it to each neighbor. The replicas explore
the network using priority queues, and they use an estimation model to estimate
the propagation and queuing delay from the node where they are received to their
launching node. Once the replicas of the same agent arrive at a node via different
neighbors of the node, they exchange routing information to model the network
state at this node. Through this exchange of information by the replicas at a
node, the node is able to maintain a quality metric for reaching destinations via
its neighbors. The algorithm utilizes just forward-moving agents, and no
statistical parameters are stored in the routing tables. In BeeHive a network is
divided into Foraging Regions and Foraging Zones. Each node belongs to only one
Foraging Region. Each Foraging Region has a representative node. A Foraging Zone
of a node consists of all the nodes from which a replica of an agent could reach
this node in 7 hops. This approach significantly reduces the size of the routing
table because each node maintains detailed routing information only about reaching
the nodes within its Foraging Zone and for reaching the representative nodes of
the Foraging Regions. In this way, a data packet, whose
4 BeeHiveGuard
In our work, the basic motivation is to analyze the security threats of deploying
Nature inspired routing protocols and then to design and develop a comprehensive
security framework that can counter these threats. As a first step, we take the
BeeHive algorithm, introduced in the previous section, and extend it with our
security model to counter these threats. We applied the standard RSA algorithm
[10] for cryptographic functions like encryption and decryption. We name the new
algorithm BeeHiveGuard and discuss its relevant security features. We assume that
a secure key distribution infrastructure exists in the network.
Agent integrity. The purpose of this extension is that the management information
related to a bee agent cannot be modified or impersonated by an intermediate
router. The values of the relevant information fields of a bee agent that must be
protected are: its identifier and the identifiers of its replicas, its source
address, its time to live (TTL) timer, and the address of its Foraging Region. The
source node signs these fields with its private key and puts the corresponding
signature sig1 in the bee agent. If a traitorous router tries to change these
fields or impersonate someone else, then other nodes can easily detect and discard
the corresponding bogus bee agents.
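For illustration only, the following Python sketch shows how such field signing and verification could look with RSA signatures. It uses the third-party `cryptography` package; the field names, key handling and encoding are our own simplifications, not the actual BeeHiveGuard implementation.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

def protected_fields(agent):
    # fields assumed to need protection: agent id, replica id, source, TTL, region
    keys = ("agent_id", "replica_id", "source", "ttl", "region")
    return "|".join(str(agent[k]) for k in keys).encode()

# the source node signs the protected fields with its private key (sig1)
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
agent = {"agent_id": 7, "replica_id": 2, "source": "node-1", "ttl": 12, "region": "R3"}
sig1 = private_key.sign(protected_fields(agent), padding.PKCS1v15(), hashes.SHA256())

# any node with the source's public key can detect tampering with these fields
public_key = private_key.public_key()
agent["ttl"] = 99  # a traitorous router modifies a protected field
try:
    public_key.verify(sig1, protected_fields(agent), padding.PKCS1v15(), hashes.SHA256())
    print("signature valid")
except Exception:  # cryptography raises InvalidSignature here
    print("bogus agent detected and discarded")
```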
[Figure: the signatures sig2 and sig3 carried by a bee agent at each node (0-4) of the example network; each node's entry shows sig2 and sig3 as signatures over propagation and queuing delay values (p, q).]
4. Then Node 4 estimates its delays to Node 1 by adding its own delay values to the ones of Node 3 and Node 2. As a result, Node 3 can only manipulate its own delays but not the cumulative delays from Node 1 to Node 3. Moreover, a node also compares the delay values in sig2 with the ones in sig3. If the delay values in sig2 are less than or equal to the ones in sig3, then the predecessor node has provided fake delay values. In that case, the sig2 value at this node is calculated with the help of the delay values in sig3, and the bee agent continues its exploration. Since a node utilizes the information of its predecessor node and of the predecessor node of its predecessor node, a predecessor node cannot significantly influence the routing behavior by faking its own routing information as described above.
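The following sketch illustrates this two-hop consistency check on cumulative delays; the value layout, the plausibility test and the repair step are simplifying assumptions chosen for illustration, since the exact BeeHiveGuard rules are more involved.

# Hedged sketch: each node carries two signed cumulative delay values,
# sig2 (signed by the predecessor) and sig3 (signed by the predecessor of the
# predecessor). Because delays can only grow along a path, sig2 must be
# strictly larger than sig3; otherwise the predecessor faked its delays.

def check_and_repair(delay_sig2, delay_sig3, own_delay):
    """delay_sig2/delay_sig3: cumulative delays extracted from verified
    signatures; own_delay: delay measured locally at this node."""
    if delay_sig2 <= delay_sig3:
        # Predecessor reported implausible values: fall back to sig3 and
        # continue the estimation from there.
        delay_sig2 = delay_sig3
    return delay_sig2 + own_delay   # cumulative delay advertised to the next hop

if __name__ == "__main__":
    print(check_and_repair(delay_sig2=0.030, delay_sig3=0.020, own_delay=0.005))
    print(check_and_repair(delay_sig2=0.010, delay_sig3=0.020, own_delay=0.005))  # repaired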
5 Experiments
We designed a series of experiments which simulate different types of threats in the networks. The results of the experiments clearly demonstrate that the BeeHive algorithm is susceptible to a number of such attacks. We utilized a standard cryptography library, OpenSSL [2], to implement our security model in the BeeHiveGuard algorithm. The library supports the relevant cryptographic techniques such as digital signatures, symmetric and asymmetric cryptography, and cryptographic hash functions. A profiling framework, which measures the processing complexity of a function in cycles, is incorporated in the performance evaluation framework presented in [14]. The empirical validation of our security model is necessary because it is not a trivial task to formally model the emergent behavior of Nature-inspired routing protocols. BeeHiveGuard is realized in the OMNeT++ simulator [12]. The experiments were conducted on a Fujitsu Siemens PC with a Pentium 4 3.0 GHz processor and 1 Gigabyte of RAM. The reported values are averages over ten independent runs.
Fig. 2. Net1: Node 3 is tampering with the queuing and the propagation delays
[Figures: additional simulation network topologies (node labels only)]
6 Results
Tampering control messages. In Figure 5 one can see that in normal mode the path 4-0-1 is rated higher than the path 4-3-2-1 because of its smaller delays. As a result, more data packets are routed on the path 4-0-1 than on the path 4-3-2-1. However, the situation changes drastically once Node 3 launches its attack at 300 seconds by tampering with the information in the bee agents (see Figure 6). The impact of the attack is significantly reduced in BeeHiveGuard
[Figures 5-7: data packets delivered over a certain neighbor (left axes) and the corresponding goodness values (right axes) versus time (sec)]
(see Figure 7), because Node 3 can now only manipulate its own queuing and propagation delays. Remember that in BeeHive it can manipulate the delays of the complete path 3-2-1.
[Figures: data packets delivered over neighbor 1 and neighbor 4 and the corresponding goodness values for neighbor 1 and neighbor 4 versus time (sec)]
the number of packets that followed either the path 3-2-1 or 3-2-4. Consequently, the number of data packets that followed the path 3-2-4-2-1 is not counted for neighbor 1. Figure 10 shows that in BeeHiveGuard Node 4 is not able to influence the routing decisions by propagating its bogus bee agents.
[Figures: data packets delivered over a certain neighbor and the corresponding goodness values versus time (sec)]
When Node 1 launches its attack by retransmitting the bee agents of Node 0 with modified agent and replica ids, the quality of the paths 5-4-2-* does not improve. As a result, Node 1 is able to divert the network traffic towards itself through the path 5-3-1. However, the impact of this attack is less significant as compared with the previous ones (see Figure 12). Figure 13 shows that BeeHiveGuard has successfully countered even these attacks because Node 1 could not modify the bee agents launched by Node 0.
7 Costs of BeeHiveGuard
References
1 Introduction
Mobile Ad-hoc Networks (MANETs) are fluctuating networks populated by a set of communicating devices called stations (also called terminals) which can spontaneously interconnect with each other without a pre-existing infrastructure. This means that no carrier is present in such networks, as is usual in many other types of communication networks. Stations in MANETs are usually laptops, PDAs, or mobile phones, equipped with network cards featuring wireless technologies such as Bluetooth and/or IEEE 802.11 (WiFi). In this scenario, (a) stations communicate within a limited range, and (b) stations can move while communicating. A consequence of mobility is that the topology of such networks may change quickly and in unpredictable ways. This dynamic behavior constitutes one of the main obstacles for performing efficient communications on such networks.
Broadcasting is a common operation at the application level and is also widely used for solving many network layer problems, being, for example, the basic
This work has been partially funded by the Ministry of Science and Technology and
FEDER under contract TIN2005-08818-C04-01 (the OPLINK project).
mechanism for many routing protocols. In a given MANET, due to host mobility, broadcasting is expected to be performed very frequently (e.g., for paging a particular host, sending an alarm signal, and/or finding a route to a given target terminal). Broadcasting may also serve as a last resort to provide multicast services in networks with such rapidly changing topologies, and it stems from the organization of terminals in groups. Hence, a well-tuned broadcasting strategy will have a major impact on network performance.
In this paper we consider the problem of broadcasting on a particular sub-class of MANETs called Metropolitan MANETs, which cover areas ranging from shopping malls to metropolitan areas. Instead of providing a generic protocol performing well in average situations, our proposal consists of optimally tuning the broadcasting service for a set of networks and for a particular category of broadcast messages. Optimizing a broadcasting strategy is a multiobjective problem where multiple functions have to be satisfied at the same time: maximizing the number of stations reached, minimizing the network use, and minimizing the makespan are three examples of the potential objectives. In this work, the broadcasting strategy considered for optimization is DFCN [1], and the target networks are metropolitan MANETs. Since manipulating such networks is difficult, we must rely on software simulators for evaluating the scenarios from the designer's point of view.
Contrary to single-objective optimization, multiobjective optimization is not restricted to finding a unique solution of a given multiobjective problem, but a set of solutions known as the Pareto optimal set. For instance, taking the problem we are dealing with as an example, one solution can represent the best result concerning the number of reached stations, while another solution could be the best one concerning the makespan. These solutions are said to be nondominated. The result provided by a multiobjective optimization algorithm is then a set of nondominated solutions (the Pareto optima) which are collectively known as the Pareto front when plotted in the objective space. The task of the decision maker is to choose the most adequate solution from the Pareto front.
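As a small illustration of these notions (not taken from the paper), the following sketch filters a set of candidate solutions down to its nondominated subset, assuming all objectives are to be minimized; a maximized objective such as coverage would first be negated to fit this convention.

# Hedged sketch: Pareto dominance and nondominated filtering for a
# minimization problem with an arbitrary number of objectives.

def dominates(a, b):
    """True if objective vector a dominates b (a is no worse in every
    objective and strictly better in at least one)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated(solutions):
    """Return the Pareto optimal subset of a list of objective vectors."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]

if __name__ == "__main__":
    # (network use, makespan, -coverage): all three minimized.
    candidates = [(35, 3.0, -0.95), (20, 5.5, -0.90), (20, 3.5, -0.60), (40, 3.1, -0.94)]
    print(nondominated(candidates))   # the last candidate is dominated by the first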
This multiobjective problem of broadcasting in MANETs, which has been previously addressed with a cellular genetic algorithm (cMOGA) in [2], is now tackled with a state-of-the-art multiobjective scatter search algorithm called AbSS (Archive-based Scatter Search) [3]. Scatter search [4, 5, 6] has been successfully applied to a wide variety of optimization problems [5], but it has not been extended to deal with MOPs until recently [3, 7, 8, 9]. This metaheuristic technique starts from an initial set of diverse solutions from which a subset, known as the reference set (RefSet), is built by including both high-quality solutions and highly diverse solutions. Then, an iterative procedure systematically combines the solutions in RefSet to generate new (hopefully better) solutions that may be used for updating the reference set and even the initial population. After that, the procedure iterates in search of optimal solutions.
The contributions of this work are summarized as follows. First, we solve the broadcasting problem on MANETs using a multiobjective scatter
search, and compare the results with those obtained with cMOGA. Second, we deal with a more realistic problem than the one faced in [2], because we use a real-world scenario (a shopping mall) never tackled before.
The rest of the paper is structured as follows. In the next section, we detail the multiobjective problem of broadcasting in MANETs. Section 3 includes the description of the multiobjective scatter search algorithm. Metrics, parameterization, and results are presented in Sect. 4. Finally, conclusions and lines of future work are given in Sect. 5.
2 Problem Definition
The problem we study in this paper consists of, given an input MANET, determining the most adequate parameters for a broadcasting strategy in it. We first describe in Sect. 2.1 the target networks we have used. Section 2.2 is devoted to the presentation of DFCN, the broadcasting strategy to be tuned. Finally, the MOP we define for this work is presented in Sect. 2.3.
1 http://www-lih.univ-lehavre.fr/hogie/madhoc/
Fig. 1. (a) Metropolitan MANET, and (b) the effect of the observation window
where safeDensity is the maximum safe density below which DFCN always rebroadcasts and minGain is the minimum gain for rebroadcasting, i.e., the ratio between the number of neighbors which have not received the message and the total number of neighbors.
Each time a station s gets a new neighbor, the RAD for all messages is set to zero and, therefore, messages are immediately candidates for emission. If N(s) is greater than a given threshold, which we have called proD, this behavior is disabled, so no action is undertaken on new neighbor discovery. proD is used for avoiding massive packet rebroadcasting when a new station appears in highly dense areas, that is, for avoiding network congestion caused by the proactive behavior.
– minGain is the minimum gain for rebroadcasting. This is the most important parameter for tuning DFCN, since minimizing the bandwidth should be highly dependent on the network density. It ranges from 0.0 to 1.0.
– [lowerBoundRAD, upperBoundRAD] defines the RAD value (random delay for rebroadcasting in milliseconds). Both parameters take values in the interval [0.0, 10.0] milliseconds.
– proD is the maximal density (proD in [0, 100]) for which it is still necessary to use the proactive behavior (i.e., reacting on new neighbors) for complementing the reactive behavior.
– safeDensity defines the maximum safe density of the threshold, which ranges from 0 to 100 devices.
A small illustrative sketch of how these parameters can be encoded as a tunable solution is given below.
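A minimal sketch of such a solution encoding, assuming the five DFCN parameters above are the decision variables and that the simulator returns the three objective values mentioned in the introduction (coverage, network use, makespan). The class and function names are illustrative and are not Madhoc's or the paper's API.

# Hedged sketch: a DFCN parameter setting as a solution vector, with bounds
# taken from the parameter description above. The evaluation function is a
# placeholder standing in for the Madhoc simulations.
from dataclasses import dataclass
import random

BOUNDS = {                          # (lower, upper) for each decision variable
    "minGain": (0.0, 1.0),
    "lowerBoundRAD": (0.0, 10.0),   # milliseconds
    "upperBoundRAD": (0.0, 10.0),   # milliseconds
    "proD": (0.0, 100.0),
    "safeDensity": (0.0, 100.0),
}

@dataclass
class DFCNSolution:
    params: dict

    @staticmethod
    def random():
        p = {k: random.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}
        # Keep the RAD interval consistent: lower bound must not exceed upper bound.
        if p["lowerBoundRAD"] > p["upperBoundRAD"]:
            p["lowerBoundRAD"], p["upperBoundRAD"] = p["upperBoundRAD"], p["lowerBoundRAD"]
        return DFCNSolution(p)

def evaluate(solution, simulate, runs=5):
    """Average the objective vectors over several simulated network instances
    (the paper averages over five Madhoc runs). `simulate` is assumed to return
    (coverage, network_use, makespan) for one instance."""
    results = [simulate(solution.params) for _ in range(runs)]
    return tuple(sum(v) / runs for v in zip(*results))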
[Figure: scatter search template showing the diversification generation method, the improvement method, the initial set P built in the initialization phase, and the subset generation method]
The scatter search technique starts by creating an initial set of diverse individuals in the initialization phase. This phase consists of iteratively generating new solutions by invoking the diversification generation method; each solution is passed to the improvement method, which usually applies a local search procedure in an iterative manner, and the resulting individual is included in the initial set P. After the initialization phase, the scatter search main loop starts.
The main loop begins by building the reference set from the initial set by invoking the reference set update method. The reference set is a collection of both high-quality solutions and diverse solutions that are used for generating new individuals. Solutions in this set are systematically grouped into subsets of two or more individuals by means of the subset generation method. In the next step, the solutions in each subset are combined to create a new individual, according to the solution combination method. Then, the improvement method is applied to every new individual. The final step consists of deciding whether the resulting solution is inserted into the reference set or not. This loop is executed until a termination condition is met (for example, a given number of iterations has been performed, or the subset generation method does not produce new subsets).
Optionally, there is a re-start process invoked when the subset generation method does not produce new subsets of solutions. The idea is to obtain a new initial set, which will now include the current individuals in the reference set. The rest of the individuals are generated by using the diversification generation and improvement methods, as in the initialization phase.
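A compact sketch of this template, with the five methods passed in as plain functions; it only mirrors the loop structure described above and makes no claim about how AbSS implements each method.

# Hedged sketch of the generic scatter search loop. Each of the five methods
# (diversification, improvement, reference set update, subset generation,
# combination) is supplied by the caller.

def scatter_search(diversify, improve, update_refset, gen_subsets, combine,
                   init_size, max_iterations):
    # Initialization phase: build a diverse, improved initial set P.
    P = [improve(diversify()) for _ in range(init_size)]
    refset = update_refset(P, [])

    for _ in range(max_iterations):
        subsets = gen_subsets(refset)
        if not subsets:              # no new subsets: stop (or re-start, not shown)
            break
        new_solutions = [improve(combine(subset)) for subset in subsets]
        refset = update_refset(new_solutions, refset)
    return refset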
3.2 AbSS
AbSS (Archive-based Scatter Search) [3] is based on the aforementioned scatter search template and on its application to solving bounded continuous single-objective optimization problems [6]. It uses an external archive for storing nondominated solutions and combines ideas of three state-of-the-art evolutionary algorithms for solving MOPs. Concretely, the archive management follows the scheme of PAES [11], but uses the crowding distance of NSGA-II [12] as a niching measure instead of the PAES adaptive grid; additionally, the density estimation found in SPEA2 [13] is adopted for selecting the solutions from the initial set that will build the reference set. Having given this overall view of the technique, we now detail the five methods used to engineer AbSS:
4 Experiments
This section is devoted to presenting the experiments performed for this work. We first describe the metrics used for measuring the performance of the resulting Pareto fronts. Next, the parameterization of AbSS and Madhoc is detailed. Finally, we show the results for DFCNT and compare them against cMOGA [2].
4.1 Metrics
We have used three metrics for assessing the performance of both AbSS and cMOGA: the number of Pareto optima that the optimizers are able to find, Set Coverage, and Hypervolume.
4.2 Parameterization
As stated in Sect. 2.1, the behavior of Madhoc is defined mainly by three parameters: the size of the simulation area, the density of mobile stations, and the type of environment. For our experiments, we have used a simulation area of 40,000 square meters, a density of 2,000 stations per square kilometer, and, from the available environments of Madhoc, the mall environment. This environment is intended to model a commercial shopping center, in which stores are usually located next to each other along corridors. People go from one store to another through these corridors, occasionally stopping to look at shop windows. Both the mobility of devices and their signal propagation are restricted by the walls of the building. A metropolitan MANET with such a configuration is shown in Fig. 1. Due to the stochastic nature of Madhoc, five simulations (i.e., five different network instances) per function evaluation have been performed, so that the fitness values of the functions are computed as the averages over these five different network instances.
The configuration used for cMOGA is the same as that used in [2]: a population of 100 individuals arranged in a 10 x 10 square toroidal grid, the NEWS neighborhood, binary tournament selection, simulated binary crossover (SBX)
Table 1. Performance metrics for AbSS and cMOGA when solving DFCNT
AbSS cMOGA
Metric average std average std t-test
Number of Pareto Optima 98.7586 2.8119 98.1053 2.9000
Set Coverage 0.9865 0.0103 0.9793 0.0076 +
Hypervolume 0.8989 0.0695 0.8199 0.0854 +
4.3 Results
Let us now begin with the analysis of the results, which are presented in Table 1. Since both AbSS and cMOGA are stochastic algorithms and we want to provide the results with statistical confidence, 30 independent runs of each multiobjective optimizer have been performed, as well as t-tests at a 95% significance level (last column of Table 1). The t-test assesses whether the means of two samples are statistically different from each other.
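As a hedged illustration of this kind of check (the excerpt does not specify which t-test variant or tool was used), the following sketch compares two sets of 30 runs with SciPy's two-sample t-test and flags significance at the 95% level.

# Hedged sketch: deciding whether two optimizers differ significantly on a
# metric, given one value per independent run. Welch's t-test is used here;
# the paper only states "t-tests at 95% significance level".
from scipy import stats

def significant_at_95(runs_a, runs_b):
    t_stat, p_value = stats.ttest_ind(runs_a, runs_b, equal_var=False)
    return p_value < 0.05       # a "+" in the table would correspond to True

if __name__ == "__main__":
    import random
    random.seed(1)
    abss  = [random.gauss(0.90, 0.07) for _ in range(30)]    # e.g. Hypervolume values
    cmoga = [random.gauss(0.82, 0.085) for _ in range(30)]
    print(significant_at_95(abss, cmoga))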
If we consider that the two algorithms are configured for obtaining at most 100 nondominated solutions (the maximum archive size), the values shown in Table 1 point out that most executions of the optimizers fill up the whole archive. Although AbSS returns a slightly higher number of Pareto optima on average than cMOGA does, the difference is negligible and there is no statistical confidence for it (no "+" in the t-test column), thus showing that both optimizers have a similar ability for exploring the search space of DFCNT.
Regarding the Set Coverage metric, we want to clarify that the results shown in the AbSS column correspond to C(AbSS, cMOGA) whereas those presented in the cMOGA column are C(cMOGA, AbSS). As can be seen in Table 1, AbSS gets larger values for this metric than cMOGA, and there is statistical confidence for this claim (see the "+" symbol in the last column). This fact points out that AbSS can find solutions that dominate more solutions of cMOGA than vice versa. However, the Set Coverage values are similar in both cases, which indicates that each algorithm computes high-quality solutions that dominate most solutions of the other, but those high-quality solutions are in turn nondominated.
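A short sketch of the Set Coverage metric C(A, B) as it is commonly defined (the fraction of solutions in B that are dominated by at least one solution in A); this standard definition is assumed here, since the paper's exact formulation is not reproduced in the excerpt. The same dominance test as in the earlier sketch is reused.

# Hedged sketch of the Set Coverage metric C(A, B): the fraction of the
# front B whose members are dominated by at least one member of the front A
# (all objectives assumed to be minimized).

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def set_coverage(front_a, front_b):
    covered = sum(1 for b in front_b if any(dominates(a, b) for a in front_a))
    return covered / len(front_b)

if __name__ == "__main__":
    A = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
    B = [(1.5, 4.5), (2.5, 2.5), (3.0, 0.5)]
    print(set_coverage(A, B), set_coverage(B, A))   # C(A,B) and C(B,A) are not symmetric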
The last row in Table 1 presents the results of the Hypervolume metric. They clearly show that AbSS outperforms cMOGA when both convergence and diversity of the resulting Pareto fronts are considered at the same time (all this supported with statistical confidence).
[Fig. 3: example Pareto fronts of AbSS and cMOGA for DFCNT; axes: number of transmissions, coverage, and makespan]
both optimizers were similar in terms of convergence, we can conclude that AbSS
is reaching this Hypervolume value because of the diversity in the found Pareto
front. That is, the set of nondominated solutions computed by AbSS covers a
larger region of the objective space, what is an important feature for actual de-
signs of MANETs. We show an example Pareto front that capture the previous
claims in Fig. 3. Regarding coverage, the AbSS front (+ symbols) is behind
(on the right) cMOGA solutions ( symbols). With respect to diversity, it
also can be seen that there are nondominated solutions from AbSS that reach
DFCN congurations where message coverage is around 40% of the stations
while cMOGA is not able to get solutions in this region of the objective space.
Therefore, using AbSS provides the network designer (decision maker) with a
wider set of DFCN parameter settings which ranges from congurations that get
a high coverage in a short makespan but using a high bandwidth to those cheap
solutions in terms of time and bandwidth being suitable if coverage is not a hard
constraint in the network.
The solutions from the scatter search approach dominated those obtained with cMOGA (convergence) and also covered a larger region of the objective space (diversity). From these results, a clear conclusion can be drawn: AbSS is a promising approach for solving DFCNT, with advantages over the existing one.
As future work, we plan to perform a more in-depth analysis of using AbSS for solving real-world MOPs. On the one hand, we intend to use different scenarios where DFCN has to be tuned and, on the other hand, to enlarge the simulation area to a still larger metropolitan network for large cities.
References
1. Hogie, L., Guinand, F., Bouvry, P.: A Heuristic for Efficient Broadcasting in the Metropolitan Ad Hoc Network. In: 8th Int. Conf. on Knowledge-Based Intelligent Information and Engineering Systems (2004) 727–733
2. Alba, E., Dorronsoro, B., Luna, F., Nebro, A., Bouvry, P.: A Cellular Multi-
Objective Genetic Algorithm for Optimal Broadcasting Strategy in Metropolitan
MANETs. In: IPDPS-NIDISC05. (2005) 192
3. Nebro, A.J., Luna, F., Dorronsoro, B., Alba, E., Beham, A.: AbSS: An Archive-
based Scatter Search Algorithm for Multiobjective Optimization. European Jour-
nal of Operational Research (2005) Submitted
4. Glover, F.: A Template for Scatter Search and Path Relinking. In: Third European Conf. on Artificial Evolution. Volume 1363 of LNCS, Springer-Verlag (1997) 354
5. Glover, F., Laguna, M., Martí, R.: Fundamentals of Scatter Search and Path Relinking. Control and Cybernetics 29 (2000) 653–684
6. Glover, F., Laguna, M., Martí, R.: Scatter Search. In: Advances in Evolutionary Computing: Theory and Applications. Springer, New York (2003) 519–539
7. Beausoleil, R.P.: MOSS: Multiobjective Scatter Search Applied to Nonlinear Multiple Criteria Optimization. European Journal of Operational Research 169 (2005) 426–449
8. da Silva, C.G., Clímaco, J., Figueira, J.: A Scatter Search Method for the Bi-Criteria Multi-Dimensional {0,1}-Knapsack Problem using Surrogate Relaxation. Journal of Mathematical Modelling and Algorithms 3 (2004) 183–208
9. Nebro, A.J., Luna, F., Alba, E.: New Ideas in Applying Scatter Search to Multiobjective Optimization. In: EMO 2005. LNCS 3410 (2005) 443–458
10. Williams, B., Camp, T.: Comparison of Broadcasting Techniques for Mobile Ad Hoc Networks. In: Proc. of the ACM International Symposium on Mobile Ad Hoc Networking and Computing (MOBIHOC) (2002) 194–205
11. Knowles, J., Corne, D.: The Pareto Archived Evolution Strategy: A New Baseline Algorithm for Multiobjective Optimization. In: Proceedings of the 1999 Congress on Evolutionary Computation, CEC (1999) 9105
12. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6 (2002) 182–197
13. Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: Improving the Strength Pareto Evolutionary Algorithm. Technical report, Swiss Federal Inst. of Technology (2001)
14. Deb, K., Agrawal, B.: Simulated Binary Crossover for Continuous Search Space. Complex Systems 9 (1995) 115–148
15. Zitzler, E.: Evolutionary Algorithms for Multiobjective Optimization: Methods and Applications. PhD thesis, Swiss Federal Institute of Technology (ETH) (1999)
16. Zitzler, E., Thiele, L.: Multiobjective Optimization Using Evolutionary Algorithms – A Comparative Study. In: PPSN V (1998) 292–301
Evolutionary Design of OAB and AAB Communication
Schedules for Interconnection Networks
1 Introduction
With parallel and distributed computing coming of age, multiprocessor systems are more frequently found not only in high-end servers and workstations, but also in small-scale parallel systems for high-performance control, data acquisition and analysis, image processing, networking processors, wireless communication, and game computers. The design and optimization of hardware and software architectures for these parallel embedded applications have been an active research area in recent years. In many cases it is better to use several small processing nodes rather than a single big and complex CPU. Nowadays, it is feasible to place large CPU clusters on a single chip (multiprocessor SoCs, MSoCs), allowing both large local memories and the high bandwidth of on-chip interconnect.
One of the greatest challenges faced by designers of digital systems is optimizing the communication and interconnection between system components. As more and more processor cores and other large reusable components have been integrated on a single silicon die, the need for a systematic approach to the design of the communication part has become acute. One reason is that buses, formerly the main means of connecting the components, cannot scale to higher numbers of communication partners. Recently, research has opened up in the Network on Chip (NoC) area, encompassing the
2 Models of Communications
Communications between two partners (p2p) or among all (or a subset) of the partners engaged in parallel processing have a dramatic impact on the speedup of parallel applications. Performance modelling of p2p and group communications is therefore important in the design of application-specific systems. A p2p communication may be random (input data dependent) as far as the source-destination pair or the message length is concerned. However, in many parallel algorithms we often find certain communication patterns which are regular in time, in space, or in both time and space; by space we understand the spatial distribution of processes on processors. Communications taking place among a subset of or among all processors are called group or collective communications. Examples are One-to-All Broadcast (OAB), All-to-All Broadcast (AAB), One-to-All Scatter (OAS, a private message to each partner), All-to-One Gather (AOG), All-to-All Scatter (AAS), permutation, scan, reduction and others [2]. Provided that the amount of computation is known, as is usually true in the case of application-specific systems, the only thing that matters in obtaining the highest performance is the group communication time.
The simplest time model of communication uses a number of communication steps (rounds): a point-to-point communication takes one step between adjacent nodes and a number of steps if the nodes are not directly connected.
Two types of switching are used in this article. The first one is distance-sensitive Store-and-Forward (SF). Each intermediate node on the path first receives the whole message and then sends it to the adjacent node in the next possible communication step. The second type of switching is called wormhole (WH) switching. Here several p2p messages between source-destination pairs, not necessarily neighbours, can proceed concurrently and can be combined into a single step if their paths are disjoint. Of course, for simplicity, we assume no contention for channels and no resulting delays. An example of these switching techniques is shown in Fig. 1.
Further, we have to distinguish between unidirectional (simplex) channels and bidirectional (half-duplex, full-duplex) channels. The number of ports that can be engaged in communication simultaneously (1-port or all-port router models) also has an impact on the number of communication steps and the communication time, as does the question whether nodes can combine/extract partial messages with negligible overhead (combining model) or can only retransmit/consume original messages (non-combining model).
We use the all-port non-combining model in our experiments. The goal was to find communication algorithms whose time complexity is as close as possible to mathematically derived lower bounds on the number of communication steps.
[Fig. 1: Store-and-Forward vs. Wormhole switching; communication steps over time for processors P0-P5]
In our experimental runs, mostly the well-known hypercube [16] and AMP network [15] topologies were tested, see Fig. 2. Optimal schedules for the former topology are known and can therefore be used to evaluate the quality of the algorithms; the feature of the latter topology (for which optimal schedules are unknown) is that the number of nodes with degree d that can be connected in a network is maximal.
HGSA [7] is a hybrid method that combines parallel Simulated Annealing (SA) [10] with the operations used in standard genetic algorithms [8]. In the proposed algorithm, several SA processes run in parallel. After a number of steps (after every ten iterations of the Metropolis algorithm), crossover is used to produce new solutions.
During communication, which is activated every 10th iteration of the Metropolis algorithm, all processes send their solution to a master. The master keeps one solution for itself and sends one randomly chosen solution to each slave. The selection is based on the roulette wheel, where the individual with the best value of the fitness function has the highest probability of selection.
After the communication phase, all processes have two individuals. Now the genetic crossover phase starts. Two additional child solutions are generated from the two parent solutions using double-point crossover. The solution with the best value of the fitness function is selected and mutation is performed: always in the case of a parent solution, otherwise with a predefined probability. Mutation is performed by randomly selecting genes and randomly changing their values. The new solution of each process is selected from the actual solution provided by the SA process and from the solution obtained after genetic mutation. The selection is controlled by the well-known Metropolis criterion.
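A hedged sketch of this final selection step governed by the Metropolis criterion (for a minimization fitness such as the conflict count defined later); the temperature handling and the surrounding SA/GA machinery are simplified assumptions, not the exact HGSA implementation.

# Hedged sketch: Metropolis acceptance between the current SA solution and the
# solution produced by the genetic crossover/mutation phase (lower fitness is
# better, e.g. fewer conflicts in the communication schedule).
import math
import random

def metropolis_select(current, current_fit, candidate, candidate_fit, temperature):
    if candidate_fit <= current_fit:
        return candidate, candidate_fit          # always accept improvements
    # Accept a worse candidate with probability exp(-delta / T).
    if random.random() < math.exp(-(candidate_fit - current_fit) / temperature):
        return candidate, candidate_fit
    return current, current_fit

if __name__ == "__main__":
    sol, fit = metropolis_select("sched_A", 12, "sched_B", 14, temperature=5.0)
    print(sol, fit)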
Fig. 3. The optimal OAB schedules for the 8-node ring topology and the relevant broadcast trees
The lower bounds on the number of communication steps for the all-port hypercube and AMP topologies are shown in Table 1. The parameters of the interconnection networks in Table 1 are: processor count P, network diameter D, node degree d, bisection width BC, and average distance da.
5 Design of Algorithms
The goal of the proposed algorithms is to find a schedule of a group communication with a number of steps as close as possible to the above lower bounds. The solution of this optimization problem by means of evolutionary algorithms may be decomposed into several phases. In the first phase, it is necessary to choose a suitable encoding of the problem into a chromosome. The second step is the definition of the fitness function, which determines the quality of a chromosome. The next phase is the design of the input data structure for the evolutionary algorithm. The last phase includes experimental runs of the evolutionary algorithm and the search for the best set of its parameters. The choice of parameters should speed up the convergence of the algorithm and simultaneously minimize the probability of getting stuck in local minima.
Different encodings were used for each optimization algorithm according to the
switching technique. We used an indirect encoding for OAB with wormhole switch-
ing optimized by SGA algorithm. Thus a chromosome does not include a decision
tree, but only instructions how to create it from chromosome. Any chromosome con-
sists of P genes. Every gene corresponds to one destination node. Individual genes
include three integer values. The first one is a source node index. The second one
determines the shortest path along which the message will by transmitted. The last
one is a communication step number when the communication will be performed.
The main disadvantage of this encoding is formation of inadmissible solutions dur-
ing process of genetic manipulation. We say that a solution is inadmissible if it is not
possible to construct correct broadcast tree from it. An example of inadmissible solu-
tion can be a case when some node receives a message in a given step from a node
that has not received the message yet. That is why admissibility verification has to be
carried out for every solution before every fitness function evaluation and if the need
be, the restoration will be accomplished. In Fig. 4, a chromosome for wormhole OAB
communication patter for the 8-node ring topology is presented.
[Fig. 4: example chromosome for wormhole OAB on the 8-node ring (8 genes, three integers each)]
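A small illustrative sketch of this indirect encoding and of the admissibility check described above; the triple layout per destination follows the description in the text, while the helper names and the data structures are simplifying assumptions.

# Hedged sketch: indirect SGA chromosome for wormhole OAB. Gene i holds
# (source, path_index, step) for destination node i; the broadcast tree is
# rebuilt from these instructions. A solution is admissible only if every
# source already holds the message in an earlier step.

def is_admissible(genes, root):
    """genes: dict dest -> (source, path_index, step); root: broadcasting node."""
    received_at = {root: 0}                 # node -> step in which it gets the message
    for dest, (src, _path, step) in sorted(genes.items(), key=lambda kv: kv[1][2]):
        if dest == root:
            continue
        if src not in received_at or received_at[src] >= step:
            return False                    # the source does not hold the message yet
        received_at[dest] = step
    return True

if __name__ == "__main__":
    genes = {1: (0, 0, 1), 2: (1, 0, 2), 3: (0, 1, 2)}   # OAB from node 0 on a small ring
    print(is_admissible(genes, root=0))                   # True
    genes[2] = (3, 0, 1)                                  # node 3 has nothing to send in step 1
    print(is_admissible(genes, root=0))                   # False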
A very simple encoding of the SF OAB communication pattern has been chosen for HGSA. Every chromosome consists of P genes, where P is the number of processors in a given topology. The gene's index represents the destination processor of a message. The gene consists of two integer components. The first component is an index of one of the shortest paths from the source to the destination. The second component is a sequence of communication steps on the links of the path. Fig. 5 illustrates an example of this encoding. The source processor has index 0. For completeness the chromosome also includes a communication from the source to the source processor, but this communication is not carried out. This gene is included only for easier evaluation of the fitness function.
The main advantage of this encoding is the short chromosome and the absence of inadmissible solutions (every message is transmitted from the source to a destination). The main disadvantage is the large number of possible values of the first gene component. The number of these values rapidly increases with the distance from source to destination, as there are more shortest paths between them.
Fig. 5. The structure of a chromosome of HGSA in the case of OAB (each gene holds a shortest-path index from the source to the destination and a list of communication steps)
The fitness function evaluation is the same for both proposed algorithms. It is based on testing for conflict-freedom. We say that two communication paths are in conflict if and only if they use the same communication link at the same time and in the same direction (see Fig. 6). The fitness function is based on conflict counting. The optimal communication schedule for the given number of communication steps must be conflict-free. If a conflict occurs, the schedule cannot be used in a real application.
[Fig. 6: examples of conflict-free and conflicting communication paths]
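A hedged sketch of this conflict-counting fitness: a schedule is expanded into (link, direction, step) usages and the number of double bookings is returned; the representation is an assumption chosen only to make the idea concrete.

# Hedged sketch: fitness = number of conflicts, where a conflict means that
# the same link is used in the same direction during the same step by more
# than one message. Zero conflicts means the schedule is feasible.
from collections import Counter

def count_conflicts(schedule):
    """schedule: list of messages, each a list of (src, dst, step) hops."""
    usage = Counter()
    for message in schedule:
        for src, dst, step in message:
            usage[(src, dst, step)] += 1      # directed link occupied in this step
    return sum(n - 1 for n in usage.values() if n > 1)

if __name__ == "__main__":
    msg_a = [(0, 1, 0), (1, 2, 1)]
    msg_b = [(3, 1, 0), (1, 2, 1)]            # both use link 1->2 in step 1
    print(count_conflicts([msg_a, msg_b]))    # 1 conflict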
This algorithm generates all shortest paths and saves them in the operating memory in a specific data structure. The generating algorithm [6] is inspired by the breadth-first search algorithm (BFS). BFS searches a graph in which the source processor is chosen as the root. The edges create a tree used in the searching process. The tree is gradually constructed, one level at a time, from the root, which is assigned the index of the source node. When a new level of the tree is generated, every node at the lowest level (leaf) is expanded. When a node is expanded, its successors are determined as all its direct neighbours except those which are already located at higher levels of the tree (this is necessary to avoid cycles). Construction of the tree is finished when the value of at least one leaf is equal to the index of the destination node. The destination leaves confirm the existence of the searched paths, which are then stored as sequences of the indices of incident nodes.
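A brief sketch of such a shortest-path enumeration (a level-by-level BFS that records every predecessor reaching a node at the shortest distance and then unrolls all paths); this is a standard construction and only approximates the data structure used in the paper.

# Hedged sketch: enumerate all shortest paths from src to dst in an unweighted
# graph by BFS levels, keeping every predecessor that reaches a node first.
from collections import deque

def all_shortest_paths(adj, src, dst):
    dist, parents = {src: 0}, {src: []}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:                 # first time reached: new BFS level
                dist[v] = dist[u] + 1
                parents[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:      # another shortest way to reach v
                parents[v].append(u)

    def unroll(node):
        if node == src:
            return [[src]]
        return [p + [node] for par in parents.get(node, []) for p in unroll(par)]

    return unroll(dst) if dst in dist else []

if __name__ == "__main__":
    ring8 = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
    print(all_shortest_paths(ring8, 0, 4))    # the two shortest paths around the ring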
5.4 Heuristics
In SGA a new heuristic for chromosome restoration was used. The restoration (correction of the broadcast tree) proceeds step by step through the communication steps. For every node we check whether it receives the message from a node that has already received it in some previous communication step. If this condition is not satisfied, the source node of this communication is randomly replaced by a node that already has the message. Further, it is necessary to check the shortest paths already used. There is a finite number of shortest paths from every source to every destination node. If the second gene component (the path index) exceeds this number, the modulo operation is applied to this gene component.
In HGSA two heuristics are used to speed up the convergence to a sub-optimal solution. They decrease the probability of being trapped in local optima during the execution. The idea is a simple reduction of the path length. The first heuristic is used after the initialization of HGSA and then after each application of the Metropolis algorithm. The path from the source to the destination node has some length. If the end node occurs in another gene with a smaller length, then the length and the path in the original gene are changed accordingly.
the examined path. In effect, the heuristic tries to suspend the message in a node while the same link is being used by another message. However, this change must not increase the number of communication steps of the optimal schedule. If this does not lead to an improvement, the heuristic instead tries to send the message from the source to the destination as fast as possible.
6 Experimental Results
Both the sequential SGA and the parallel HGSA have been implemented in C/C++. They use only standard C and C++ libraries to ensure good portability. The HGSA implementation uses MPI [9] routines for message passing and can therefore be compiled and run on any architecture (clusters of workstations, MPPs, SMPs, etc.) for which an implementation of the MPI standard is available.
The proposed algorithms were verified on several multiprocessor topologies (e.g. Midimew, K-Ring, etc.). Two topologies were examined most intensively: five cases of hypercubes and five cases of AMP network topologies were used. Other topologies were tested only in an 8-node configuration.
The theoretical time complexity in terms of a minimal number of communication steps can be derived for all examined topologies. The theoretical lower bounds of the tested topologies are shown in Table 2.
The parameters of SGA were set to the same values for all runs, i.e. a crossover probability of 70% and a mutation probability of 5%. 10 runs of SGA were performed for each topology, and the population size was set to the value at which the success rate was better than 50%.
The parameters of HGSA were also set to the same values for all runs, i.e. 10 computers in the master-slave architecture, communication between master and slaves every 10/10 (OAB/AAB) iterations of the Metropolis algorithm, a start temperature of 100, 10 iterations in each temperature phase, and a cooling gradient of 0.9/0.99 (OAB/AAB). 15 runs of HGSA were performed for each topology.
We counted only the successful completions, i.e. those reaching the global optimum. The success rate of both algorithms (SGA and HGSA) was measured and compared.
7 Conclusions
Optimization of communication schedules by means of the proposed evolutionary algorithms has been successful. The optimal communication schedules achieve the lower bounds on the number of communication steps derived from graph-theoretical properties of the interconnection networks. It is evident that optimum schedules can speed up the execution of many parallel programs that use collective communication as a part of their algorithm.
We have tested two types of evolutionary algorithms. The first one is the standard genetic algorithm SGA and the second one, HGSA, is a combination of parallel simulated annealing and the standard genetic algorithm. Both presented algorithms are able to find an optimal schedule of the given communication pattern for an arbitrary network topology, each one with sufficient efficiency.
Future work will be focused on the communication patterns OAS and AAS in the case of HGSA, and OAB and AAB in the case of Estimation of Distribution Algorithms (EDA) [14]. We will implement multi-criteria optimization in EDA algorithms (without the need to enter the number of communication steps) and design and implement more efficient heuristics for HGSA.
The importance and novelty of the above goals should be emphasized. Algorithms that would be able to find all types of collective communications on any regular or irregular topology have not been published so far, in spite of their growing importance, especially for multiprocessors on chips.
Acknowledgement
This research has been carried out under the financial support of the research projects FRVS 2848/2006/G1 "Memetic Evolutionary Algorithms Applied to Design of Group Communication between Processors" and FRVS 2983/2006/G1 "EDA: Evolutionary Design of Group Communication Schedules" (Ministry of Education), and of the research grant GA102/05/0467 "Architectures of Embedded Systems Networks" (Grant Agency of the Czech Republic).
References
1. Jantsch, A., Tenhunen, H.: Networks on Chip, Kluwer Academic Publ., Boston, 2003,
ISBN 1-4020-7392-5
2. Gabrielyan, E., Hersch, R.D.: Networks Liquid Throughput: Topology Aware Optimization of Collective Communication. Unpublished work, 2003
3. Goldberg, D.: Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley Publishing Company, 1989
4. Défago, X., Schiper, A., Urban, P.: Total Order Broadcast and Multicast Algorithms: Taxonomy and Survey, Technical report DSC/2000/036, 2003
5. Duato, J., Yalamanchili, S.: Interconnection Networks: An Engineering Approach, Morgan Kaufmann Publishers, Elsevier Science, 2003
6. Staroba, J.: Parallel Performance Modelling, Prediction and Tuning, PhD Thesis, Faculty of Information Technology, Brno University of Technology, Brno, Czech Rep., 2004
7. Ohlídal, M., Schwarz, J.: Hybrid parallel simulated annealing using genetic operations. In: Mendel 2004, 10th International Conference on Soft Computing, Brno, CZ, FSI VUT, 2004, pp. 89–94
8. Goldberg, D.E.: A note on Boltzmann tournament selection for genetic algorithms and population-oriented simulated annealing, Complex Systems, 1990, pp. 445–460
9. http://www-unix.mcs.anl.gov/mpi
10. Kita, H.: Simulated annealing. Proceeding of Japan Society for Fuzzy Theory and Sys-
tems, Vol. 9, No. 6, 1997
11. Jaroš, J., Dvořák, V.: Speeding-up OAS and AAS Communication in Networking System on Chips. In: Proc. of 8th IEEE Workshop on Design and Diagnostic of Electronic Circuits and Systems, Sopron, HU, UWH, 2005, pp. 206–210
12. Jaroš, J., Ohlídal, M., Dvořák, V.: Evolutionary Design of Group Communication Schedules for Interconnection Networks. In: Proceedings of 20th International Symposium of Computer and Information Science, Berlin, DE, Springer, 2005, pp. 472–481
13. Dvořák, V.: Scheduling Collective Communications on Wormhole Fat Cubes. In: Proc. of the 17th International Symposium on Computer Architecture and High Performance Computing, Los Alamitos, US, IEEE CS, 2005, pp. 27–34
14. Pelikan, M., Goldberg, E., Sastry, K.: Bayesian Optimization Algorithm, Decision Graphs, and Occam's Razor.
15. Chalmers, A., Tidmus, J.: Practical Parallel Processing. International Thomson Computer Press, 1996
16. http://www.sgi.com/products/remarketed/origin/pdf/hypercube.pdf
A Multiagent Algorithm for Graph Partitioning
1 Introduction
2 Graph Partitioning
Let G = G(V, E) be an undirected graph where V is the vertex set and E the edge set. Although in the general partitioning problem both vertices and edges can be weighted, here, as in most of the literature, they are given unit weights. A partition of the graph is a mapping of V into k disjoint subdomains Si such that the union of all subdomains is V, i.e. S1 ∪ S2 ∪ ... ∪ Sk = V. The cardinality of a subdomain is the number of vertices in the subdomain Si, and the set of inter-subdomain or cut edges (i.e. edges cut by the partition) is denoted by Ec and referred to as the k-cut. The objective of graph partitioning is to find a partition which evenly balances the cardinalities of each subdomain whilst minimizing the total number of cut edges or cut-weight, |Ec|. To evenly balance the partition, the cardinality of the optimal subdomain is given by |Sopt| = ⌈|V|/k⌉. The graph partitioning problem can then be specified as: find a partition of G such that |Ec| is minimised subject to the constraint that |Sopt| − 1 ≤ |Si| ≤ |Sopt| for 1 ≤ i ≤ k. In this paper we find partitions with perfect balance.
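To make these definitions concrete, here is a small sketch (not from the paper) that computes the cut size |Ec| and checks the balance constraint for a given assignment of vertices to the k subdomains.

# Hedged sketch: evaluating a k-partition. part maps each vertex to a
# subdomain in 0..k-1; edges is a list of (u, v) pairs.
from collections import Counter
from math import ceil

def cut_size(edges, part):
    return sum(1 for u, v in edges if part[u] != part[v])

def is_balanced(part, k):
    sizes = Counter(part.values())
    s_opt = ceil(len(part) / k)
    return all(s_opt - 1 <= sizes.get(i, 0) <= s_opt for i in range(k))

if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
    part = {0: 0, 1: 0, 2: 1, 3: 1}
    print(cut_size(edges, part), is_balanced(part, k=2))   # 3 cut edges, balanced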
The ants algorithm is a multiagent system based on the idea of parallel search. Unlike other algorithms with a similar name, which are generically known as ant-colony optimization, our algorithm does not use pheromones or local memory. Thus it is faster and easier to implement. A generic version of the algorithm was proposed in [3]. The mechanism of the algorithm is as follows: Initially the graph is k-vertex-colored at random, keeping the number of vertices of each color balanced. A given number of agents, which we call ants, is placed on the vertices, also at random. Then the ants move around the graph and change the coloring according to a local optimization criterion: at a given iteration each ant moves from its current position to the adjacent vertex with the lowest local cost, i.e. the vertex with the greatest number of constraints (neighbors of a different color), and replaces its color with a new color which increases the local cost. At the same time, and to keep the balance, the algorithm chooses, from a set of s random vertices, one with the lowest value of the local cost function (from those which have the new color) and changes its color to the old color. After these color changes, the local cost function is updated for the two chosen vertices and their neighbors. The value of s is not critical, and for our tests we considered s = 100. These actions are randomly repeated for each ant. An essential characteristic of the algorithm comes precisely from the stochastic nature of the changes performed. The agent or ant moves to the worst adjacent vertex with a probability pm (it moves randomly to any other adjacent vertex with probability 1 − pm), and assigns the best color with probability pc (otherwise it assigns any color at random). Both probabilities are adjustable parameters and allow the algorithm to escape from local minima and obtain partitions with k-cuts close to the optimal. The process is repeated until a solution fulfilling all the constraints is found or the algorithm converges. The number of ants in the algorithm is another adjustable parameter, which should increase with the diameter of the graph (the maximum of the distances between pairs of vertices).
In the same way as in an insect colony, where the action of different agents with simple behaviors gives rise to a structure capable of carrying out complicated tasks, the algorithm presented here, which is based on a series of simple local actions that might even be carried out in parallel, can obtain restrictive graph partitions. Note that our algorithm is not a simple sum of local searches, as these would quickly lead to a local solution. The probabilities pm and pc play an important role in avoiding these minima; however, their values are not critical and affect mainly the convergence time of the algorithm, which is shorter for larger values of pm and pc, as the degree of local improvement at each iteration increases.
An outline of the ants algorithm is shown here in pseudocode.
ANTS algorithm:
Initialize
    n (number of ants), k, pm, pc
    Color each vertex of the graph at random forming k balanced sets
    Put each ant on a randomly chosen vertex
    For all vertices
        Initialize local cost function
    End for
    Initialize global cost function
    best cost := global cost function
While (best cost > 0) do
    For all ants
        If (random < pm)
            Move the ant to the worst adjacent vertex
        Else
            Move randomly to any adjacent vertex
        End if
        If (random < pc)
            Change vertex color to the best possible color
        Else
            Change to a randomly chosen color
        End if
        Keep balance (change a randomly chosen vertex with low local cost
            from the new to the old color)
        For the chosen vertices and all adjacent vertices
            Update local cost function
            Update global cost function
        End for
        If (global cost function < best cost)
            best cost := global cost function
        End if
    End for
End while
4 Results
We tested the algorithm using a set of benchmark graphs which are available from the Graph Partitioning Archive, a website maintained by Chris Walshaw [13]. The graphs can also be downloaded from Michael Trick's website [11]. Many
Table 1. Best partitions found with ants, corresponding to perfect balance for 16 subdomains, using benchmark graphs [13]. For each graph we provide the total number of vertices and edges, the optimal subdomain size, the previous cut size [13], the new cut size, and the algorithm used to find the previous cut size. Boldface denotes those values for which the ants algorithm has outperformed the known result. Algorithms: Ch2.0, CHACO, multilevel Kernighan-Lin (recursive bisection), version 2.0 (October 1995) [7]. J2.2, JOSTLE, multilevel Kernighan-Lin (k-way), version 2.2 (March 2000) [14]. iJ, iterated JOSTLE, iterated multilevel Kernighan-Lin (k-way) [12]. JE, JOSTLE Evolutionary, combined evolutionary/multilevel scheme [10].
Graph vertices edges domain size cut size [13] new cut size algorithm
C2000.5 2000 999836 125 923294 922706 Ch2.0
C4000.5 4000 4000268 250 3709887 3708532 Ch2.0
DSJC125.1 125 736 8 524 522 iJ
DSJC1000.1 1000 40629 63 43078 43001 Ch2.0
DSJC1000.5 1000 249826 63 229362 228850 Ch2.0
jean 80 254 5 161 161 Ch2.0
flat1000 50 0 1000 245000 63 224403 224378 Ch2.0
flat1000 60 0 1000 245830 63 225546 225183 Ch2.0
flat1000 76 0 1000 246708 63 226371 225962 Ch2.0
le450 5a 450 5714 29 4063 4030 JE
le450 5b 450 5734 29 4065 4055 iJ
le450 5c 450 9803 29 7667 7656 iJ
le450 15a 450 8168 29 5636 5619 iJ
le450 15b 450 8169 29 5675 5641 iJ
le450 15c 450 16680 29 13512 13509 iJ
le450 15d 450 16750 29 13556 13550 iJ
le450 25a 450 8260 29 5325 5302 J2.2
le450 25b 450 8263 29 5041 5037 JE
le450 25c 450 13343 29 13457 13456 iJ
le450 25d 450 17425 29 13584 13539 iJ
miles500 128 1170 8 771 770 JE
miles750 128 2113 8 1676 1673 iJ
miles1000 128 3216 8 2770 2768 iJ
miles1500 128 5198 8 4750 4750 J2.2
mulsol.i.1 197 3925 13 3275 3270 Ch2.0
mulsol.i.5 185 3973 12 3371 3368 Ch2.0
myciel4 23 71 2 64 64 J2.2
myciel5 47 236 3 205 205 J2.2
myciel7 191 2360 12 1921 1920 iJ
queen5 5 25 160 2 151 151 J2.2
queen8 8 64 728 4 632 632 Ch2.0
queen8 12 96 1368 6 1128 1128 Ch2.0
queen12 12 144 2596 9 2040 2020 iJ
queen16 16 256 6320 16 4400 4400 J2.2
of these graphs were collected together for the second DIMACS implementation challenge "NP Hard Problems: Maximum Clique, Graph Coloring, and Satisfiability", see [4]. For the test we considered the most difficult case of perfect balance and chose a partition into 16 sets.
The number of ants ranged from 3 to 9, depending on the order of the graph. The probabilities pm and pc were 0.9 and 0.85, respectively. We repeated the algorithm 20 times for each graph (80 times for small graphs) and recorded the best solutions. Each algorithm run takes around one minute for a C program (400 lines) compiled on a PC Pentium IV 2.8 GHz under Windows XP using Dev-C++.
Most of the improvements were on results obtained previously with CHACO (Ch2.0), a multilevel Kernighan-Lin (recursive bisection) [7], and iterated JOSTLE (iJ), an iterated multilevel Kernighan-Lin (k-way) [12]. In three cases we improved on results obtained with JOSTLE Evolutionary (JE), a combined evolutionary/multilevel scheme [10], and in six more cases we matched or outperformed results obtained with JOSTLE (J2.2), a multilevel Kernighan-Lin (k-way) [14]. In our experiments, the ants algorithm obtains better solutions for the coloring test suite (16 subdomains, perfect balance) of graphs considered in [12] in 27 cases and an equivalent solution in 7 cases (out of the 89 graph instances).
5 Conclusion
The results show that our implementation of the multiagent algorithm ants for the graph partitioning problem provides, for the balanced case, a new method which complements, and even outperforms, known techniques. Given the simplicity of the algorithm and its performance in the difficult case of balanced sets, it is a promising method for graph partitioning in the non-balanced cases. Note also that adapting the algorithm for graphs with weighted vertices and edges would be straightforward.
References
1. R. Baños, C. Gil, J. Ortega, and F. G. Montoya. Multilevel heuristic algorithm for graph partitioning. In 3rd European Workshop on Evolutionary Computation in Combinatorial Optimization. Lecture Notes in Comput. Sci. 2611, pp. 143–153 (2003).
2. J. Abril, F. Comellas, A. Cortés, J. Ozón, M. Vaquer. A multi-agent system for frequency assignment in cellular radio networks. IEEE Trans. Vehic. Tech. 49(5), pp. 1558–1565 (2000).
3. F. Comellas and J. Ozón. Graph coloring algorithms for assignment problems in radio networks. Proc. Applications of Neural Networks to Telecommunications 2, pp. 49–56. J. Alspector, R. Goodman and T.X. Brown (Eds.), Lawrence Erlbaum Ass. Inc. Publis., Hillsdale, NJ (1995).
4. The Second DIMACS Implementation Challenge: 1992–1993: NP Hard Problems: Maximum Clique, Graph Coloring, and Satisfiability. Organized by M. Trick (accessed January 8, 2006). http://dimacs.rutgers.edu/Challenges/index.html
5. M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness, New York: W.H. Freeman, 1979, ISBN 0-7167-1044-7.
6. O. Goldschmidt and D.S. Hochbaum. Polynomial algorithm for the k-cut problem. Proc. 29th Ann. IEEE Symp. on Foundations of Comput. Sci., IEEE Computer Society, pp. 444–451 (1988).
7. B. Hendrickson and R. Leland. A multilevel algorithm for partitioning graphs. In S. Karin, editor, Proc. Supercomputing '95, San Diego. ACM Press, New York, 1995.
8. P. Korošec, J. Šilc, B. Robič. Solving the mesh-partitioning problem with an ant-colony algorithm. Parallel Computing 30(5-6), pp. 785–801 (2004).
9. H. Saran and V. Vazirani. Finding k-cuts within twice the optimal. Proc. 32nd Ann. IEEE Symp. on Foundations of Comput. Sci., IEEE Computer Society, pp. 743–751 (1991).
10. A. J. Soper, C. Walshaw, and M. Cross. A combined evolutionary search and multilevel optimisation approach to graph partitioning. J. Global Optimization 29(2), pp. 225–241 (2004).
11. M. Trick. Graph coloring instances (web page), accessed January 8, 2006. http://mat.gsia.cmu.edu/COLOR/instances.html
12. C. Walshaw. Multilevel refinement for combinatorial optimisation problems. Annals Oper. Res. 131, pp. 325–372 (2004).
13. C. Walshaw. The graph partitioning archive (web page), accessed January 8, 2006. http://staffweb.cms.gre.ac.uk/~c.walshaw/partition/
14. C. Walshaw and M. Cross. Mesh Partitioning: a Multilevel Balancing and Refinement Algorithm. SIAM J. Sci. Comput. 22(1), pp. 63–80 (2000).
Tracing Denial of Service Origin: Ant Colony
Approach
Chia-Mei Chen, Bing Chiang Jeng, Chia Ru Yang, and Gu Hsin Lai
1 Introduction
can provide an effective stopgap measure, but does not eliminate the problem nor
does it discourage the attackers.
The proactive approach is to find the source of the DoS attack and to cooperate with the Internet service provider (ISP) or the network administrators to stop the traffic at its origin. Hence, it can restore normal network functionality, prevent reoccurrences and, ultimately, hold the attackers accountable. However, many network-based DoS attacks exploit flaws in TCP/IP to manipulate and falsify the source address in the IP header. Conventional trace methods might not be able to identify the origin, as the source address could be spoofed.
The goal of this work is to propose an IP traceback approach to finding out the
origin of the DoS attack using the existing traffic flow information, without extra
support from the routers. Furthermore, some previous work needs to process a large number of packets, which may be too costly for detecting DoS attacks. An ant colony
based traceback algorithm is proposed, using the traffic flow information as the trace
for ants to discover the attack path.
2 Related Work
Savage et al. [2] described and implemented probabilistic packet marking (PPM). When a packet passes through a router, the router decides, according to a predefined probability, whether to mark the packet; the IP fragment identification field is used to store the IP traceback information.
Song and Perrig [3] proposed modifications of Savage's method to further reduce storage requirements by storing a hash of each IP address, instead of the address
itself. It assumes that the victim possesses a complete network map of all upstream
routers. After edge-fragment reassembly, the method compares the resulting IP
address hashes with the router IP address hashes derived from the network map to
facilitate attack path reconstruction.
One disadvantage of the packet marking approach is that all routers on the attack path are required to support packet marking. In addition, the IP header encoding may have practical restrictions. It negatively impacts users that send fragmented IP datagrams, and such encoding might have compatibility issues with the current TCP/IP framework.
Snoeren et al. [4] proposed a hash-based IP traceback scheme, the Source Path Isolation Engine (SPIE). As packets traverse the network, digests of the packets are stored in the routers. Hash-based IP traceback is predicated on the deployment of SPIE-enhanced routers in place of existing routers in the network infrastructure. Since such a deployment path is impractical, a SPIE system must be incrementally deployable in the existing network infrastructure without a retrofit or forklift upgrade, and subsequent research therefore focuses on how to implement SPIE on the current network infrastructure. Strayer et al. [5] propose the concept of a SPIE Tap Box, a small, special-purpose device that implements the full functionality of SPIE but without the benefit of access to the router's forwarding engine and internal data structures. Rather, the Tap Box must rely only on the information it can glean by passively tapping the lines into and out of the router.
Ant algorithms [7] are inspired by the behavior of natural ants and applied to many
different discrete optimization problems, such as vehicle routing and resource
scheduling. In an Ant algorithm, multiple agents, represented by ants, cooperate with
each other using indirect communication mediated by pheromone. The Ant colony
algorithm was first introduced to solve the Traveling Salesman Problem (TSP) [6].
A moving ant lays some pheromone (in varying quantities) on the ground, thus
marking the path it follows by a trail of this substance. While an isolated ant moves
essentially at random, an ant encountering a previously laid trail can detect it and decide with high probability to follow it, thereby reinforcing the trail with its own pheromone.
A major characteristic of ant algorithms is the viability of autocatalytic processes. A "single-ant" autocatalytic process usually converges very quickly to a bad suboptimal solution. Luckily, the interaction of many autocatalytic processes can lead to rapid convergence to a subspace of the solution space that contains many good solutions, allowing the search to find a very good solution quickly without getting stuck in a suboptimal one. In other words, all the ants converge not to a single solution, but to a subspace of solutions; thereafter they go on searching for improvements of the best found solution. Therefore, we believe this feature will be helpful for finding a DoS path.
pheromone trail intensity, until most of the ants converge to the same path. In the following section, we will describe the details of the proposed IP traceback scheme.
When the IDS on the victim's network detects a DoS attack with spoofed source(s), it can further analyze the packets of the DoS attack and derive the list of suspected spoofed source IP addresses. The proposed solution takes the victim host as the starting point and performs the IP traceback. The details of the ant colony based DoS path traceback are described as follows.
At the initial stage, each network node uses the total number of octets sent during the observation interval as f_i, together with an initial pheromone value \tau_i(t). This flow information determines the probability with which an ant chooses a path:

p_i(t) = \frac{[\tau_i(t)]\,[f_i]}{\sum_{j \in neighbors} [\tau_j(t)]\,[f_j]}

where f_i is the total number of octets sent by router i during the interval, and \tau_i(t) is the intensity of the pheromone trail on router i at time t.
Figure 1(a) illustrates the case where the ants arrive at Router4; the probability of their next move is determined from the flow information of the neighbor routers. We assume that the total octets sent from Router5 are 2000, from Router6 5000, and from Router7 3000. Therefore, the probability of choosing Router5 is 20%, Router6 50%, and Router7 30%. Figure 1(b) shows the probability of the next move to each neighbor router. More ants will choose the path with more flow, as a DoS attack generates a large amount of flow.
Fig. 1. (a) the flow of Router 4; (b) the probability of selecting the next step
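A minimal Python sketch of this path-selection step, assuming uniform initial pheromone values; the router names and flow figures are the ones from the example above, and the function name is illustrative only.

import random

def choose_next_router(pheromone, flow, neighbors):
    # Pick the next upstream router with probability proportional to
    # pheromone intensity times observed flow (total octets sent).
    weights = [pheromone[r] * flow[r] for r in neighbors]
    return random.choices(neighbors, weights=weights, k=1)[0]

# Example from the text: ants at Router4 choose among Router5/6/7.
flow = {"Router5": 2000, "Router6": 5000, "Router7": 3000}
pheromone = {r: 1.0 for r in flow}  # uniform initial pheromone
print(choose_next_router(pheromone, flow, list(flow)))  # ~20% / 50% / 30%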
While exploring the network, each ant keeps track of the path and the number of DoS flows. The above procedure is repeated, tracing back to the upstream routers, until the ant reaches a boundary router of the monitored network. The intensity of the pheromone trail is revised after all the ants complete their route from the victim to a boundary router. The path information obtained by each ant is used to calculate \Delta\tau_i(t, t+1):
\Delta\tau_i^k = \frac{Q_k}{L_k},

where Q_k is the total amount of octets belonging to the DoS attack on the k-th ant's path and L_k is the length of the k-th ant's path. \Delta\tau_i(t, t+1) is the sum of the pheromone laid by all the ants, expressed below:

\Delta\tau_i(t, t+1) = \sum_{k=1}^{m} \Delta\tau_i^k(t, t+1),

where \Delta\tau_i^k(t, t+1) is the pheromone laid on router i by the k-th ant between time t and t+1, so the more ants pass through an edge, the more pheromone will be laid on that edge. The change of pheromone results in positive feedback: the more ants follow a path, the more attractive that path becomes to follow.
The intensity of pheromone on router i can be revised once \Delta\tau_i(t, t+1) is obtained, and is formulated as below:

\tau_i(t+1) = \tau_i(t) + \Delta\tau_i(t, t+1).
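A sketch of this update in Python, under the stated assumptions (per-ant deposit Q_k / L_k and the purely additive update shown above; variable names are illustrative):

def update_pheromone(tau, ant_paths, dos_octets):
    # Additive pheromone update after all ants complete their routes.
    # ant_paths  -- list of router sequences, one per ant
    # dos_octets -- DoS octets observed on each ant's path (Q_k)
    delta = {router: 0.0 for router in tau}
    for path, q_k in zip(ant_paths, dos_octets):
        deposit = q_k / len(path)      # Q_k / L_k
        for router in path:
            delta[router] += deposit   # summed over all m ants
    for router in tau:
        tau[router] += delta[router]   # tau_i(t+1) = tau_i(t) + delta_i
    return tau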
3.1 NetFlow
NetFlow is a traffic profile monitoring technology [8] that can provide vital information for DoS traceback. If a packet belongs to an existing flow, the traffic statistics of the corresponding flow are updated; otherwise a new flow entry is created.
4 Performance Evaluation
We verify the proposed solution by implementing the proposed system and evaluating its performance by simulation. A simulated network with NetFlow-enabled routers is deployed, as the proposed DoS traceback solution uses flow-level information to perform the traceback.
The flow management component collects the flow-based attributes. The open-source tools Scientific Linux [9], flow-tools [10], and STREAM [11] are adopted in this research to achieve the above NetFlow management purpose.
The proposed IP traceback scheme is based on the ant algorithm and uses NetFlow logs to simulate the IP traceback process. Artificial ants explore the network and collect information about the denial-of-service attack in order to forecast the possible attack path and trace back to the origin of the DoS attack.
¹ A libpcap-based tool collects network traffic data and emits it as NetFlow flows to the specified collector.
environment. Hping [15] is selected to simulate a SYN flood attack with IP spoofing. Hping, a complex ping-based program, can send customized pings to remote hosts and networks. The simulated attack scenario is illustrated in Figure 5.
Once the DoS flows are identified, the flow management component can find the octets sent by the DoS flows whose source addresses are in the suspected source address list. This finding is then fed to the traceback component to find the DoS attack path.
The results of the traceback are shown in the following figures as three-dimensional graphs, where the x-axis represents the paths discovered by the ants, the y-axis represents the number of iterations, and the z-axis represents the number of ants that found the x-th path in the y-th cycle. The attack path found by the proposed ant colony based traceback method is the one with the most ants.
Figure 6 shows the results of the traceback with full flow information provided by the network. The proposed traceback method explores all the possible attack paths in the initial stage of the traceback, and the ants tend to converge to the attack path in the following iterations. After about half of the simulation, most ants have converged on the DoS attack path.
According to the results of the preliminary experiment, we verify that the proposed solution can find the DoS attack path when all the routers in the network provide flow information. However, in real environments, some flow information might be
lost, especially at the routers on the DoS attack path. Other experimental results are omitted due to the length of the paper, but they all confirm that the proposed solution can find the DoS path efficiently and correctly.
Fig. 6. Traceback results with full flow information (x-axis: DoS path found; y-axis: number of iterations; z-axis: number of ants)
5 Conclusion
DoS attacks have become one of the major threats in the Internet and cause massive revenue losses for many companies. However, DoS attacks are often launched with spoofed source addresses, making it hard to identify the attacker. A proactive approach to DoS attacks is to find the machine which originally issued the attack and stop the malicious traffic.
In this research, a traceback method based on the ant colony approach is proposed to identify the DoS attack origin. Unlike previous traceback schemes, such as packet marking and logging, which use packet-level information, the proposed traceback approach uses flow-level information. Although packet-level information provides detailed information about the network, the high processing cost is a challenge for deploying those IP traceback methods in real networks.
Ant colony algorithms have been successfully applied to various routing and
optimization problems. Based on our observation, the proposed traceback problem is
a variation of a routing problem and hence an ant colony based algorithm could be
used to find the DoS attack path.
The proposed method is verified and evaluated through simulation. The simulation
results show that the proposed method can successfully and efficiently find the DoS
attack path in various simulated network environments. Hence, we conclude that the
proposed solution is an efficient method to find the DoS attack origin in the networks.
The proposed DoS traceback method can identify the DoS attack path even when the source addresses are spoofed. However, there are other attacks with spoofed source addresses which still need to be identified. Ant algorithms or other artificial intelligence approaches could be further investigated for more generalized IP traceback problems. A distributed flow management might be more scalable for large networks. Further study on the practical implementation and deployment in a large network can be done to evaluate the scalability of the proposed solution.
References
1. Computer Security Institute, CSI/FBI Computer Crime and Security Survey, 2003, http://www.crime-research.org/news/11.06.2004/423/.
2. S. Savage, D. Wetherall, A. Karlin, and T. Anderson, Network Support for IP Traceback, IEEE/ACM Trans. Networking, vol. 9, no. 3, 2001, pp. 226-237.
3. D. Song and A. Perrig, Advanced and Authenticated Marking Schemes for IP Traceback, Proc. IEEE INFOCOM, IEEE CS Press, 2001, pp. 878-886.
4. A.C. Snoeren, C. Partridge, L.A. Sanchez, C.E. Jones, F. Tachakountio, B. Schwartz, S.T. Kent, and W.T. Strayer, Single-Packet IP Traceback, IEEE/ACM Trans. Networking, vol. 10, no. 6, 2002, pp. 721-734.
5. W.T. Strayer, C.E. Jones, F. Tachakountio, B. Schwartz, R.C. Clements, M. Condell, and C. Partridge, Traceback of Single IP Packets Using SPIE, Proc. DARPA Information Survivability Conference and Exposition, vol. 2, April 22-24, 2003, Washington, DC, pp. 266.
6. G. Upton, Swarm Intelligence, http://www.cs.earlham.edu/~uptongl/project/Swarm_Intelligence.html.
7. M. Dorigo, V. Maniezzo, and A. Colorni, The Ant System: An Autocatalytic Optimizing Process, Technical Report No. 91-016 Revised, Politecnico di Milano, Italy, 1991.
8. Y. Gong, Detecting Worms and Abnormal Activities with NetFlow, http://www.securityfocus.com/infocus/1796
9. Scientific Linux, https://www.scientificlinux.org/
10. flow-tools information, http://www.splintered.net/sw/flow-tools/
11. Stanford Stream Data Manager, http://www-db.stanford.edu/stream/
12. VMware, http://www.vmware.com/
13. zebra, http://www.zebra.org/
14. fprobe, http://fprobe.sourceforge.net/
15. hping, http://www.hping.org/
Optimisation of Constant Matrix Multiplication
Operation Hardware Using a Genetic Algorithm
Centre for Digital Video Processing, Dublin City University, Dublin 9, Ireland
kinanea@eeng.dcu.ie
1 Introduction
2 Problem Statement
y_i = \sum_{j=0}^{N-1} a_{ij} x_j, \qquad i = 0, \ldots, N-1. \qquad (1)

a_{ij} = \sum_{k=0}^{M-1} b_{ijk} 2^k, \qquad b_{ijk} \in \{-1, 0, 1\}. \qquad (2)

y_i = \sum_{j=0}^{N-1} \sum_{k=0}^{M-1} b_{ijk} 2^k x_j, \qquad i = 0, \ldots, N-1. \qquad (3)
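For illustration, a small Python sketch that evaluates Eqn. (3) directly from a signed-digit tensor b; this is only a reference model of the arithmetic, not the shared-sub-expression adder network the paper optimises.

def cmm_from_signed_digits(b, x):
    # y_i = sum_j sum_k b[i][j][k] * 2**k * x[j], with b[i][j][k] in {-1, 0, 1}.
    y = []
    for row in b:                         # one row of signed-digit constants per output y_i
        acc = 0
        for j, digits in enumerate(row):  # digits of constant a_ij, LSB first
            for k, d in enumerate(digits):
                acc += d * (x[j] << k)    # shift-and-add / subtract
        y.append(acc)
    return y

# a_00 = 3 (digits 1,1) and a_01 = 2 (digits 0,1): y_0 = 3*x0 + 2*x1
print(cmm_from_signed_digits([[[1, 1], [0, 1]]], [4, 5]))  # -> [22]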
[Figure: a 2D slice of b_{ijk} for one constant a_{ij} (e.g. 0.1328125 multiplying x_3), with one signed-digit row per power of two.]
[Figures: (a) the CMM architecture, in which direct and shifted addends of the inputs x_0 ... x_{N-1} feed adder networks and a PPST to produce each output y_i; (b) the decomposition of the matrix A into N rows and N/r columns of chunks, where each column of chunks forms a CMM sub-problem.]
The DPL algorithm iteratively builds a SOP, and the final SOP terms are the unique sub-expression selection options after considering all SD permutations of the dot product constants in question. The final SOP terms are listed in increasing order of the number of adders required by the underlying sub-expressions. Each SOP term is represented internally as a data structure with elements p_vec (a bit vector where each set bit represents a specific adder to be resource allocated) and hw (the Hamming weight of p_vec, which records the total adder requirement). The number of possible two-input additions is equivalent to the combinatorial problem of leaf-labelled complete rooted binary trees [8]. With r = 4, the number of possibilities is 180 (proof omitted to save space), and the general series in r increases quickly for r > 4. We are currently researching an automated method for configuring the DPL algorithm for any r. Currently, however, each p_vec is a 180-bit vector with a hw equal to the number of required adders.
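A minimal sketch of this SOP-term record, with a Python integer standing in for the 180-bit vector; the class and field names simply mirror the text and are not the authors' implementation.

class SopTerm:
    # p_vec: bit vector whose set bits name the adders the sub-expressions need
    # hw:    Hamming weight of p_vec, i.e. the total adder requirement
    def __init__(self, adder_bits):
        self.p_vec = 0
        for bit in adder_bits:              # e.g. p7 means bit 7 is set
            self.p_vec |= 1 << bit
        self.hw = bin(self.p_vec).count("1")

term = SopTerm([3, 41, 97])
print(term.hw)  # -> 3 adders required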
The DPL algorithm executes for each SD permutation of the dot product constants in question, and builds a permutation SOP at each iteration. This process is described in detail in [9]. The permutation SOP for Eqn. 4 is given by Eqn. 5, where p_v means that bit v is set in the 180-bit p_vec for that SOP term.
[Figure: the DPL skip list of SOP terms ordered by hw, each node carrying next_skip* and top_p* pointers and terminated by NULL.]
The CMML algorithm searches for the optimal overlapping nodes from each of the DPL lists.
As is clear from Fig. 6, as the temperature T decreases, the value of the exponential term X moves further from the central vertical axis for fixed f(j) and f(k). As T decreases, W → 1 when f(j) < f(k) and W → 0 when f(k) < f(j). The original Boltzmann tournament selection algorithm proposed by Goldberg uses t = 3, and lets W equal the probability that j wins the tournament and
[Fig. 6: the probability W that candidate j wins, for f(j) < f(k) and f(k) < f(j), as the temperature T decreases; a band of width 2S around W = 0.5 marks weak victories, outside it strong victories.]
(1 − W) be the probability that k wins the tournament [10]. We propose a variation on Goldberg's algorithm by introducing a fuzzy select threshold S to enhance the population diversity. Using S, the selection algorithm can be programmed to have a higher probability of selecting a weak candidate as the tournament victor when the temperature T is high in the early generations. As the temperature decreases and the algorithm converges on the optimum, the stronger candidate has a greater chance of victory. The approach is summarised in Algorithm 2.
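Algorithm 2 is not reproduced in this excerpt; the following Python sketch therefore only illustrates the idea under assumptions: the logistic Boltzmann form for W and the use of S as a band around 0.5 inside which the weaker candidate may still win are illustrative choices, not the paper's exact formulation.

import math
import random

def fuzzy_boltzmann_tournament(f_j, f_k, temperature, s_threshold):
    # W is the probability that j wins; lower fitness is fitter, so as T -> 0
    # W -> 1 when f(j) < f(k) and W -> 0 when f(k) < f(j), as in the text.
    w = 1.0 / (1.0 + math.exp((f_j - f_k) / temperature))
    if abs(w - 0.5) < s_threshold:
        return random.choice(("j", "k"))     # fuzzy band: the weak candidate can win
    return "j" if random.random() < w else "k"

print(fuzzy_boltzmann_tournament(f_j=10.0, f_k=12.0, temperature=5.0, s_threshold=0.1))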
Step 3 - Recombination. After pop_size individuals have been selected, a proportion of these are further selected for uniform crossover based on a probability p_c. Since each candidate is represented by N pointers, the uniform crossover process generates a random N-bit binary mask. Each bit location in the mask determines the mixture of genetic material from the parents with which each offspring is created. Consider Fig. 7. If a bit location in the mask is 0, the corresponding pointer component for offspring 0 is taken from parent 0; if it is 1, it is taken from parent 1 (and offspring 1 receives the complementary choice).
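A sketch of this uniform crossover over the N pointer components (generic Python; the candidate encoding details are assumptions):

import random

def uniform_crossover(parent0, parent1):
    # A random N-bit mask decides, per pointer component, which parent
    # supplies offspring 0; offspring 1 receives the complementary choice.
    n = len(parent0)
    mask = [random.randint(0, 1) for _ in range(n)]
    child0 = [parent0[i] if mask[i] == 0 else parent1[i] for i in range(n)]
    child1 = [parent1[i] if mask[i] == 0 else parent0[i] for i in range(n)]
    return child0, child1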
Step 4 - Mutation. After selection and crossover, the DPL component pointers of each candidate undergo mutation based on a probability p_mut. If mutation is applied, the degree of mutation is determined by a value M, where M ∈ Z. A pointer selected for mutation moves M pointer locations up (M < 0) or down (M > 0) its associated DPL skip list. The range of possible mutations depends on the value of a parameter M_max. The value for M is determined by the binomial probability density function p(M) of Eqn. 9. This distribution means that, if mutation is applied, smaller mutations are more likely than large mutations.
p(M) = \frac{(2M_{max}-1)!}{M!\,((2M_{max}-1)-M)!}\; 0.5^{M}\, (0.5)^{(2M_{max}-1)-M} \qquad (9)
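A Python sketch that samples a mutation magnitude from this binomial distribution; how the sampled value maps onto signed up/down moves is not spelled out in this excerpt, so the centring around zero below is only an illustrative assumption.

import math
import random

def sample_mutation_move(m_max):
    # p(M) = C(2*m_max - 1, M) * 0.5**(2*m_max - 1), for M = 0 .. 2*m_max - 1,
    # so small deviations from the centre are the most likely outcomes.
    n = 2 * m_max - 1
    weights = [math.comb(n, m) for m in range(n + 1)]
    m = random.choices(range(n + 1), weights=weights, k=1)[0]
    return m - n // 2   # centre roughly around zero (illustrative mapping)

print(sample_mutation_move(m_max=4))   # a small signed move along the skip list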
space fitness values is quite low, according to the current fitness function, relative to the size of the solution space. Hence the current search is almost a needle-in-a-haystack search, so a healthy diversity is needed. Future work on this algorithm aims to increase the dimensionality of the fitness function to include other factors like logic depth and fanout as well as adder count. Extending the fitness function should increase the granularity of the fitness values in the solution space. Hence the tuned genetic algorithm parameters are likely to change in future so that the selective pressure will increase.
5 Experimental Results
For a fair comparison with other approaches, the number of 1-bit full adders (FAs) allocated in each optimised architecture should be used as opposed to adder units, since the bitwidth of each unit is unspecified in other publications apart from [5]. The FA count more accurately represents circuit area requirements. Using the 8-point 1D DCT (N = 8 with various M) as a benchmarking CMM problem, Table 2 compares results with other approaches based on adder units and FAs where possible. Our approach compares favourably with [5] in terms of FAs (see FA% savings in Table 2), even though this gain is not reflected by the number of adder units required.
Our previous results were based on running the proposed CMML GA with untuned parameters for 100000 generations [9]. Using the tuned parameters of Table 1, our results clearly improve, as is evident from Table 2. The tuned parameters also find these improved solutions after fewer generations (1000). For each of the benchmarks in Table 2, the tuned parameters cause the proposed algorithm to invoke its search space reduction mechanism (Section 4.2). This reduces the search space from the order of 10^20 to 10^17 without compromising the quality of the results, representing a reduction of more than 99%. The hypothesis of achieving extra savings by permuting the SD representations is validated by the fact that the best SD permutations corresponding to our results in Table 2 are not the CSD permutations.
Even given the savings illustrated in Table 2, there exists significant potential for improvement:
6 Conclusions
The general multiplierless CMM design problem has a huge search space, especially if different SD representations of the matrix constants are considered. The proposed algorithm addresses this by organising the search space effectively, and by using a GA to quickly search for near-optimal solutions. Experimental results validate the approach, and show an improvement on the current state of the art.
References
1. Potkonjak, M., Srivastava, M.B., Chandrakasan, A.P.: Multiple Constant Multiplications: Efficient and Versatile Framework and Algorithms for Exploring Common Subexpression Elimination. IEEE Transactions on Computer-Aided Design of Integrated Circuits 15 (1996) 151-165
2. Dempster, A.G., Macleod, M.D.: Digital Filter Design Using Subexpression Elimination and all Signed-Digit Representations. In: Proc. IEEE International Symposium on Circuits and Systems. Volume 3. (2004) 169-172
3. Dempster, A.G., Macleod, M.D.: Using all Signed-Digit Representations to Design Single Integer Multipliers using Subexpression Elimination. In: Proc. IEEE International Symposium on Circuits and Systems. Volume 3. (2004) 165-168
4. Macleod, M.D., Dempster, A.G.: Common subexpression elimination algorithm for low-cost multiplierless implementation of matrix multipliers. IEE Electronics Letters 40 (2004) 651-652
5. Boullis, N., Tisserand, A.: Some Optimizations of Hardware Multiplication by Constant Matrices. IEEE Transactions on Computers 54 (2005) 1271-1282
6. Macleod, M.D., Dempster, A.G.: Multiplierless FIR Filter Design Algorithms. IEEE Signal Processing Letters 12 (2005) 186-189
7. Martinez-Peiro, M., Boemo, E.I., Wanhammar, L.: Design of High-Speed Multiplierless Filters Using a Nonrecursive Signed Common Subexpression Algorithm. IEEE Transactions on Circuits and Systems II 49 (2002) 196-203
8. Andres, S.D.: On the number of bracket structures of n-operand operations constructed by binary operations (2005) private communication.
9. Kinane, A., Muresan, V., O'Connor, N.: Towards an Optimised VLSI Design Algorithm for the Constant Matrix Multiplication Problem. In: Proc. IEEE International Symposium on Circuits and Systems, Kos, Greece (2006)
10. Goldberg, D.: A note on Boltzmann tournament selection for genetic algorithms and population-oriented simulated annealing. Complex Systems 4 (1990) 445-460
Finding Compact BDDs Using Genetic
Programming
1 Introduction
Decision Diagrams (DDs) are used for the representation of Boolean functions in many applications of VLSI CAD, e.g. in the area of logic synthesis [1] and verification [2]. In the meantime, DD-based approaches have also been integrated into commercial tools.
The state-of-the-art data structure is the Ordered Binary Decision Diagram (OBDD) [2]. Since OBDDs are often not able to represent Boolean functions efficiently due to the ordering restriction [3],[4],[5], many researchers have extended the OBDD concept, mainly in two directions:
1. Consider different decompositions, e.g. ordered functional decision diagrams (OFDDs) [6] and ordered Kronecker functional decision diagrams (OKFDDs) [7] make use of AND/EXOR based decompositions.
2. Loosen the ordering restriction, e.g. general BDDs (GBDDs) [8] allow vari-
ables to occur several times.
Following the second approach, the most powerful technique is of course to have no restriction on the ordering at all, i.e. to use BDDs without any restrictions on ordering or variable repetition. BDDs are often exponentially more succinct than OBDDs, and for the applications mentioned above the ordering restrictions are often not needed. The main reason why OBDDs have been used more frequently is that efficient minimization procedures exist, e.g. sifting [9]. For BDDs similar techniques are not available.
Evolutionary approaches have also been applied successfully to OBDDs, but there the problem reduces to finding a good variable ordering, i.e. a permutation of the input variables [10]. In [11] Genetic Programming has been applied to a tree-like form of BDDs with some additional constraints.
2 Preliminaries
2.1 Binary Decision Diagrams
A BDD is a directed acyclic and rooted graph G_f = (V, E) representing a Boolean function f : B^n → B^m. Each internal node v is labeled with a variable label(v) = x_i ∈ X_n = {x_1, ..., x_n}, and has two successors then(v) and else(v). The terminal nodes are labeled with 1 or 0, corresponding to the constant Boolean functions. In each internal node the Shannon decomposition g = x_i \cdot g|_{x_i=1} + \overline{x_i} \cdot g|_{x_i=0} is carried out. The m root nodes represent the respective output functions.
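As an illustration of this node structure and of evaluating a function by following the Shannon decomposition, a small Python sketch (the class layout is illustrative and is not the CUDD data structure used later):

class Node:
    # A BDD node: either a terminal (value 0/1) or an internal node with a
    # variable index and 'then'/'else' successors.
    def __init__(self, var=None, then=None, els=None, value=None):
        self.var, self.then, self.els, self.value = var, then, els, value

def evaluate(node, assignment):
    # Follow then(v) if the node's variable is 1 in the assignment, else(v) otherwise.
    while node.value is None:
        node = node.then if assignment[node.var] else node.els
    return node.value

# f = x0 XOR x1 as a tiny ordered example
one, zero = Node(value=1), Node(value=0)
low = Node(var=1, then=one, els=zero)    # reached when x0 = 0
high = Node(var=1, then=zero, els=one)   # reached when x0 = 1
root = Node(var=0, then=high, els=low)
print(evaluate(root, {0: 1, 1: 0}))      # -> 1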
By restricting the structure of the BDD, special classes of BDDs can be
derived:
A BDD is complete, if on each path from a root to a terminal node each
variable is encountered exactly once.
A BDD is free (FBDD), if each variable is encountered at most once on each
path from a root to a terminal node.
A BDD is ordered (OBDD), if it is free and the variables appear in the same
order on each path from a root to a terminal node.
OBDDs are a widely used data structure in hardware design and verification because they are a canonical representation of Boolean functions and they provide efficient synthesis algorithms. However, for many functions the size of the OBDD depends on the variable ordering. It may vary between linear and exponential in the number of variables [2]. A lot of research has focused on the so-called variable ordering problem, which is NP-hard [12]. Furthermore, there are functions for which all variable orderings lead to an OBDD of exponential size. In turn, for some of these functions there exist FBDDs or BDDs of polynomial size [13]. This means that relaxing the read-once restriction and the ordering of variables can be advantageous. But in contrast to the minimization of OBDDs by finding a good or optimal variable ordering, which is well understood [14],[9], there are no heuristics for the construction of small BDDs up to today.
[Figure: an OBDD over x0, x1, x2 and its realisation as a cascade of multiplexers (MUX).]
3 Evolutionary Algorithm
The approach presented in this paper is based on Genetic Programming (GP)
[17]. Originally, GP was used to evolve LISP programs. The method at hand does
not consider programs, but works directly on the graph structure of the BDDs.
Several operators are provided to customize the algorithm for a specic problem.
The algorithm has been implemented on top of the evolving objects library [18],
an open source C++ library for evolutionary algorithms. The target function is
kept in memory as OBDD. For this the BDD package CUDD [19] is used. The
aim of the evolutionary algorithm is formulated as follows:
The objective is to evolve BDDs that are a correct and compact repre-
sentation of a given target function.
3.2 Representation
The individuals are directed acyclic graphs with multiple root nodes, each cor-
responding to an output of the represented function. By adopting some popular
¹ In the figures, solid lines denote then edges and dashed lines denote else edges.
[Figure: flow of the evolutionary algorithm — initialization, evaluation, then a loop of selection, operators, evaluation and replacement until the termination condition is met.]
techniques used in BDD packages, the graphs are always reduced, i.e. isomorphic
subgraphs exist only once and there are no nodes with then(v) and else(v) being
identical.
Table 1 gives an overview of the genetic operators. Beyond the standard classes
initialization, recombination and mutation there are some special operators,
functionally conserving ones, which do not change the functional semantics of
the individuals but the structure of the graphs.
Table 1. Overview of the genetic operators
  initialization: Init, CuddInit
  recombination: NodeXover, OutputXover
  mutation: VariableMut, EdgeSwapMut, EraseNodeMut, AddMinterm
  functionally conserving: Restructuring, VariableDuplication, CombinedRestructuring, TautoReduction, SplitReduction, Resize
Initialization. There are two types of initialization, Init and CuddInit. The first one generates random graphs with a given maximum depth. The second one creates OBDDs with a randomized variable ordering which represent the target function correctly. The name is derived from the underlying BDD package which is used for the synthesis of the OBDDs.
Mutation. Among the mutation operators there are three simple ones and the customized AddMinterm operator. VariableMut exchanges the variable of one randomly selected node. EdgeSwapMut selects one node and swaps its then and else edges. EraseNodeMut removes one node from the graph by replacing it with one of its successors. This may be useful to eliminate redundant nodes. AddMinterm works as follows: first an assignment a ∈ B^n is generated. If the
individual and the target function evaluate to different values under this assignment, an OBDD-like subgraph is added to the individual so that it will evaluate to the correct value afterwards. In order to create the subgraph, the target function is restricted in all variables that have been read on the path in the individual that corresponds to the assignment a. Then the new subgraph is appended to the end of this path. The operator can be used to speed up the algorithm if the target function is relatively complex, so that it would otherwise take too long to find a correct solution at all. It is a drawback that always OBDD-like subgraphs are created. This may lead to local optima that are hard to escape from.
Example 1. Figure 3 shows an example for the AddMinterm operator. Let the target function be the 3-bit parity function given by f(x_0, x_1, x_2) = x_0 ⊕ x_1 ⊕ x_2. Consider the individual in Figure 3(a) and the randomly chosen assignment a = x_0 x_1 x_2. The corresponding path is highlighted in the figure. As on this path only x_0 is evaluated, the remaining function to be added is f_rest = f|_{x_0=1} = x_1 x_2 + \overline{x_1}\,\overline{x_2}. The OBDD for this function (see Figure 3(b)) is added to the end of the path, obtaining the new individual in Figure 3(c). Note that the correlation has increased from 5/8 to 7/8, i.e. only one of eight minterms is wrong after the application of AddMinterm.
Functionally Conserving Operators. Among the functionally conserving op-
erators there are two operators that perform a local restructuring of the graphs.
Restructuring searches for subgraphs that are isomorphic to one of the graphs
shown in Figure 4 on the right and on the left. Then this subgraph is reduced to
the one shown in Figure 4 in the middle. Note that the three graphs are func-
tionally equivalent. The operator VariableDuplication duplicates a variable
on a randomly selected path. The transformation is shown in Figure 5. In this
way redundancy is added to the graph. In some cases this can lead to better so-
lutions if the redundancy is removed elsewhere in the graph (this can be done by
TautoReduction or SplitReduction which are described below). Furthermore
the operator increases the diversity of the population. CombinedRestructuring
is a combination of the two operators described above. Both are applied several
times in random order.
[Figs. 4 and 5: the three functionally equivalent subgraph shapes used by the Restructuring operator, and the VariableDuplication transformation that duplicates a variable along a path.]
Finally, there are two operators that try to reduce the number of nodes without changing the function of an individual. Algorithmically they are very similar. Figure 6 shows the TautoReduction operator in pseudocode. It searches for redundant nodes from top to bottom. A node v is identified as redundant if the variable index(v) has already been evaluated to the same value on all paths reaching v. A redundant node can then be replaced by the appropriate successor.
[Figures: example applications of the TautoReduction and SplitReduction operators.]
It has been observed that the two operators work together quite well. Thus, there is a combination of these two operators, called Resize. Resize can be called at the end
of every generation. It tries to reduce all individuals of the population using
TautoReduction and SplitReduction, so that the number of nodes does not
exceed a given bound. This can be used to keep the individuals relatively small
in order to save run time.
4 Experimental Results
Experiments have been carried out to show the effectiveness of the approach. The runs presented in the following are supposed to exemplify the different working methods of the evolutionary algorithm. The first one starts with a population of randomly initialized graphs, the second one starts with correct OBDDs.
Example 4. Consider the hidden weighted bit (HWB) function given by the fol-
lowing equation:
HWB(x_0, \ldots, x_{n-1}) = \begin{cases} 0 & \text{if } w = 0 \\ x_{w-1} & \text{else} \end{cases} \qquad \text{where } w = |x_0, \ldots, x_{n-1}| \text{ is the number of ones in the input.}
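A direct Python rendering of this definition, useful as a reference when checking evolved BDDs:

def hwb(bits):
    # Hidden weighted bit: return x_{w-1}, where w is the number of ones in
    # the input (bits is indexed x_0 .. x_{n-1}); return 0 if w == 0.
    w = sum(bits)
    return 0 if w == 0 else bits[w - 1]

print(hwb([1, 0, 1, 1]))  # w = 3 -> x_2 = 1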
Fig. 9. OBDD and BDD for the 4 bit hidden weighted bit function
Note that for each output function there is a different order leading to the optimal OBDD size, thus there is no global order for which all partial OBDDs are optimal. The size of the optimal shared OBDD is 31. The evolutionary approach is applied with the following settings: the initial population consists of 100 correct OBDDs. OutputXover and CombinedRestructuring are applied each with a rate of 0.3, TautoReduction and SplitReduction are applied with a respective rate of 0.1. Selection and replacement are the same as in Example 4. After 41 generations, an individual emerged that represents the target function with 17 nodes.
Table 2 shows additional results. Besides the name of the circuit and the number of inputs and outputs, the size of the minimal OBDD is given. The last column shows the size of the smallest BDD that could be found by our approach. The algorithm has been run 50 times with a limit of 200 generations and a population size of 100. Only for the largest benchmarks with 7 inputs have a population size of 200 and a generation limit of 300 been used. As can be seen, in many cases smaller representations could be found. Especially the HWB and ISA functions, for which there is an exponential gap between their OBDD and BDD sizes, show good results. But also for randomly generated functions the graph size could be improved. For other benchmarks no improvements could be made, but it should be noted that for numerous common functions there are OBDDs of linear size and thus no improvements can be expected by using BDDs instead.
In this paper it has been shown that it is possible to construct compact BDDs using genetic programming. First experiments have yielded some promising results. However, there are still some problems to be solved. For large functions that depend on many variables, it takes too long to evolve a correct solution from a randomly initialized population. This can be avoided if correct OBDDs are used as the initial population. Unfortunately, the regular structure of the OBDDs seems to be very stable, and the algorithm will hardly escape from the local optima induced by the OBDDs. Certainly there is still room for improvement. Possibly new operators that act less locally than Restructuring could help to break up the OBDD structure.
References
1. Drechsler, R., Gunther, W.: Towards One-Path Synthesis. Kluwer Academic Publishers (2002)
2. Bryant, R.: Graph-based algorithms for Boolean function manipulation. IEEE Trans. on Comp. 35 (1986) 677-691
3. Ajtai, M., Babai, L., Hajnal, P., Komlos, J., Pudlak, P., Rodl, V., Szemeredi, E., Turan, G.: Two lower bounds for branching programs. In: Symp. on Theory of Computing. (1986) 30-38
4. Bryant, R.: On the complexity of VLSI implementations and graph representations of Boolean functions with application to integer multiplication. IEEE Trans. on Comp. 40 (1991) 205-213
5. Becker, B., Drechsler, R., Werchner, R.: On the relation between BDDs and FDDs. Technical Report 12/93, Universitat Frankfurt, Fachbereich Informatik (1993)
6. Kebschull, U., Schubert, E., Rosenstiel, W.: Multilevel logic synthesis based on functional decision diagrams. In: European Conf. on Design Automation. (1992) 43-47
7. Drechsler, R., Sarabi, A., Theobald, M., Becker, B., Perkowski, M.: Efficient representation and manipulation of switching functions based on ordered Kronecker functional decision diagrams. Technical Report 14/93, J.W. Goethe-University, Frankfurt (1993)
8. Ashar, P., Ghosh, A., Devadas, S., Newton, A.: Combinational and sequential logic verification using general binary decision diagrams. In: Int'l Workshop on Logic Synth. (1991)
9. Rudell, R.: Dynamic variable ordering for ordered binary decision diagrams. In: Int'l Workshop on Logic Synth. (1993) 3a1-3a12
10. Drechsler, R., Becker, B., Gockel, N.: A genetic algorithm for variable ordering of OBDDs. IEE Proceedings 143 (1996) 364-368
11. Sakanashi, H., Higuchi, T., Iba, H., Kakazu, Y.: Evolution of binary decision diagrams for digital circuit design using genetic programming. In: ICES. (1996) 470-481
12. Bollig, B., Wegener, I.: Improving the variable ordering of OBDDs is NP-complete. IEEE Trans. on Comp. 45 (1996) 993-1002
13. Wegener, I.: BDDs - design, analysis, complexity, and applications. Discrete Applied Mathematics 138 (2004) 229-251
14. Friedman, S., Supowit, K.: Finding the optimal variable ordering for binary decision diagrams. In: Design Automation Conf. (1987) 348-356
15. Buch, P., Narayan, A., Newton, A., Sangiovanni-Vincentelli, A.: On synthesizing pass transistor networks. In: Int'l Workshop on Logic Synth. (1997)
16. Ferrandi, F., Macii, A., Macii, E., Poncino, M., Scarsi, R., Somenzi, F.: Layout-oriented synthesis of PTL circuits based on BDDs. In: Int'l Workshop on Logic Synth. (1998) 514-519
17. Koza, J.: Genetic Programming - On the Programming of Computers by means of Natural Selection. MIT Press (1992)
18. Keijzer, M., Merelo, J.J., Romero, G., Schoenauer, M.: Evolving objects: a general purpose evolutionary computation library. In: Evolution Artificielle. (2001)
19. Somenzi, F.: CUDD: CU Decision Diagram Package Release 2.4.0. University of Colorado at Boulder (2004)
Efficient Evolutionary Approaches for the Data Ordering
Problem with Inversion
Abstract. An important aim of circuit design is the reduction of the power dis-
sipation. Power consumption of digital circuits is closely related to switching
activity. Due to the increase in the usage of battery driven devices (e.g. PDAs,
laptops), the low power aspect became one of the main issues in circuit design
in recent years. In this context, the Data Ordering Problem with and without Inversion is very important. Data words have to be ordered and (possibly) negated in order to minimize the total number of bit transitions. These problems have several applications, like instruction scheduling, compiler optimization, sequencing of test patterns, or cache write-back. This paper describes two evolutionary algorithms for the Data Ordering Problem with Inversion (DOPI). The first one noticeably improves the Greedy Min solution (the best known related polynomial heuristic) in a small amount of time, by successively applying mutation operators. The second one is a hybrid genetic algorithm, where a part of
the population is initialized using greedy techniques. Greedy Min and Lower
Bound algorithms are used for verifying the performance of the presented Evo-
lutionary Algorithms (EAs) on a large set of experiments. A comparison of our
results to previous approaches proves the efficiency of our second approach. It
is able to cope with data sets which are much larger than those handled by the
best known EAs. This improvement comes from the synchronized strategy of
applying the genetic operators (algorithm design) as well as from the compact
representation of the data (algorithm implementation).
1 Introduction
Power consumption became a barrier in the design of embedded systems as soon as the limits of the paradigm "the smaller the device, the faster it is" were reached. Due to the ever increasing demand for electronic devices with bigger storage
capacity and quicker access time (see e.g. [5]), the low-power techniques have to be
taken into account already in the first phases of the design process. Thus, various
methods for decreasing the power dissipation have been developed (see e.g. [1], [15],
[21], [26]).
The challenges in the design of embedded systems can be split into two major
categories: hardware-related and software-related. The hardware designers try to find
methods to optimize the switching activity and the voltage in the circuit ([6], [25]).
The software component is also very important for the power consumption of the
circuit ([23], [24]). In this area an efficient design can provide significant improve-
ments. Power consumption often is directly determined by the design complexity. For
this reason, power consumption has grown in the past years with increasing design
complexities. Therefore, power management is a critical design priority. As such,
lower power consumption has a positive effect on battery life, packaging, cooling
costs and reliability. A new direction of the design methodologies is necessary to
handle the power management issue in a successful way.
The power consumption on the software level depends on the switching activity
and the capacitance. The switching activity, as an important design metric, character-
izes the quality of an embedded system-on-chip design. It is implicitly related to the ordering of the data sequences. The first problem connected with this topic is the ordering of data words to minimize the total number of transitions, the Data Ordering Problem (DOP). In [18] it is demonstrated that this problem is NP-complete. Recently, some algorithms were proposed for optimizing the number of transitions. In [22], Stan and Burleson introduced the bus-invert method. The main idea is to use an extra bus line, called invert, which carries the phase-assignment information for all transmitted words. For every word there is a bit flag that signals whether the transmitted word is complemented (inverted), flag = 1, or left as initial, flag = 0. This new paradigm adds an extra degree of freedom to DOP, so the total number of transitions can be lower than the number provided by DOP. The resulting problem is the so-called Data Ordering Problem with Inversion (DOPI) (see also [11, 19]). A formal definition, related terms, and examples are given in Section 2.
As a general method for solving optimization problems, EAs are getting more and
more popular. Recently, EAs have been successfully applied to several problems in
VLSI CAD (see e.g. [8], [9], [11], [16]). In [10] an efficient genetic approach for the DOP is proposed, and in [11] an evolutionary algorithm for DOPI. This EA approach provides high-quality results (better than polynomial heuristics), but it also needs a large amount of runtime.
In this paper, we propose two evolutionary algorithms for DOPI: one which per-
forms quick small improvements and the other one, which is a hybrid genetic algo-
rithm that performs significant improvements in a larger amount of run time. For
smaller-sized instances we applied the optimal exact algorithms (both DOP and
DOPI) for comparing the behavior of the two problems as well as the results provided
by other related algorithms. In [19] a lower bound algorithm for DOPI is introduced.
In our study, we will use it to check the deviation from the optimum of the results
provided with the proposed EAs. There are three categories of input data: small, me-
dium and large data sizes. The focus is on optimizing the DOPI approach, which adds
the paradigm of phase-assignment to DOP and thereby improves the performance of
the results.
Definition 2.2. Total number of transitions. The total number of transitions is the sum of the Hamming distances needed for the transmission of all the words. It is denoted by NT. If \pi is a permutation of the bit strings w_1, w_2, ..., w_n, then the total number of transitions is

NT = \sum_{j=1}^{n-1} d(w_{\pi(j)}, w_{\pi(j+1)}).
Definition 2.3. Adjacency matrix. For a given problem instance (where n is the number of words and k their length) we define the adjacency matrix A_{n \times n} where A(i, j) = d(w_i, w_j).
Definition 2.4. Phase-assignment. The inversion of a data word w_i is also called the negation (complementation) and is denoted \overline{w_i}. The polarity (phase-assignment) is a function \varphi defined on the set of words with values in the set of words, where \varphi(w) is \overline{w} (in case w is complemented) or w (in case w is not complemented).
Proposition 1.1. For the adjacency matrix and inversion, the following hold for all i, j \in \{1, ..., n\}:
a. d(w_i, w_j) = d(w_j, w_i), i.e. a_{ij} = a_{ji}
b. d(w_i, w_i) = 0, i.e. a_{ii} = 0
c. d(\overline{w_i}, \overline{w_j}) = d(w_i, w_j)
d. d(\overline{w_i}, w_j) = d(w_i, \overline{w_j}) = k - d(w_i, w_j) = k - a_{ij}
Definition 2.5. DOP: Find a permutation \pi of the bit strings w_1, w_2, ..., w_n such that the total number of transitions

NT = \sum_{j=1}^{n-1} d(w_{\pi(j)}, w_{\pi(j+1)}) \qquad (1)

is minimized.
Definition 2.6. DOPI: Find a permutation \pi of the bit strings w_1, w_2, ..., w_n and a phase-assignment \varphi such that the total number of transitions

NT = \sum_{j=1}^{n-1} d(\varphi(w_{\pi(j)}), \varphi(w_{\pi(j+1)})) \qquad (2)

is minimized.
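The following Python sketch makes Definitions 2.2-2.6 concrete: it computes NT for a given permutation and phase-assignment (the encoding of words as 0/1 lists is an illustrative choice).

def hamming(a, b):
    # Hamming distance between two equal-length bit strings
    return sum(x != y for x, y in zip(a, b))

def total_transitions(words, order, phase):
    # NT for a DOPI solution: 'order' is a permutation of word indices and
    # 'phase' marks which words are transmitted inverted (1) or as-is (0).
    def apply_phase(i):
        return [1 - b for b in words[i]] if phase[i] else words[i]
    seq = [apply_phase(i) for i in order]
    return sum(hamming(seq[j], seq[j + 1]) for j in range(len(seq) - 1))

words = [[0, 1, 1, 0], [1, 1, 1, 1], [0, 0, 0, 1]]
print(total_transitions(words, [0, 1, 2], [0, 1, 0]))  # invert the second word -> 3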
3 Previous Approaches
The DOP and DOPI are very similar to the Traveling Salesman Problem (TSP). For
all three problems a good ordering of elements with respect to a given weight between
each two elements has to be determined. Since the DOP and DOPI are NP-Complete,
the exact algorithms can only handle very small instances. In the past few years some
heuristics were developed for both DOP and DOPI (most of them in relation with the
TSP problem):
1. Double Spanning Tree (DST) [12]
2. Spanning Tree/Minimum Matching (ST-MM) [12]
3. Greedy Min (GM) [18]
4. Greedy Simple (GS) [10]
5. Evolutionary Heuristics [11]
The most powerful polynomial heuristic known so far is Greedy Min, and it can be applied to both DOP and DOPI (a sketch follows the listed steps):
1) Compute the Hamming distance for all (distinct) pairs of given words and select the pair with minimum cost.
2) Choose the most convenient pair of words. The beginning sequence will contain these two words.
3) Build the sequence progressively, adding in every step the most convenient word (that has not yet been added). This word can be added either at the beginning or at the end of the sequence, depending on where the Hamming distance is minimal.
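A hedged sketch of these three steps for the plain DOP case; the DOPI variant would additionally consider each word's complement at every step. Function and variable names are illustrative.

def greedy_min(words):
    # Greedy Min: start from the closest pair, then repeatedly append the
    # unused word with the smallest Hamming distance to either end.
    def d(a, b):
        return sum(x != y for x, y in zip(a, b))

    n = len(words)
    i, j = min(((i, j) for i in range(n) for j in range(i + 1, n)),
               key=lambda p: d(words[p[0]], words[p[1]]))
    seq, unused = [i, j], set(range(n)) - {i, j}
    while unused:
        best = min(unused, key=lambda w: min(d(words[w], words[seq[0]]),
                                             d(words[w], words[seq[-1]])))
        if d(words[best], words[seq[0]]) <= d(words[best], words[seq[-1]]):
            seq.insert(0, best)         # extend at the beginning
        else:
            seq.append(best)            # extend at the end
        unused.remove(best)
    return seq                          # ordering of word indices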
The EAs are the best algorithms regarding the quality of results. Such evolutionary
approaches provide better results than the above-presented Greedy Min, but with
significantly more time resources. EAs which perform high-quality optimizations are
presented in [10], [11]. In [11], evolutionary algorithms for both DOP and DOPI are presented. For DOPI the mutation and crossover operators are applied in parallel for
creating new individuals. This is also a hybrid EA, since the initial individuals are
preprocessed using greedy methods. The results provided by the EA are better than
the Greedy Min results, but the maximal number of words is 100.
In [19] a graph theory related model for DOPI is introduced, together with a rele-
vant graph theoretic background. For a DOPI instance with n words, each of length k,
a multigraph can be created accordingly. The vertices are the words and the edges are
labeled with the distance between the words. According to Proposition 1.1 c) and d), if the words are in the same phase-assignment (both 0 or both 1), then the distance between them is unchanged, and if they are in different phase-assignments, the distance is the same regardless of which of the two words is complemented. There are two edges between two vertices (one for the case that the words are transmitted in the same phase, one for the case that they are transmitted in different phases). In this manner DOPI is transformed into the equivalent NP-complete problem of finding the
Hamiltonian path with the minimum length. As shown in [19], a lower bound for the
length of this path is the weight of the minimal spanning tree of the multigraph. To
determine the minimal spanning tree there are two classical greedy algorithms: Prim
(uses the vertex connections) and Kruskal (uses the edges). In the experimental tests
from Section 5, we use the Kruskal approach to check the deviation to the optimum
for the provided EA results.
4 Evolutionary Algorithms
In this section the two evolutionary approaches for DOPI are presented: the first is a simple one which operates on a single individual (the one resulting from Greedy Min) with the Simple Inversion Mutation (SIM) operator and the Simple Cycle Inversion Mutation (SCIM) operator. It achieves improvements of the greedy solution in a small amount of time. The second one is based on the classical genetic algorithm model. The genetic operators are applied in a synchronized way to the ordering and the phase-assignment, in order to preserve the good subsequences. As can be seen later in the result tables, the results are much better than those of the mutation algorithm, but the time needed increases significantly.
(1 2 3 4 5 6 7 8) (1 0 1 1 0 1 0 0)
(1 8 7 4 5 6 3 2) (1 0 0 1 1 1 1 0)
Fig. 1. The mutation operator SCIM for a Permutation and Bit String and the cut points (7, 3)
4.3.1 Representation
A potential solution is a pair (Permutation, BitString). The Permutation object is the
representation of the word ordering. The BitString object is the representation of the
phase-assignment (value 1/0 means that the corresponding data word has/not to be
inverted). A population is a set of such pair-elements. The genetic operators can be
applied on these elements.
The adjacency matrix is symmetric and its main diagonal is always filled with 0s (zeros). We therefore keep only its elements above the main diagonal, in a vector with (n-1) + (n-2) + ... + 1 = n(n-1)/2 elements (instead of the whole matrix with n^2 elements). This representation minimizes the space needed and assures a higher speed for the application.
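A sketch of the index mapping this packed storage implies, assuming the strict upper triangle is stored row by row (the function name is illustrative):

def triu_index(i, j, n):
    # Map indices (i, j), i != j, of an n x n symmetric matrix to a position
    # in a flat vector holding the strict upper triangle, n*(n-1)/2 entries.
    if i > j:
        i, j = j, i                       # symmetry: A(i, j) == A(j, i)
    return i * n - i * (i + 1) // 2 + (j - i - 1)

n = 5
print(triu_index(1, 3, n))  # position of A(1, 3) in the packed vector -> 5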
4.3.3 Initialization
Often it is helpful to combine EAs with problem-specific heuristics (see e.g. [4], [8],
[9], [11]). The resulting EAs are called Hybrid EAs. In our case, we will use Greedy
specific techniques for initialization. Some of the initial individuals will be initialized
using the Greedy Min method, some of them with a different Greedy technique - with
comparable results. The rest will be random individuals. The initialization using the
greedy methods guarantees that the starting point is not completely random and
thereby the convergence is sped up. Because the complexity of the Greedy algorithms is polynomial, O(n^2), this initialization step uses little time, but helps significantly towards a faster convergence. The number of individuals initially generated with Greedy Min is n, the same as the number of individuals generated with the second Greedy
method (like Greedy Min, with the difference that it is restricted on one end).
PMX: Constructs the children by choosing the part between the cut positions from one parent and preserving the absolute position and order of as many variables as possible from the second parent. PMX can be applied correspondingly to a bit string, preserving the inner sequence from one parent and the lateral ones from the other.
CPMX: Constructs the children exactly like PMX, with the difference that the successor of the last element is the first one, as in SCIM. The PMX is applied to the sequence at the places (i, j) with i > j: i, i+1, ..., n, 1, ..., j. CPMX can also be applied to a bit string, preserving this sequence from one parent and the inner one from the other parent.
OX: Constructs the children by choosing the part between the cut positions from one parent and preserving the relative position and order of as many variables as possible from the second parent.
COX: Similar to SCIM and CPMX; the sequence is considered circular and OX is applied to the pairs (i, j) with i > j, for the elements at places i, i+1, ..., n, 1, ..., j. It can also be applied to bit strings.
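As an illustration of the classical PMX step described above (permutation component only; the bit-string variant and CPMX/OX/COX are analogous but not shown), a sketch assuming 0-based cut positions:

def pmx(parent_a, parent_b, c1, c2):
    # Partially matched crossover: the child keeps the slice [c1:c2] from
    # parent_a and fills the remaining positions from parent_b, resolving
    # conflicts through the mapping defined by the copied slice.
    n = len(parent_a)
    child = [None] * n
    child[c1:c2] = parent_a[c1:c2]
    in_slice = set(parent_a[c1:c2])
    for i in list(range(0, c1)) + list(range(c2, n)):
        gene = parent_b[i]
        while gene in in_slice:                   # follow the mapping a -> b
            gene = parent_b[parent_a.index(gene)]
        child[i] = gene
    return child

print(pmx([1, 2, 3, 4, 5, 6, 7, 8], [3, 7, 5, 1, 6, 8, 2, 4], 3, 6))
# -> [3, 7, 8, 4, 5, 6, 2, 1]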
The mutation operators used are SIM and SCIM, which were presented in the section above.
4.3.5 Algorithm
The algorithm is based on the classical sketch of the genetic algorithms. A very
important step is the initialization of the first individuals. A refined version of the
classical genetic algorithm:
ALGORITHM_GA_DOPI
  Initialize(populationSize)
  Initialize(crossoverRate)
  Initialize(mutationRate)
  numCrossovers <- populationSize * crossoverRate
  numMutations <- populationSize * mutationRate
  Initialize_GreedyMin_individuals()
  Initialize_Greedy1_individuals()
  Initialize_Random_individuals()
  for (i <- 1; i <= numGenerations; step 1) execute
    Apply_Crossover_operators(numCrossovers)
    Apply_Mutation_operators(numMutations)
    Calculate_fitness(allNewIndividuals)
    Remove_WorstIndividuals(numCrossovers + numMutations)
  end_for
  return best_element
END_ALGORITHM_GA_DOPI
For the Greedy Min initialization step, n-1 different pairs of words are chosen and a solution is built using the local-optimum strategy given by Greedy Min. The same is performed for the other Greedy initialization, with the difference that at the beginning only distinct words are generated, which remain the start words of the potential solutions.
5 Experimental Results
All algorithms are implemented in C++, using the Standard Template Library. The
experimental results are focused on the DOPI problem, which is more efficient than
DOP (due to the extra degree of freedom in choosing the phase-assignment). The
extra degree of freedom also increases the complexity of the problem. The DOP complexity is O(n!), since for a given n there are n! different permutations. The complexity for DOPI is O(2^n n!), because for every permutation all the bit strings of length n have to be checked (there are 2^n such bit strings). It follows that the exact solutions
can be found only for very small dimensions. The test program initializes a random
instance for the problem with the given (n, k) values: n bit strings (the words), each of
length k. For these instances, various algorithms are applied. In the first table we per-
form the exact algorithms. For comparing the DOP and DOPI representative results,
this table also contains the DOP random and exact ones. The sizes are n ∈ {6, ..., 10}, and for every such n value, k ∈ {10, 30, 50}. The performed algorithms are: RAN
(random), EX (exact), G (Greedy Min), MUT (EV_MUT_ALGORITHM), GA (Ge-
netic Algorithm) and LB (Lower Bound).
As expected, all exact values for DOPI are better than the exact DOP values for
the same instances. The lower_bound values for DOPI are lower than the exact
values, but very close to them. The mutation algorithm, which is not time consuming
and changes the current individual progressively, often produces improvements. The
genetic algorithm is in general better than the mutation algorithm and often provides
the best solution, especially for very small sizes.
We note that the results for GA are better than the mutation algorithm results. For
the same n, if the k value is greater, then the difference between this result and the
lower bound result is also greater.
For these large examples the improvements provided by the MUT algorithm are
not as significant as with the previous data sets. The genetic algorithm still provides
the best solutions and the difference to the lower_bound results remains small relative
to the (n, k) instance sizes. Because the complexity of the problem is exponential, it is
expected that if the size of the population and the number of generations are greater,
the results for the large examples will be even better. This requires a powerful system
for running the GA application.
6 Conclusions
Two evolutionary approaches for DOPI were presented. Additionally, we described
algorithms such as random, exact, lower_bound and greedy, which helped to compare
the results of these approaches. Due to the very high complexity of the problem, it is
very important to control how the genetic operators are applied. The pair (Permutation,
BitString) representing a DOPI solution is considered one object, and the operators
have to be applied consistently to both components, in order to preserve the good
traits of the parents as well as to have the chance to produce better successors. The
genetic algorithm is a hybrid one, because the initialization step uses greedy
techniques. In addition to the known Greedy Min algorithm, another comparable
greedy algorithm is introduced and used during the initialization phase. The complexity
for larger data sets comes from two directions: one is the space needed to store the
whole population and the other is the time needed to perform the genetic operations.
The final results provided by both EAs are better than the results provided by the
Greedy Algorithm.
GRACE: Generative Robust Analog Circuit
Exploration
1 Introduction
With our Generative Robust Analog Circuit Exploration (GRACE) tool we are
investigating whether it is possible to evolve circuits that can be realized efficiently
and in a routine manner. We are focusing upon the domain of analog circuit design.
Our decision is motivated by the lack of automation in analog design as compared to
digital design. We intend to investigate whether evolvable hardware (EHW)
approaches can contribute to the complex, human-intensive activity of analog design.
The goal of this paper is to describe how we arrived at GRACE. By combining the
exploitation of coarse-grained elements with intrinsic testing, we think GRACE sits
in an interesting and novel space. It allows a distinctive foray into on-line adaptive
and fault-tolerant evolvable hardware circuits since it uses a COTS
(Commercial-Off-The-Shelf) device and standard components. This should make its
results more acceptable to industry. It also allows an economical and time-efficient
foray into the broad domain of VLSI and CAD with its use of elements that are
conversant with human design. We proceed thus: In Section 2 we describe how we
reached the decision to select the Anadigm AN221E04 as GRACE's reconfigurable
device. In Section 3 we give an overview of GRACE. In Section 4 we describe
GRACE's genetic representation of a circuit and its search algorithms. In Section 5
we design a controller for a plant using GRACE to demonstrate its ability to evolve
circuits. We conclude with a summary.
FPTAs are well suited for the exploration of non-conventional realms of circuit
design such as polymorphic circuits, extreme temperature electronics and fault
tolerant circuits [5, 6, 7]. However, they are not well suited to our desire to explore
robust, novel topologies of interpretable and portable circuits.
Circuit synthesis with opamps has straightforward and methodical design
rules (which can be easily incorporated in the evolutionary algorithm) to ensure
that the evolved circuit is interpretable and robust. The IsPAC10 and Anadigm
FPAA have circuit elements based on opamps. The IsPAC10, see Figure 1 right,
consists of 4 programmable analog modules (4 opamps, and 8 input ampliers
total) interconnected with programmable switching networks. Conguration of
the IsPAC10 is a proprietary process [2] . The Anadigm AN221E04, see Fig-
ure 1 left (and described in more detail in Section 2.1), also provides opamp
based circuits as building blocks. It uses switched-capacitor technology which is
inherently robust and portable.
With respect to Criterion 2, the cost of an FPTA is beyond $5K. The de-
velopment board of an IsPAC10 or Anadigm AN221E04 has a cost in the low
hundreds of dollars. Integrated with a conventional computer and other signal
processing devices, they facilitate a system with cost below $5K.
With respect to Criterion 3, each device we assessed offers a different level
of circuit element granularity. The FPTAs are very flexible, fine-grained devices.
The U. of Heidelberg's FPTA ([8], henceforth called FPTA-H) is a switched
network of 256 (16 x 16) programmable CMOS transistors (half NMOS and
half PMOS) arranged in a checkerboard pattern.
The FPTA-1 designed at JPL ([4, 9]) is composed of 12 cells, where each cell
contains 8 CMOS transistors interconnectable via 24 switches. The transistors
are fixed size and the switches are electronically programmable. The FPTA-1
appears to have been a prototype device for FPTA-2. The FPTA-2 ([1]) contains
an 8 x 8 matrix of 64 reconfigurable cells, where each cell consists of 14 transistors
interconnectable via 44 switches. The transistors are fixed size. Each cell also
For a detailed description of the Anadigm Vortex family of devices, see [3].
[Figure: overview of the GRACE system. The evolutionary algorithm configures the
FPAA (SRAM-based) controller via the serial port, while a DAQ card connects the
configured controller to the plant and carries the reference and measured signals
(Vref, Vin, Vout).]
The Anadigm AN221E04 is configured by the EA via the serial port. The EA
sends inputs to the hardware and extracts outputs via a National Instruments PCI-
6221 multifunction data acquisition card (DAQ). (The PCI-6221 DAQ board
provides up to 80 analog inputs and 4 analog outputs, giving GRACE scalability.)
The DAQ provides both analog-to-digital and digital-to-analog conversion with
16-bit resolution (for a voltage range of -10 V to 10 V). The reference signal to the
testbench is specified by the algorithm to the DAQ as a digital waveform. The DAQ
converts it to an analog signal and sends it to the testbench. Simultaneously, the
DAQ converts the plant's analog output signal to a digital signal for the evolutionary
algorithm to compare with the reference signal. Our system actually duplicates the
reference signal sent to the controller so that it can be matched with the plant output.
This yields a time-synchronized comparison between the reference and plant signals.
[Figure 3 example: node 2 is a sumDiff2* CAM (Parms [0.05, 0., 0.0, 0.01], Options
{1,0,1,1}, Inputs [I1, I2]) and node 3 is a gain_inv CAM (Parms [1.1, 0.4, 0.0, 0.1],
Options {0,0,1,1}, Inputs [2]).]
Fig. 3. Left: A circuit in GRACE is a graph. Nodes are components and edges are
wires. Right: This graph is stored in a fixed-length linear genome. Each object of the
genome is a structure describing component, options, parameters and inputs.
Each element of the vector is a structure which specifies a CAM, its options,
parameters and input source(s). Each instance of a CAM has a variable number
of programmable options and parameters. For example, the SumDiff CAM has 4
options and 2 gain parameters while the simple Half-cycle Gain Stage has only
2 options and 1 gain parameter. The genome stores in each structure another
two vectors of data that the genome-to-circuit translation process interprets as
parameters and options. Each vector is of fixed length. If the parameters and
attributes of the CAM are fewer than the vector length, the extra values are
ignored. Like the redundant nodes and links of the circuit which do not connect
input to output, this redundant information is maintained in the genome.
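A minimal sketch of one plausible in-memory layout for such a genome is given below; the CAM type names, the vector sizes and the field names are our own illustrative assumptions and do not reproduce GRACE's actual data structures.

#include <array>
#include <vector>

// One possible layout for a fixed-length GRACE-style genome (illustrative only).
enum class CamType { Wire, SumDiff2, SumDiff3, InvDifferentiator, Integrator, GainInverter, Gain };

struct CamGene {
    CamType type;                       // which CAM this node instantiates
    std::array<double, 4> parameters;   // fixed size; entries beyond the CAM's needs are ignored
    std::array<int, 4> options;         // fixed size; entries beyond the CAM's needs are ignored
    std::array<int, 3> inputs;          // indices of the nodes feeding this CAM (graph edges)
};

using Genome = std::vector<CamGene>;    // fixed length, decoded into device build commands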
The encoding of coarse-grained components in the genome makes GRACE
reminiscent of Koza's genetic programming tree representation, e.g. in [14]. The
obvious difference is the cyclic graph versus the tree. Another difference is the
genome length: fixed in GRACE's case and variable in Koza's. The physical
limitation of resource quantities on the device demands that GRACE not evolve
a genome that requires more resources than are available on the device. This is
ensured by the fixed-length genome and by the decoding algorithm that maps the
genome to a series of build commands. The decoding algorithm makes use of a
resource manager to account for the resources that will be used on the device as it
translates the genome into build commands. If it ever encounters a CAM (i.e. node)
for which the resources cannot be allocated, it replaces this node with a wire.
GRACE's genome is also influenced by Miller's Cartesian Genetic Programming
(CGP) [15]. The CGP genome also describes a graph, mapped to a matrix of
components with links between and among columns.
The search algorithms: We use the standard generation-based processing loop
of an EA to conduct topology search. At initialization, a population of random
genomes is created. Each genome is mapped to a circuit topology with each
instance of a CAM specified using its input list, options and parameter values.
Serially, each genome is configured on the device and given a test signal. The
resulting output signal is captured and evaluated in comparison to a desired output
signal. The error is mapped to a genome fitness. After the entire current generation
is tested, tournament selection supplies parents for the next generation. Each parent
is copied to create an offspring in the next generation. Offspring are mutated before
being added to the population of the next generation. Mutation can be applied in
two ways to the genome: to a CAM instance by changing its type, and to the
input(s) of a CAM by changing a link in the graph.
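The two mutation moves can be sketched as follows, reusing the hypothetical CamGene/Genome types from the previous sketch; the equal probability of the two moves and the random-choice details are assumptions made for illustration.

#include <random>

// Mutate one node of a (non-empty) genome: either change its CAM type or rewire one input.
void mutate(Genome& g, std::mt19937& rng, int numCamTypes) {
    std::uniform_int_distribution<std::size_t> node(0, g.size() - 1);
    std::uniform_int_distribution<int> coin(0, 1);
    CamGene& gene = g[node(rng)];
    if (coin(rng) == 0) {
        // change the CAM type of the chosen node
        gene.type = static_cast<CamType>(std::uniform_int_distribution<int>(0, numCamTypes - 1)(rng));
    } else {
        // rewire one of the node's inputs to another node in the graph
        std::uniform_int_distribution<std::size_t> slot(0, gene.inputs.size() - 1);
        gene.inputs[slot(rng)] = static_cast<int>(node(rng));
    }
}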
Given a topology, finding the parameters of the CAMs is a numerical optimization
problem. Recently, Particle Swarm Optimization (PSO) has emerged as an efficient
approach for fast numerical optimization to find the global optimum [16]. We use
PSO to set the parameters of CAMs rather than evolving them together with the
topology in the EA. We believe that performing the steps of topology search and
component optimization separately makes the problem more tractable for the EA.
These two steps of topology search using an EA and component optimization using
PSO can be combined in various ways, which will affect the efficiency and speed of
the algorithm. For the current set of experiments, we run PSO on each individual in
the EA population and assign the best-of-swarm fitness to the individual. Intuitively,
this approach assigns the topology fitness according to its best performance given
the most suitable parameter values. Other approaches which trade speed for
efficiency and vice versa are under study.
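As an illustration of the component-optimization step, the following is a minimal, generic PSO routine for tuning a fixed-length parameter vector against a fitness function; the swarm size, inertia and attraction constants, and parameter bounds are placeholder values, not those used by GRACE.

#include <algorithm>
#include <functional>
#include <random>
#include <vector>

// Returns the best parameter vector found; its fitness would be assigned to the topology.
std::vector<double> pso(const std::function<double(const std::vector<double>&)>& fitness,
                        std::size_t dim, int particles, int iterations,
                        double lo, double hi, std::mt19937& rng) {
    std::uniform_real_distribution<double> unit(0.0, 1.0), init(lo, hi);
    struct Particle { std::vector<double> x, v, best; double bestFit; };
    std::vector<Particle> swarm(particles);
    std::vector<double> gBest;
    double gBestFit = -1e300;                             // fitness is maximized here

    for (auto& p : swarm) {                               // random initialization
        p.x.resize(dim);
        p.v.assign(dim, 0.0);
        for (auto& xi : p.x) xi = init(rng);
        p.best = p.x;
        p.bestFit = fitness(p.x);
        if (p.bestFit > gBestFit) { gBestFit = p.bestFit; gBest = p.x; }
    }
    const double w = 0.7, c1 = 1.5, c2 = 1.5;             // inertia and attraction weights (placeholders)
    for (int it = 0; it < iterations; ++it) {
        for (auto& p : swarm) {
            for (std::size_t d = 0; d < dim; ++d) {
                p.v[d] = w * p.v[d] + c1 * unit(rng) * (p.best[d] - p.x[d])
                                    + c2 * unit(rng) * (gBest[d] - p.x[d]);
                p.x[d] = std::min(hi, std::max(lo, p.x[d] + p.v[d]));
            }
            const double f = fitness(p.x);
            if (f > p.bestFit) { p.bestFit = f; p.best = p.x; }
            if (f > gBestFit)  { gBestFit = f; gBest = p.x; }
        }
    }
    return gBest;
}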
Fitness Function: Though a simple step function would seem to be all that is
required to evaluate a controller, we used a more complex signal to ensure that
GRACE did not evolve a signal generator regardless of the input. The signal and a
candidate circuit's response are shown on the right in Figure 4. The signal has six
voltage levels (-1.5 V, -0.75 V, -0.375 V, 0.375 V, 0.75 V and 1.5 V) and changes
state every 4.16 ms. We sampled the signal at 125 kHz. The fitness of a circuit is the
weighted sum of squared errors between the circuit's output signal and the test
signal. The fitness function weights can be tuned to trade off the criteria of settling
time, peak overshoot and steady-state error. For instance, putting more weight on
the error in the latter part of the step response biases the search towards controllers
with lower steady-state error and cares less about rise time and peak overshoot. For
the current set of experiments, we used time-weighted least squares, which increases
the weights linearly with time. It is postulated in [17] that such a fitness function is
ideal for judging the efficiency of a controller.
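A minimal sketch of such a time-weighted squared-error measure is shown below; the linear weighting and the use of the raw error (to be mapped to a fitness by the EA) are our reading of the description above, and the function name is illustrative.

#include <algorithm>
#include <vector>

// Time-weighted squared error: late (steady-state) samples count more than early ones.
double timeWeightedError(const std::vector<double>& output,
                         const std::vector<double>& reference) {
    double err = 0.0;
    const std::size_t len = std::min(output.size(), reference.size());
    for (std::size_t n = 0; n < len; ++n) {
        const double e = output[n] - reference[n];
        err += static_cast<double>(n + 1) * e * e;   // weight grows linearly with the sample index
    }
    return err;   // lower is better; the EA would map this error to a genome fitness
}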
Table 3. CAMs used in the GRACE function set for controller evolution. An asterisk
indicates that the output is connected to a sample-and-hold block for two-clock-phase results.

CAM                        Parameter(s)            # In
SumDiff-2*                 input gain value(s)     2
SumDiff-3*                 input gain value(s)     3
Inverting Differentiator   diff. constant (us)     1
Integrator                 gain                    1
Gain Inverter              gain                    1
Gain*                      gain                    1
Wire                       0                       1
Evolved Solutions: The system evolved solutions with high fitness values (validated
by visual inspection of the generated waveforms) that instantiate various control
strategies, for instance proportional control, integral control or lossy integral control.
Evolution also found interesting ways to build solutions, like the use of a
differentiator in feedback to evolve a lossy integrator, and the use of multiple
feedback paths to realize different gains (including high gain through positive
feedback) for proportional control.
Analysis of one of the best-of-run controllers showed how evolution can think
out-of-the-box. Figure 5 shows the controller as seen in the Anadigm GUI on the left
and the equivalent simple block diagram on the right. Simple hand-analysis shows
that the solution is a filter. The summing-integrator filter topology is a well-known
approach to synthesize filters. It is counter-intuitive why a filter would be a good
controller. Evolution exploits the high integration constant (of the order of mega per
second) realizable by the integrator. It evolves a high-gain filter with a large
bandwidth that has an integrator in both the forward and feedback paths. This
effectively behaves like proportional control with a large gain. The high gain of the
P-control reduces the steady-state error, thus contributing to high fitness. This
solution is discussed not for its usefulness in a real scenario, but to illustrate the
algorithm's ability to think out-of-the-box.
Fig. 4. Left: Plant for the evolved controller. Right: Fitness function test signal (square
wave) with an example circuit's output signal for the controller experiment
Fig. 5. Left: An evolved filter solution displayed from the GUI. Right: Schematic of the
same solution
6 Summary
By combining the exploitation of coarse grained elements with intrinsic testing
on a COTS device, we think GRACE comprises a distinctive approach to analog
EHW. This papers goal has been to elucidate our decision process in engineering
GRACE. We feel our decision to use the Anadigm AN221E04 forges GRACEs
identity. It is a COTS rather than custom device. The proprietary nature of
its conguration process can be circumvented for practicality. It uses SRAM
to hold a conguration. This makes it suited as a component of an adaptive,
fault tolerant system. It exploits switched capacitor technology. This allows its
evolved designs to conform with industry specications and be realizable. This
will facilitate the ultimate placement of evolved circuits in the eld.
Finally, in using the Anadigm AN221E04, it oers coarse grained elements.
Coarse granularity makes GRACE contrast with FPTA approaches by exchang-
ing exibility with higher level building block abstraction. We think that GRACE
enables a parallel set of investigations that will provide interesting comparisons
between the non-linear design space of the FPTA and the human oriented, con-
ventional design space. We believe our choices additionally provide us with trac-
tion into both adaptive, robust hardware evolution and the more traditional
pursuit of analog CAD. This will be the direction of our future research using
GRACE.
Acknowledgements
We would like to thank Anadigm, Dimitri Berensen, Garrison Greenwood, David
Hunter, Didier Keymeulen, Adrian Stoica, and Eduardo Torres-Jara for their
contributions to the development of GRACE.
References
1. Stoica, A., Zebulum, R.S., Keymeulen, D.: Progress and challenges in building
evolvable devices. In: Evolvable Hardware. (2001) 33–35
2. Greenwood, G., Hunter, D.: Fault recovery in linear systems via intrinsic evolution.
In Zebulum, R., Gwaltney, D., Hornby, G., Keymeulen, D., Lohn, J., Stoica, A.,
eds.: Proceedings of the NASA/DoD Conference on Evolvable Hardware, Seattle,
Washington, IEEE Computer Society (2004) 115–122
On the Practical Limits of the Evolutionary Digital Filter Design
Abstract. Simple digital FIR filters have recently been evolved directly
in a reconfigurable gate array, thus ignoring the classical method based
on multiply-and-accumulate structures. This work indicates that the
method is very problematic. In this paper, the gate-level approach is
extended to IIR filters, a new approach to the fitness calculation based
on the impulse response evaluation is proposed, and a comparison is
performed between the evolutionary FIR filter design utilizing a full set
and a reduced set of gates. The objective of these experiments is to show
that the evolutionary design of digital filters at the gate level does not
produce filters that are useful in practice when the linearity of the filters
is not guaranteed by the evolutionary design method.
1 Introduction
FIR (finite impulse response) filters and IIR (infinite impulse response) filters
represent two important classes of digital filters that are utilized in many
applications. For these filters, a rich theoretical understanding as well as practical
design experience have been gained in recent decades [6]. Typically, their
implementation is based on multiply-and-accumulate structures (regardless of
software or hardware implementation). Alternative design paradigms (such as
multiplierless designs) have also been formulated [7].
With the development of real-world applications of evolutionary algorithms,
researchers have started to evolve digital filters. Miller has introduced probably
the most radical idea for their design [9, 10, 11]: in the evolutionary design process,
target filters are composed from elementary gates, thus completely ignoring the
well-developed techniques based on multiply-and-accumulate structures. The
main potential practical innovation of this approach could be that the evolved
filters are extremely area-efficient in comparison with the standard approach. We
should understand this approach as a demonstration that evolution is capable
of putting some gates together in order to perform a very simple filtering task.
Definitely, the approach is not able to compete against the standard methods. A
similar approach has been adopted for functional-level evolution of IIR filters [3].
In contrast to the optimistic view presented in the mentioned papers, this
work indicates that the approach is very problematic. In this paper, Miller's
gate-level approach is extended to IIR filters, a new approach to the fitness
calculation is proposed based on the impulse response evaluation, and a comparison
is performed between the evolutionary FIR filter design utilizing a full set of
gates and a reduced set of gates. The objective of these experiments is to support
the following hypothesis: evolutionary design of digital filters at the gate level
does not produce filters that are useful in practice when the linearity of the filters
is not guaranteed. Two approaches that could ensure the linear behavior of evolved
filters will be discussed.
The rest of the paper is organized as follows. Section 2 introduces the area
of digital filter design and the use of evolutionary techniques in this area. In
Section 3, the proposed approach to the evolutionary design of FIR and IIR filters
is described. Results of experiments are reported in Section 4. Section 5 discusses
the advantages and disadvantages of the proposed approach. Conclusions
are given in Section 6.
where h(k) is the impulse response of the system. The values of h(k) completely
define the discrete-time system in the time domain.
A general IIR (infinite impulse response) digital filter is described by the equation

    y(n) = Σ_{k=0}^{N} b_k x(n-k) - Σ_{k=1}^{M} a_k y(n-k).    (3)

The output samples y(n) are derived from current and past input samples
x(n) as well as from current and past output samples. The designer's task is to
determine the coefficients a_k and b_k so that the filter meets a given specification.
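For illustration, equation (3) can be evaluated directly as in the sketch below; the coefficient storage convention (a_0 implicit and equal to 1) and the function name are our assumptions.

#include <vector>

// Direct evaluation of the IIR difference equation (3).
// b holds b_0..b_N; a holds a_1..a_M starting at index 1 (index 0 is unused).
std::vector<double> iirFilter(const std::vector<double>& x,
                              const std::vector<double>& b,
                              const std::vector<double>& a) {
    std::vector<double> y(x.size(), 0.0);
    for (std::size_t n = 0; n < x.size(); ++n) {
        double acc = 0.0;
        for (std::size_t k = 0; k < b.size() && k <= n; ++k)
            acc += b[k] * x[n - k];             // feed-forward (FIR) part
        for (std::size_t k = 1; k < a.size() && k <= n; ++k)
            acc -= a[k] * y[n - k];             // feedback part
        y[n] = acc;
    }
    return y;
}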
The stability and linear phase are the main advantages of FIR filters. On the other
hand, in order to get a really good filter many coefficients have to be considered, in
contrast to IIR filters. In general, IIR filters are not stable (because of feedback).
IIR filters are algebraically more difficult to synthesize.
Various methods have been proposed to design digital filters (such as the
frequency sampling method and the window method for FIR filters, and pole/zero
placement and the bilinear z-transform for IIR filters). These methods are well
developed and represent an approach to digital filter design widely adopted by
industry. Digital filters are usually implemented either on DSPs or as custom
circuits. Their implementation is based on multipliers and adders. The quality
of the output signal, the speed of operations and the cost of hardware implementation
are important factors in the design of digital filters. The multiplier is the primary
performance bottleneck when implementing filters in hardware as it is costly in
terms of area, power and signal delay. Hence multiplierless filters were introduced,
in which multiplication is reduced to a series of bit shifts, additions and
subtractions [12, 7].
Evolutionary algorithms have been utilized either to optimize filter coefficients
[4] or to design a complete filter from chosen components. In particular, structures
of multiplierless filters were sought by many authors [12, 5, 1]. As these
filters are typically composed of adders, subtracters and shifters (implementing
multiplication/division by powers of two), they exhibit linear behavior for
the required inputs.
Miller has pioneered the evolutionary approach in which FIR filters are constructed
from logic gates [9, 10, 11]. He has used an array of programmable gates
to evolve simple low-pass, high-pass and band-pass filters that are able to filter
simple sine waves and their compositions. The unique feature of these filters is
that they are composed of a few tens of gates, thus reducing the implementation
costs significantly in comparison with other approaches. The evolved filters
do not work perfectly and they are far from practical use; however, Miller
has demonstrated that quasi-linear behavior can be obtained for some particular
problems. The gate arrays carry out filtering without directly implementing
a difference equation, the abstract model utilized for filter design. The fitness
function can be constructed either in the frequency domain or in the time domain.
In both cases Miller has obtained similar results. However, he mentioned that
"Experience suggests that gate arrays that are evolved using a fitness function
which looks at the frequency spectrum of the circuit output appear to be more
linear in behavior than using an error based measure of fitness" [10].
Recently, Gwaltney and Dutton have utilized a similar approach to evolve IIR
filters at the functional level (for 16-bit data samples) [3]. Their filters are composed
of adders, multipliers and some logic functions; therefore, they are non-linear. They
have evolved low-pass IIR filters using a couple of components. The filter fails to
function properly when the input is changed to a signal that is significantly different
from that used during evolution.
The main problem is to determine which signals should be included in the
training set. Ideally, all frequencies and shapes should be tested; however, this is
not tractable.
An alternative approach could be to apply the unit impulse (i.e. the signal that
contains all frequencies) at the input and to measure the impulse response. This
Fig. 2. CGP utilized to design (a) FIR filters and (b) IIR filters
[Figure 3 (fraction arithmetic encoding): an 8-bit two's complement word with bit
weights -1, 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128 (integer weights -128, 64, 32, 16, 8,
4, 2, 1); for example, 00010001 represents 17, i.e. 0.133, and 11101111 represents
-17, i.e. -0.133.]
approach will be utilized to design IIR filters. The role of CGP is to find a filter
whose impulse response is as close as possible to the required impulse response.
We have to work with real numbers because the values of the impulse responses
range from -1 to +1. In our case we represent real numbers in fraction arithmetic
(which is based on two's complement encoding, as shown in Figure 3).
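As we read Figure 3, the fraction arithmetic simply interprets an 8-bit two's complement word as a value in [-1, 1) by dividing by 128; the following sketch makes that interpretation explicit (the word width and the saturation behavior are our assumptions).

#include <cstdint>

// Fraction arithmetic: bit weights -1, 1/2, 1/4, ..., 1/128.
double fractionValue(std::int8_t w) {
    return static_cast<double>(w) / 128.0;     // e.g. 00010001 = 17 -> 17/128 = 0.133
}

std::int8_t toFraction(double x) {             // quantize a real value from [-1, 1)
    double scaled = x * 128.0;
    if (scaled > 127.0)  scaled = 127.0;
    if (scaled < -128.0) scaled = -128.0;
    return static_cast<std::int8_t>(scaled);   // e.g. -0.133 -> -17 = 11101111
}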
4 Results
The following CGP parameters represent the basic setup for experiments: u =
15, v = 15, L = 15, ni = 24, no = 8, population size 5, 15 million generations,
function set: F = {c = a, c = a and b, c = a or b, c = a xor b, c = not a, c =
not b, c = a and (not b), c = a nand b, c = a nor b}. CGP was implemented in
C++. The evolved filters were analyzed using Matlab.
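To make the CGP setup concrete, the fragment below evaluates a feed-forward CGP genotype over the Boolean function set listed above; the node encoding, the handling of the array geometry (u, v) and of the levels-back parameter L are simplified, so this is an illustration rather than the authors' implementation.

#include <vector>

struct CgpNode { int f; int in0; int in1; };   // function index and two input indices

// Function indices follow the listed set F:
// 0: a, 1: a and b, 2: a or b, 3: a xor b, 4: not a, 5: not b,
// 6: a and (not b), 7: a nand b, 8: a nor b.
bool applyGate(int f, bool a, bool b) {
    switch (f) {
        case 0: return a;           case 1: return a && b;    case 2: return a || b;
        case 3: return a != b;      case 4: return !a;        case 5: return !b;
        case 6: return a && !b;     case 7: return !(a && b); default: return !(a || b);
    }
}

// Nodes are assumed to be stored in feed-forward order; input indices refer either
// to primary inputs or to previously evaluated nodes.
std::vector<bool> evaluateCgp(const std::vector<CgpNode>& nodes,
                              const std::vector<int>& outputGenes,
                              const std::vector<bool>& primaryInputs) {
    std::vector<bool> value(primaryInputs);
    for (const CgpNode& node : nodes)
        value.push_back(applyGate(node.f, value[node.in0], value[node.in1]));
    std::vector<bool> out;
    for (int g : outputGenes) out.push_back(value[g]);
    return out;
}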
Fig. 4. Behavior of the evolved low-pass filter: input signal (left), required output signal
(center), output signal (right); (a) training signal, (b) test signal
Fig. 5. Behavior of the evolved high-pass filter: input signal (left), required output
signal (center), output signal (right); (a) training signal, (b) test signal
Figure 4 shows the behavior of the best evolved filter. We utilized the signal
with f1 as a test signal and observed that the evolved circuit modifies the signal
although it should transmit the signal without any change. Therefore, the circuit
cannot be understood as a perfect filter.
[Plots for the input signals f3 + f5, f2 and f4 (input, desired output and obtained
output) are omitted here; see the caption of Fig. 6 below.]
Fig. 6. Behavior of the evolved low-pass filter II: input signal (left), required output
signal (center), output signal (right)
[Figure (caption lost): the original and evolved impulse responses, horig[n] and h[n],
and the corresponding magnitude responses plotted against normalized frequency
(rad/sample).]
Fig. 8. The input signal (left) and the filtered signal obtained using the evolved filter
(right)
Table 1. Filters evolved using the complete set of gates and reduced set of gates
Fig. 9. Signal f4 = x(n): the outputs of the filters evolved with the complete set of gates
(EvoFilter 1) and the reduced set of gates (EvoFilter 2)
of gates is more adapted to the training signals. However, Figure 9 shows that the
output response is acceptable in neither scenario 1 nor scenario 2.
5 Discussion
The common result of the experiments performed herein, by Miller [9, 10, 11] and by
Gwaltney and Dutton [3] is that the evolved filters do not work when they are
required to filter signals different from the training signals. Moreover, the evolved
filters do not generate perfect responses even for the training signals. The evolved
circuits are not, in fact, filters. In most cases they are combinational circuits
trained on some data that are not able to generalize. In order to obtain real filters,
the design process must guarantee that the evolved circuits are linear. There are
two ways to ensure this: (1) The circuit is composed of components that
are linear and the process of composition always ensures a linear behavior. This
approach is adopted by many researchers (e.g., [1]) but not by the methods
discussed in this paper. (2) Linearity is evaluated in the fitness calculation.
Unfortunately, that is practically impossible because all possible input signals
would have to be considered, which is intractable. Note that Miller's fitness function
[11] has promoted filters exhibiting quasi-linear behavior; however, it
does not guarantee (in principle) that a candidate filter is linear although the
filter has obtained a maximum fitness score.
In this paper, we have evolved FIR as well as IIR filters and proposed
approaches based on the impulse response and on a reduced set of (linear)
gates. However, none of them has led to satisfactory results. On the basis of the
experiments performed in this paper and the results presented in [9, 10, 11], we
claim that the gate-level evolutionary design of digital filters is not able
to produce filters useful in practice if the linearity is not guaranteed by the
evolutionary design process. As we do not know at present how to ensure linear
behavior, the approach should be considered a curiosity if one is going to design
a digital filter.
There could be some benefits coming with this unconventional filter design.
It was shown that circuits can be evolved to perform a filtering task when sufficient
resources are not available (e.g. a part of the chip is damaged) [5] or when some
noise is present in the input signals [10]. Furthermore, as Miller has noted, "The
origin of the quasi-linearity is at present quite mysterious. ... Currently there is no
known mathematical way of designing filters directly at this level." Possibly we
could discover novel design principles by analyzing the evolved circuits.
The evolutionary design is very time consuming. In order to produce 20 million
generations with a five-member population, the evolutionary design requires
29.5 hours for an IIR filter and 6 hours for an FIR filter (on a 2.8 GHz processor).
6 Conclusions
In this paper, the gate-level approach to digital filter design was extended to
IIR filters, a new approach to the fitness calculation was proposed based on the
impulse response evaluation, and a comparison was performed between the full
set of gates and a reduced set of gates for the evolutionary FIR filter design. On
the basis of the experiments performed herein and the results presented in the
literature, we have recognized that the gate-level evolutionary design of digital
filters is not able to produce real filters. Therefore, this approach remains a
curiosity rather than a design practice.
Acknowledgment
The research was supported by the Grant Agency of the Czech Republic under
project No. 102/04/0737 Modern Methods of Digital Systems Synthesis.
References
[1] Erba, M., et al.: An Evolutionary Approach to Automatic Generation of VHDL
Code for Low-Power Digital Filters. In: Proc. of the 4th European Conference on
Genetic Programming, LNCS 2038, Springer-Verlag, 2001, p. 3650
[2] Ganguly, N. et al. A survey on cellular automata. Technical report, Centre for
High Performance Computing, Dresden University of Technology, December 2003
[3] Gwaltney, D., Dutton, K.: A VHDL Core for Intrinsic Evolution of Discrete Time
Filters with Signal Feedback. In Proc. of 2005 NASA/DoD Conference on Evolv-
able Hardware, IEEE Comp. Society Press, 2005, p. 4350
On the Practical Limits of the Evolutionary Digital Filter Design 355
[4] Harris, S. P., Ifeachor, E. C.: Automating IIR filter design by genetic algorithm.
Proc. of the First IEE/IEEE International Conference on Genetic Algorithms in
Engineering Systems: Innovations and Applications (GALESIA95), No. 414, IEE,
London, 1995, p. 271275
[5] Hounsell, B. I., Arslan, T., Thomson, R.: Evolutionary design and adaptation of
high performance digital filters within an embedded reconfigurable fault tolerant
hardware platform. Soft Computing. Vol. 8, No. 5, 2004, p. 307317
[6] Ifeachor, E. C., Jervis, B. W.: Digital Signal Processing: A Practical Approach,
2nd edition, Pearson Education, 2002
[7] Martinez-Peiro, M., Boemo, E. I., Wanhammar, L.: Design of High-Speed Multi-
plierless Filters Using a Nonrecursive Signed Common Subexpression Algorithm.
IEEE Trans. Circuits Syst. II, Vol. 49, No. 3, 2002, p. 196-203
[8] Miller, J., Job, D., Vassilev, V.: Principles in the Evolutionary Design of Digital
Circuits Part I. Genetic Programming and Evolvable Machines, Vol. 1, No. 1,
2000, p. 835
[9] Miller, J.: Evolution of Digital Filters Using a Gate Array Model. In: Proc. of the
Evolutionary Image Analysis, Signal Processing and Telecommunications Work-
shop. LNCS 1596, Springer-Verlag, 1999, p. 121132
[10] Miller, J.: On the filtering properties of evolved gate arrays. In Proc. of
NASA/DOD Workshop on Evolvable Hardware, IEEE Comp. Society, 1999, p.
211
[11] Miller, J.: Digital Filter Design at Gate-level using Evolutionary Algorithms. In
Proc. of the Genetic and Evolutionary Computation Conference (GECCO99).
Morgan Kaufmann, 1999, p. 11271134
[12] Wade, G., Roberts, A., Williams, G.: Multiplier-less FIR filter design using a
genetic algorithm. IEE Proceedings in Vision, Image and Signal Processing, Vol.
141, No. 3, 1994, p. 175180
Image Space Colonization Algorithm
1 Introduction
2 System Architecture
The system is based on an evolutionary algorithm which simulates the coloniza-
tion of a bidimensional world by a number of populations. The world is organized
in a bidimensional array of locations, or cells, where each cell is always occupied
by an individual.
The world is represented by a matrix, associated with a vector of input images
Iz (e.g. RGB components, textural parameters, or other features), which are
stacked one above the other. Each cell of the matrix corresponds to a pixel of
the image stack and, therefore, the cell having coordinates P = (x, y) is associated
with a vector of features e(x, y) = {Iz (x, y)}. In our simulation, this feature
vector is assumed to represent the environmental conditions at point P of our
world.
During each generation, each individual has a variable probability Sr, depending
both on the environmental conditions and on the local neighborhood, of surviving
to the next generation. When an individual fails to survive, the empty cell is
immediately occupied by a newly generated individual.
where, as described above, ei and σi are, respectively, the mean and the standard
deviation of e(x, y) over all points of the image occupied by individuals belonging
to population i.
where, as above, the parameters nt and n0 describe the position and the steepness
of the sigmoid function, while the constant ms represents a minimal survival rate.
It is worth noting that the difference between the two survival rates is in the sign: in
this case the survival rate increases when ni increases, while Se decreases when
ci increases.
2.4 Algorithm
The algorithm can be described according to the following steps:
1. A random individual is placed on each point of the image.
2. For each generation:
   a. The average feature vector ei and its standard deviation σi are computed
      for each population.
   b. For each individual:
      i. The survival probability is computed as Sr = Se · Sn.
      ii. If the individual does not survive, a new one replaces it. The new
          individual is assigned to a population randomly selected with probabilities
          proportional to the survival factor Se of an individual of each
          population.
3. The separation sf among populations is evaluated, and split and merge
   operations are performed.
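A minimal sketch of one generation of this loop is given below. The survival factors Se (environment) and Sn (neighborhood) are passed in as opaque functions because their exact sigmoid forms are defined by equations not repeated here; all names and the details of the replacement rule are illustrative assumptions.

#include <functional>
#include <random>
#include <vector>

// One generation over a flattened world of cells; world[cell] holds the population index.
void oneGeneration(std::vector<int>& world, int numPopulations,
                   const std::function<double(int, int)>& Se,   // Se(cell, population)
                   const std::function<double(int, int)>& Sn,   // Sn(cell, population)
                   std::mt19937& rng) {
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    for (int cell = 0; cell < static_cast<int>(world.size()); ++cell) {
        const double survive = Se(cell, world[cell]) * Sn(cell, world[cell]);  // Sr = Se * Sn
        if (unit(rng) < survive) continue;                                     // the individual survives
        // Replacement: pick a population with probability proportional to Se.
        std::vector<double> weights(numPopulations);
        for (int p = 0; p < numPopulations; ++p) weights[p] = Se(cell, p);
        std::discrete_distribution<int> pick(weights.begin(), weights.end());
        world[cell] = pick(rng);
    }
}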
The under-segmentation cost C is an average measure of the extent to which
a region in the candidate segmentation overlaps with several regions of the target
segmentation, a pattern that violates condition (a) above. Among the sites that
are assigned to a region l in the candidate segmentation, we let ql(r) denote the
proportion of sites that belong to region r in the target segmentation. Ideally,
all ql(r) but one are equal to zero. By contrast, if l overlaps equally with all
regions of the target, then all ql(r) are equal. Therefore, the entropy of the
distribution ql is a measure of the contribution of l to under-segmentation. The
total under-segmentation cost C is defined as a weighted sum of the individual
under-segmentations. The contribution of each candidate region l is weighted by
the proportion pl of sites that have been attributed to l:

    C = - Σ_l pl Σ_r ql(r) log ql(r)    (6)
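The sketch below computes this cost from two label images, assuming both are given as flat vectors of region indices of the same length; the weight symbol pl, the natural logarithm and the container choices are our illustrative assumptions.

#include <cmath>
#include <map>
#include <vector>

// Under-segmentation cost of Eq. (6): the entropy of each candidate region's overlap
// distribution over target regions, weighted by the region's share of the sites.
double underSegmentationCost(const std::vector<int>& candidate,
                             const std::vector<int>& target) {
    std::map<int, std::map<int, double>> overlap;   // overlap[l][r] = number of shared sites
    std::map<int, double> sizeOf;                   // sites assigned to candidate region l
    for (std::size_t i = 0; i < candidate.size(); ++i) {
        overlap[candidate[i]][target[i]] += 1.0;
        sizeOf[candidate[i]] += 1.0;
    }
    const double total = static_cast<double>(candidate.size());
    double cost = 0.0;
    for (const auto& [l, dist] : overlap) {
        double entropy = 0.0;
        for (const auto& [r, count] : dist) {
            const double q = count / sizeOf[l];     // q_l(r)
            entropy -= q * std::log(q);
        }
        cost += (sizeOf[l] / total) * entropy;      // weight by the proportion p_l
    }
    return cost;
}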
Fig. 1. Example of simulated MR brain images. Left: original image. Right: ground
truth.
simulated with T1-weighted contrast, 1 mm cubic voxels, 3% noise and no intensity
inhomogeneity. The simulation is based on an anatomical model of a normal
brain, which can serve as the ground truth for any analysis procedure. The volume
contains 181 x 217 x 181 voxels and covers the brain completely. A transverse
slice has been extracted from the volume. The non-brain parts of the image, such
as bone, cortex and fat tissue, were first removed. For our experiments
we considered four classes, corresponding to Grey Matter (GM), White Matter
(WM), CerebroSpinal Fluid (CSF) and background (BG).
The behavior of our algorithm can be evaluated with respect to all parameters
by comparing the output against the ground truth provided by the BrainWeb
Simulated Brain Database [10].
Figure 1 shows a slice from the simulated data set and the corresponding ground
truth.
[Figure 2: average segmentation cost as a function of c0 (left) and ct (right).]
[Figure 3: average segmentation cost as a function of n0 (left) and nt (right).]
The value of ct being fixed (ct = 10), the average cost function has been
determined for c0 ∈ [18, 22]. Reciprocally, the value of c0 being fixed (c0 = 20),
the average cost function has been determined for ct ∈ [8, 12].
Figure 2, on the left, describes the behavior of the average cost function with
the environmental parameter c0. According to (1), the parameter c0 represents the
steepness of the sigmoid function which describes Se, where smaller values of
the parameter correspond to a steeper curve. The plot shows that the optimal
value is located approximately at c0 = 19. The plot on the right reports
the analysis of the behavior of the segmentation cost when the parameter ct is
varied. According to (1), the parameter ct represents the value of the weighted
difference ci corresponding to a survival rate Se equal to 0.5. The plot shows
that the optimal value of ct for the given problem is ct = 11.
In the second set of experiments, the value of nt being fixed (nt = 4.5), the
average cost function has been determined for n0 ∈ [0.75, 1.25]. Reciprocally, the
value of n0 being fixed (n0 = 1), the average cost function has been determined
for nt ∈ [4, 6].
Figure 3 shows the variation of the segmentation cost with respect to the parameters
describing the neighborhood influence, namely n0 and nt. Both plots show
that each parameter has an optimal value, which corresponds to a minimum of
the segmentation cost. Comparing Figure 3 with Figure 2, the stronger influence
of the neighborhood constraints on the segmentation cost can be noted,
as indicated by the presence of a deeper minimum in the plots in Figure 3.
This result suggests that the presence of a neighborhood of individuals of the same
type is highly important for the colonization strategy. When the value of
Sn is too low, the individuals tend to group together irrespective of the actual
environment of the image, as the only chance to survive is to form a compact
region of individuals of the same type, in order to increase the value of Sn. In the
same way, when the values of Sn are too high, the grouping pressure diminishes,
and the segmentation strategy becomes highly sensitive to noise.
Intramuscular fat content in meat influences some important meat quality
parameters. For example, the quantitative intramuscular fat content has been
shown to influence the palatability characteristics of meat. In addition, the visual
appearance of the fat does influence the consumer's overall acceptability
of meat and therefore the choice when selecting meat before buying. Therefore
the aim of the present application was to quantify the intramuscular fat content in
beef together with the visual appearance of fat in meat, and to compare the fat
percentage measured by image analysis with chemical and sensory properties.
Moreover, the distribution of fat is an important criterion for meat quality
evaluation and its expected palatability. Segmentation of meat images is the first
step of this study.
The algorithm described in the previous section has been applied to meat
images in order to obtain a proper classification and perform subsequent analysis.
Color images of M. longissimus dorsi were captured by a Sony DCS-D700
camera. The same exposure and focal distance were used for all images. Digital
color photography was carried out with Hama repro equipment (Germany).
Green was used as the background color and photographs were taken on both sides
of the meat. The meat pieces were illuminated by two lamps, each with two
fluorescent tubes (15 W). Polaroid filters were used on the lamps and on the camera.
Fig. 5. Digital camera image of the longissimus dorsi muscle from representative beef
meat (original is in color)
Images were 1344 x 1024 pixel matrices with a resolution of 0.13 x 0.13 mm (see
Figure 5 for an example).
Our algorithm has been applied to segment these images into three classes: fat,
muscle and background. The following parameters have been used: nt = 5, n0 = 1,
ct = 10, c0 = 20, min(sf) = 1, me = 0.1, ms = 0.5.
Figure 6 shows the results obtained after 200 iterations.
Unfortunately this algorithm is not able to distinguish between fat and connective
tissue, as they have exactly the same color. A combination of the present
algorithm with a previous algorithm [11] could also provide good results concerning
the separation between fat and connective tissue. The percentage of fat
extracted by the method proposed in this work was compared to the percentage
measured by chemical analysis. We observed that advanced image analysis is
useful for approximate measures of intramuscular fat content, even if the percentage
of fat is usually overestimated, probably because digital photographs
only reflect the meat surface.
5 Conclusions
Acknowledgments
We would like to thank the meat-group at the Dept. of Food Science, Swedish
University of Agricultural Sciences, Uppsala, especially Professor Kerstin Lund-
strom and Maria Lundesjo Ahnstrom for providing the camera pictures and for
their friendly collaboration.
References
1. Pal, N.R., Pal, S.K.: A review on image segmentation techniques. Pattern Recog-
nition 26 (1993) 12771294
2. Bhanu, B., Lee, S., Ming, J.: Adaptive image segmentation using a genetic algo-
rithm. IEEE Transactions on Systems, Man and Cybernetics 25 (1995) 15431567
3. Bhandarkar, S.M., Zhang, H.: Image segmentation using evolutionary computation.
IEEE Transactions on Evolutionary Computation 3 (1999) 121
4. Andrey, P.: Selectionist relaxation: Genetic algorithms applied to image segmen-
tation. Image and Vision Computing 17 (1999) 175187
5. Liu, J., Tang, Y.Y.: Adaptive image segmentation with distributed behavior-based
agents. IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (1999)
544551
Image Space Colonization Algorithm 367
6. Veenman, C.J., Reinders, M.J.T., Backer, E.: A cellular coevolutionary algorithm for
image segmentation. IEEE Transactions on Image Processing 12 (2003) 304313
7. Ramos, V., Almeida, F.: Articial ant colonies in digital image habitats - a mass
behaviour eect study on pattern recognition. In: Proc. of ANTS2000 - 2nd Int.
Workshop on Ant Algorithms (From Ant Colonies to Articial Ants), Brussels,
Belgium (2000) 113116
8. Gardner, M.: The fantastic combinations of John Conways new solitaire game
life. Scientican American 223 (1970) 120123
9. Bocchi, L., Ballerini, L., Hassler: A new evolutionary algorithm for image segmen-
tation. In: Application of Evolutionary Computation. Number 3449 in Lectures
Notes in Computer Science, Lausanne, Switzerland (2005) 264273
10. Collins, D.L., Zijdenbos, A.P., Kollokian, V., Sled, J.G., Kabani, N.J., Holmes,
C.J., Evans, A.C.: Design and construction of a realistic digital brain phantom.
IEEE Transactions on Medical Imaging 17 (1998) 463468
11. Ballerini, L.: Genetic snakes for color images segmentation. In: Application of
Evolutionary Computation. Volume 2037 of Lectures Notes in Computer Science.,
Milan, Italy (2001) 268277
Enhancement of an Automatic Fingerprint
Identification System Using a Genetic Algorithm
and Genetic Programming
Abstract. This paper presents the use of a genetic algorithm and genetic
programming for the enhancement of an automatic fingerprint identification
system (AFIS). The recognition engine within the original system
functions by transforming the input fingerprint into a feature vector
or fingercode using a Gabor filter bank and attempting to create the best
match between the input fingercode and the database fingercodes. A decision
to either accept or reject the input fingerprint is then carried out
based upon whether the norm of the difference between the input fingercode
and the best-matching database fingercode is within the threshold
or not. The efficacy of the system is in general determined from the combined
true acceptance and true rejection rates. In this investigation, a
genetic algorithm is applied during the pruning of the fingercode while
the search by genetic programming is executed for the purpose of creating
a mathematical function that can be used as an alternative to the
norm operator. The results indicate that with the use of both genetic
algorithm and genetic programming the system performance has improved
significantly.
1 Introduction
Biometrics is an automated technique for identifying individuals based upon
their physical or behavioural characteristics. The physical characteristics that
are generally utilised as biometrics cover faces, retinae, irises, ngerprints and
hand geometry while the behavioural characteristics that can be used include
handwritten signatures and voiceprints. Among various biometrics, ngerprint-
based identication is the most mature and proven technique. A ngerprint is
made up from patterns of ridges and furrows on the surface of a nger [1]. The
uniqueness of a ngerprint can be explained via (a) the overall pattern of ridges
and furrows and (b) the local ridge anomalies called minutiae points such as a
ridge bifurcation and a ridge ending. As fingerprint sensors are nowadays getting
smaller and cheaper, automatic fingerprint identification systems (AFISs)
have become popular alternatives or complements to traditional identification
methods. Examples of applications that have adopted an AFIS range from
security control with a relatively small database to criminal identification with
a large database.
Research in the area of fingerprint-based identification can be divided into
two categories: fingerprint classification and fingerprint recognition. The purpose
of classification is to cluster a database of fingerprints into sub-categories
where the sub-categories are in general defined according to the Henry system
[2]. Several techniques including syntactic approaches [3, 4], structural approaches
[5, 6, 7, 8, 9], neural network approaches [4, 10, 11, 12] and statistical approaches
[13] have been successfully used in fingerprint classification. In contrast,
the purpose of recognition is to match the fingerprint of interest to the identity
of an individual. A fingerprint recognition system is widely used in security-related
applications including personnel identification and access control. For
the purpose of access control, the goal of recognition is (a) to identify correctly a
system user from the input fingerprint and grant him or her appropriate access
and (b) to reject non-users or intruders. Various techniques including conventional
minutiae-based approaches [14, 15, 16], evolutionary minutiae-based approaches
[17, 18, 19] and texture-based approaches [20] have been applied to fingerprint
recognition. Among these techniques, the approach involving the transformation
of a fingerprint into a fingercode [20] has received much attention in
recent years. In brief, a fingerprint is transformed via a Gabor filter-based
algorithm where the resulting feature vector or fingercode is a fixed-length string
that is capable of capturing both local and global details in a fingerprint. The
fingerprint recognition is then achieved by matching the fingercode of interest
with those in the database via a vector distance measurement. Since the fingerprint
is now represented by a unique fixed-length vector and the matching mechanism
is carried out through a vector operation, this approach has proven to be reliable
and fast, and to require only small database storage.
Although a number of impressive results have been reported in Jain et al. [20],
the recognition capability of the fingercode system can be further enhanced. One
possible approach to improve the system is to modify the fingercode using a feature
pruning technique. In most pattern recognition applications, the original
feature vector is often found to contain a number of redundant features.
Once these features are removed, the recognition efficacy is in general maintained,
or improved in some cases. The most direct advantage of pruning the
fingercode is the reduction in the database storage requirement. The candidate
technique for pruning the fingercode is a genetic algorithm [21] where the decision
variables indicate the presence and absence of features while the optimisation
objective is the recognition efficacy. In addition to the feature pruning approach,
the recognition system can also be improved by modifying the fingercode matching
mechanism. In the original work by Jain et al. [20], a vector distance between
the input fingercode and the database fingercode is used to provide the degree
of matching. As a result, the distance value from each feature contributes
equally to the judgment on how well two fingercodes match one another. In this
investigation, the mathematical structure for obtaining the distance and the level
of contribution from each feature will be manipulated and explored using a genetic
programming technique [22]. This part of the investigation is carried out
in order to further increase the recognition capability of the system from that
achieved after the feature pruning.
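To illustrate the pruning idea, the fragment below applies a binary mask (one GA individual) to a fingercode and matches two pruned fingercodes with a 1-norm distance, the norm mentioned later in the paper; the vector types, the mask representation and the function names are our illustrative assumptions.

#include <cmath>
#include <vector>

// Keep only the features selected by the binary mask (a GA individual).
std::vector<double> pruneFingercode(const std::vector<double>& code,
                                    const std::vector<int>& mask) {
    std::vector<double> pruned;
    for (std::size_t i = 0; i < code.size(); ++i)
        if (mask[i] != 0) pruned.push_back(code[i]);
    return pruned;
}

// 1-norm of the difference between two (pruned) fingercodes; the input would be
// accepted when this distance to the best-matching database entry is below a threshold.
double matchDistance(const std::vector<double>& a, const std::vector<double>& b) {
    double d = 0.0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i)
        d += std::fabs(a[i] - b[i]);
    return d;
}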
The organisation of this paper is as follows. In section 2, a brief explanation
on the original ngercode system will be given. This also includes the description
of the ngercode, which is the feature vector, and the matching mechanism. The
application of the genetic algorithm on the feature pruning and the results will
be discussed in section 3. Following that, the use of the genetic programming
in matching mechanism modication and the results will be given in sections 4
and 5. Finally, the conclusions are drawn in section 6.
2 Fingercode System
The fingercode system developed by Jain et al. [20] consists of two major stages:
filter-based feature extraction and fingercode matching. These two components
are explained as follows.
Fig. 1. (a) The reference axis; (b) the reference point and the region of interest,
which consists of 128 sectors
Set 1 (Original):  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Set 2:            19 20  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
Set 3:            17 18 19 20  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Set 4:            15 16 17 18 19 20  1  2  3  4  5  6  7  8  9 10 11 12 13 14
Set 5:            13 14 15 16 17 18 19 20  1  2  3  4  5  6  7  8  9 10 11 12
Set 6:            11 12 13 14 15 16 17 18 19 20  1  2  3  4  5  6  7  8  9 10
Set 7:             9 10 11 12 13 14 15 16 17 18 19 20  1  2  3  4  5  6  7  8
Set 8:             7  8  9 10 11 12 13 14 15 16 17 18 19 20  1  2  3  4  5  6
Set 9:             5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20  1  2  3  4
Set 10:            3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20  1  2
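The ten orderings listed above are cyclic shifts of the original twenty code-groups by two positions at a time. A tiny helper that regenerates them could look as follows; the function name and default values are illustrative only.

def rotated_code_group_sets(n_groups=20, n_sets=10, shift=2):
    """Generate the cyclically shifted orderings of the code-groups shown above.
    Set 1 is the original order; each following set rotates the previous one to
    the right by `shift` positions."""
    original = list(range(1, n_groups + 1))
    sets = []
    for k in range(n_sets):
        r = (k * shift) % n_groups
        sets.append(original[-r:] + original[:-r] if r else original[:])
    return sets

# rotated_code_group_sets()[1] -> [19, 20, 1, 2, ..., 18], matching Set 2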
Table 2. Validation results of the fingercode system with and without feature pruning.
The first validation set is also used during all repeated runs of the genetic algorithm.
Table 3. False acceptance and false rejection rates of the fingercode system with and
without feature pruning
From Tables 2 and 3, it can be clearly seen that the use of a pruned or reduced
fingercode leads to an improvement in recognition performance over the use of a full
fingercode by at least 7% overall. The highest improvement comes from the case of the
reduced fingercode obtained after using a majority vote rule, where detailed results
indicate that there is a significant improvement in the false rejection rate. On the
other hand, the reduced fingercodes with the worst performance are the ones resulting
from the use of AND and OR functions. These results can be interpreted as follows.
With the application of a majority vote rule in deciding whether a feature should be
maintained in or removed from the fingercode, the effect of uncertainties due to the
stochastic search nature of genetic algorithms on the overall optimisation result is
minimised. During each genetic algorithm run, the search is conducted in a manner that
maximises the recognition efficacy. Since the search is a stochastic one and there may
be more than one globally optimal reduced fingercode, the use of a majority vote rule
helps maintain necessary features detected in most or all runs while at the same time
eliminating possible redundant features. This reasoning is supported by the results
where AND and OR functions are used, which indicate that there is no significant gain
in recognition performance over the use of a reduced fingercode obtained from typical
genetic algorithm runs. With the application of an OR function, the resulting
fingercode would contain both necessary features and some redundant features, while
with the use of an AND function, some crucial features may be left out since they are
not present in all best individuals. These two phenomena would have caused a reduction
in the recognition performance.
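The three combination rules discussed above can be written compactly; the sketch below assumes that the best mask of each GA run is available as a 0/1 vector, and is otherwise only illustrative.

import numpy as np

def combine_masks(masks, rule="majority"):
    """Combine the best binary feature masks found in several GA runs.

    masks: array of shape (n_runs, n_features) with 0/1 entries.
    rule: "majority" keeps a feature selected in more than half of the runs,
          "and" keeps features selected in every run,
          "or" keeps features selected in at least one run.
    """
    masks = np.asarray(masks)
    if rule == "majority":
        return (masks.sum(axis=0) > masks.shape[0] / 2).astype(int)
    if rule == "and":
        return masks.min(axis=0)
    if rule == "or":
        return masks.max(axis=0)
    raise ValueError("unknown combination rule")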
Similar to the approach presented in the previous section, ten databases are also used
during the validation, where the results are displayed in Table 5. The genetic
programming results are produced using the best individual among all ten runs. From
Table 5, it can be clearly seen that the replacement of the 1-norm by the GP-generated
function leads to a further improvement in terms of the recognition efficacy, false
acceptance rate and false rejection rate over that achieved earlier. This also implies
that the use of mathematical functions other than a norm function may be more suitable
for the fingercode system. It is noticeable that the system performance is highest in
the case of the first validation set, where it is also the data set used during the
evolution of a matching function using genetic programming.
Table 5. Validation results of the reduced-feature fingercode system with the use of
the 1-norm and the GP-generated function during matching
Table 6. Validation results of the reduced-feature ngercode system with the use of
GP-generated function and combined function during matching
ter setting for the genetic programming is similar to that given in Table 4 except
that the maximum tree depth is reduced to seven and the number of generations
is decreased to 300. Furthermore, the ngercode features, which are parts of GP
terminals, come from only one code-group and are extracted from the reduced
ngercode previously selected by a genetic algorithm together with a majority
vote. The remaining features in the reduced ngercode are thus used as inputs
to the 1-norm operator.
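One plausible reading of this combined matching mechanism is sketched below: the GP-evolved function is applied to the features of a single code-group, while the 1-norm covers the remaining features of the reduced fingercode. How the two contributions are aggregated is not specified in the text, so the simple sum used here, like all names in the sketch, is an assumption.

import numpy as np

def combined_matching_score(query, stored, gp_group_idx, gp_function):
    """Combined matching score: a GP-evolved function on one code-group of the
    reduced fingercode plus the 1-norm over the remaining features.

    gp_group_idx: indices of the features fed to the GP-evolved function.
    gp_function: callable evolved by GP; its form is not given in the text.
    Summing the two parts is an illustrative choice, not the authors' method.
    """
    gp_part = gp_function(query[gp_group_idx], stored[gp_group_idx])
    rest = np.setdiff1d(np.arange(query.size), gp_group_idx)
    norm_part = np.abs(query[rest] - stored[rest]).sum()   # 1-norm over the rest
    return gp_part + norm_part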
The ten previously described databases are utilised during the validation, where the
results are displayed in Table 6. The illustrated results are also produced using the
best individual among all ten runs. From Table 6, it can be clearly seen that there is
a slight drop in the recognition performance after replacing the original GP-generated
function with the combined function. The main cause of this phenomenon is a slight
increase in the false rejection rate in the results from most databases. Nonetheless,
the results from the combined function are still better than those obtained using the
1-norm function in terms of both false acceptance and false rejection rates. In
addition, the GP tree is now less complex since both the tree size and the size of the
terminal set are significantly reduced.
6 Conclusions
whether the norm of the difference between the input fingercode and the best-matching
fingercode from the database exceeds a preset threshold or not. The efficacy of the
fingercode is measured in terms of the combined true acceptance and true rejection
rates. Two approaches for system enhancement have been proposed. In the first
approach, the full feature vector or fingercode is pruned using a genetic algorithm.
After the elimination of redundant features, the use of the reduced fingercode is
shown to help improve the system performance. In the second approach, the calculation
of the norm during the fingercode matching procedure is either partially or fully
replaced by a mathematical function or operation evolved by genetic programming (GP).
With the use of features from the reduced fingercode as parts of the terminal set, the
GP-evolved function in the recognition engine proves to be highly efficient. As a
result, the recognition capability of the system is further increased beyond that
achieved earlier by pruning the fingercode. This helps to show that the use of both a
genetic algorithm and genetic programming can significantly improve the recognition
performance of the fingercode system.
Acknowledgements
This work was supported by the Thailand Research Fund (TRF) through the
Research Career Development Grant and the National Science and Technology
Development Agency (NSTDA) through the Thailand Graduate Institute of Sci-
ence and Technology (TGIST) programme.
References
1. Galton, F.: Finger Prints. Macmillan, London, UK (1892)
2. Henry, E.R.: Classification and Uses of Finger Prints. HM Stationery Office, London, UK (1905)
3. Moayer, B., Fu, K.S.: A tree system approach for fingerprint pattern recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 8(3) (1986) 376–387
4. Blue, J.L., Candela, G.T., Grother, P.J., Chellappa, R., Wilson, C.L.: Evaluation of pattern classifiers for fingerprint and OCR applications. Pattern Recognition 27(4) (1994) 485–501
5. Rao, T.C.M.: Feature extraction for fingerprint classification. Pattern Recognition 8(3) (1976) 181–192
6. Hrechak, A.K., McHugh, J.A.: Automated fingerprint recognition using structural matching. Pattern Recognition 23(8) (1990) 893–904
7. Karu, K., Jain, A.K.: Fingerprint classification. Pattern Recognition 29(3) (1996) 389–404
8. Hong, L., Jain, A.K.: Classification of fingerprint images. In: Proceedings of the 11th Scandinavian Conference on Image Analysis, Kangerlussuaq, Greenland (1999)
9. Cho, B.-H., Kim, J.-S., Bae, J.-H., Bae, I.-G., Yoo, K.-Y.: Core-based fingerprint image classification. In: Proceedings of the 15th International Conference on Pattern Recognition, Barcelona, Spain (2000) 859–862
10. Mitra, S., Pal, S.K., Kundu, M.K.: Fingerprint classification using a fuzzy multilayer perceptron. Neural Computing and Applications 2(4) (1994) 227–233
11. Halici, U., Ongun, G.: Fingerprint classification through self-organizing feature maps modified to treat uncertainties. Proceedings of the IEEE 84(10) (1996) 1497–1512
12. Jain, A.K., Prabhakar, S., Hong, L.: A multichannel approach to fingerprint classification. IEEE Transactions on Pattern Analysis and Machine Intelligence 21(4) (1999) 348–359
13. Coetzee, L., Botha, E.C.: Fingerprint recognition in low quality images. Pattern Recognition 26(10) (1993) 1441–1460
14. Farina, A., Kovacs-Vajna, Z.M., Leone, A.: Fingerprint minutiae extraction from skeletonized binary images. Pattern Recognition 32(5) (1999) 877–889
15. Fan, K.C., Liu, C.W., Wang, Y.K.: A randomized approach with geometric constraints to fingerprint verification. Pattern Recognition 33(11) (2000) 1793–1803
16. Tan, X., Bhanu, B.: Robust fingerprint identification. In: Proceedings of the 2002 International Conference on Image Processing, Rochester, NY (2002) I-277–I-280
17. Tan, X., Bhanu, B.: Fingerprint matching by genetic algorithms. In: Late Breaking Papers at the 2002 Genetic and Evolutionary Computation Conference, New York, NY (2002) 435–442
18. Tan, X., Bhanu, B.: Fingerprint verification using genetic algorithms. In: Proceedings of the Sixth IEEE Workshop on Applications of Computer Vision, Orlando, FL (2002) 79–83
19. Tan, X., Bhanu, B.: Fingerprint matching by genetic algorithms. Pattern Recognition 39(3) (2006) 465–477
20. Jain, A.K., Prabhakar, S., Hong, L., Pankanti, S.: Filterbank-based fingerprint matching. IEEE Transactions on Image Processing 9(5) (2000) 846–859
21. Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, MA (1989)
22. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA (1992)
23. Daugman, J.G.: High confidence visual recognition of persons by a test of statistical independence. IEEE Transactions on Pattern Analysis and Machine Intelligence 15(11) (1993) 1148–1161
Evolutionary Singularity Filter Bank Optimization
for Fingerprint Image Enhancement
1 Introduction
AFIS (Automatic Fingerprint Identification System) finds out whether an individual's
incoming fingerprint image matches any of the target templates in the database.
Classification, which groups fingerprints into several predefined categories on the
basis of the global ridge pattern, can reduce the number of fingerprints to be matched
in a large database. Singular points were conceptualized as an aid for fingerprint
classification by Henry [1], and are also used for better identification [2]. The
accuracy of singularity extraction depends on the quality of the images, so it is hard
to achieve good performance. There are many methods for singularity extraction as well
as for improving image quality [3].
Segmentation [4] and enhancement [5] are used to improve the quality of fingerprint
images. Segmentation classifies a part of the image as a background or fingerprint
region, so as to discard the background regions and reduce the number of false
features extracted. Enhancement improves the clarity of the ridge structures so that
correct features can be extracted. There are many filters for enhancement. For
fingerprint images, enhancement using various filters together may be better than
using only one, but it usually requires expert knowledge to determine the type and
order of the filters [5,6,7,8].
2 Related Work
2.1 Singularity
Fingerprints contain singularities known as core and delta points. The core is the
topmost point on the innermost recurving ridge, while the delta is the center of a
triangular region where three different directional flows meet. In the process of
fingerprint classification, singularity is used directly [13] or together with ridge
patterns [14,15]. It has also been used as a landmark for other features [16,17].
There are other applications of singularity, such as landmarks for fingerprint
matching [18].
The Poincaré index is a popular method for detecting singularity using the orientation
field [3]. Fig. 1 shows the orientation field of core and delta regions. The
orientation field is estimated for each block, and it depends on the quality of the
image. Image enhancement to improve the quality is required to calculate the
orientation field correctly.
For correct feature extraction, the quality of the image should be improved by using
appropriate image filters. The number of ways of constructing an ordered selection of
n filters from a set of m filters is given by $m^n$. Trying all cases to find the best
one is practically impossible when many filters are available. In this paper, a
genetic algorithm is used to search for filters of the proper type and order.
Fig. 3 shows the procedure of the proposed method, which also presents the process of
evaluating fitness. In each generation, the fitness of each chromosome is evaluated
using the fitness function, and chromosomes with higher fitness are stochastically
selected and subjected to genetic operators such as crossover and mutation to
reproduce the population of the next generation. An elitist strategy [20] that always
keeps the best chromosome found so far is used. Chromosomes are represented as simple
numbers corresponding to individual filters, and Table 1 shows the type and effect of
the 71 individual filters.
Chromosomes with a length of five represent a set of filters. Fig. 4 shows the
structure of the chromosomes and examples of the genetic operators such as crossover
and mutation.
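A minimal sketch of such a GA over filter sequences is given below. It assumes a chromosome of five filter indices, elitism, one-point crossover and gene-replacement mutation; the fitness callback (applying the filter sequence and measuring singularity-extraction accuracy) is left to the user. The details are illustrative and do not claim to reproduce the authors' implementation.

import random

N_FILTERS = 71      # filter indices 0..70 as in Table 1 (0 may denote NULL)
CHROM_LEN = 5       # an ordered sequence of five filters

def random_chromosome():
    return [random.randrange(N_FILTERS) for _ in range(CHROM_LEN)]

def crossover(a, b):
    """One-point crossover on the filter sequence (cut position is an assumption)."""
    cut = random.randint(1, CHROM_LEN - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(chrom, p_mut=0.05):
    """Replace a gene by a randomly chosen filter index with probability p_mut."""
    return [random.randrange(N_FILTERS) if random.random() < p_mut else g
            for g in chrom]

def evolve(fitness, pop_size=30, generations=40):
    """fitness(chrom) should return the singularity-extraction accuracy P(p)
    obtained after applying the filter sequence to the training images."""
    pop = [random_chromosome() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        new_pop = [scored[0]]                                     # elitism
        while len(new_pop) < pop_size:
            p1, p2 = random.choices(scored[:pop_size // 2], k=2)  # fitter parents
            c1, c2 = crossover(p1, p2)
            new_pop += [mutate(c1), mutate(c2)]
        pop = new_pop[:pop_size]
    return max(pop, key=fitness)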
The fitness of an individual is estimated by the performance of singularity extraction
using the Poincaré index. The Poincaré index extracts singularity using the orientation
field, which is calculated for each block. Singularity is classified into core and
delta, and these two points do not lie together in the same location. Let Sd = {sd1,
sd2, ..., sdn} be the set of n singularity points detected by the singularity
extraction algorithm, and Se = {se1, se2, ..., sem} be the set of m singularity points
identified by human experts in an input fingerprint image. The following sets are
defined.
- Paired singularity (p): a set of singularity points such that sd and se are paired
  if sd is located within the tolerance (20 pixels) centered around se.
- Missing singularity (a): a set of points that are located within the tolerance
  distance from a singularity se but not from any singularity sd, which means that the
  singularity extraction algorithm cannot detect the point.
- Spurious singularity (b): a set of points that are located within the tolerance
  distance from a singularity sd but not from any singularity se, i.e. points detected
  by the singularity extraction algorithm that are not real singularities.
The missing rate of singularity is estimated by equation (1), the spurious rate by
equation (2), and the accuracy rate by equation (3), over N samples.
P(a) = \frac{\sum_{i=1}^{N} n(a_i)}{\sum_{i=1}^{N} n(S_{e_i})}   (1)

P(b) = \frac{\sum_{i=1}^{N} n(b_i)}{\sum_{i=1}^{N} n(S_{d_i})}   (2)

P(p) = 1 - \frac{\sum_{i=1}^{N} \left( n(a_i) + n(b_i) \right)}{\sum_{i=1}^{N} \left( n(S_{d_i}) + n(S_{e_i}) \right)}   (3)
The accuracy of singularity extraction is used as the fitness function of the genetic
algorithm, so an individual that shows better enhancement performance obtains a higher
score.
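A direct way to compute these rates from the detected and expert-marked singularity coordinates is sketched below. The data layout (lists of (x, y) arrays) and the function name are assumptions; the 20-pixel tolerance follows the definition given above.

import numpy as np

def singularity_rates(detected_sets, expert_sets, tolerance=20.0):
    """Missing rate P(a), spurious rate P(b) and accuracy P(p) over N images,
    following equations (1)-(3). Each list entry holds the (x, y) coordinates of
    the singularities in one image (algorithm output / expert markings)."""
    n_a = n_b = n_d = n_e = 0
    for sd, se in zip(detected_sets, expert_sets):
        sd = np.asarray(sd, dtype=float).reshape(-1, 2)
        se = np.asarray(se, dtype=float).reshape(-1, 2)
        dist = np.linalg.norm(sd[:, None, :] - se[None, :, :], axis=-1)
        n_a += int((~(dist <= tolerance).any(axis=0)).sum())  # expert points left unmatched
        n_b += int((~(dist <= tolerance).any(axis=1)).sum())  # detections with no expert match
        n_d += len(sd)
        n_e += len(se)
    P_a = n_a / max(n_e, 1)
    P_b = n_b / max(n_d, 1)
    P_p = 1.0 - (n_a + n_b) / max(n_d + n_e, 1)
    return P_a, P_b, P_p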
4 Experiments
The NIST Special Database 4 [21] was used to verify the proposed method. The NIST DB 4
consists of 4000 fingerprint images (2000 pairs). In the experiments, the training set
was composed of the first 2000 images, f0001–f2000, and the test set consisted of the
other 2000 images, s0001–s2000. In the database, 7308 singularities were manually
marked by human experts, including 3665 on the training set and 3642 on the test set.
Table 2 shows the initial values of the parameters in the experiment. Over the 40
generations, the maximum fitness rises by 0.02 and the average fitness of the
population of 30 individuals rises by 0.3. Fig. 5 shows the change of the maximum and
average fitness in each generation, where the maximum fitness increases steadily and
the average fitness also shows a rise.
Parameter           Value
Generation          40
Population          30
Chromosome length   5
Selection rate      0.7
Crossover rate      0.7
Mutation rate       0.05
Elitist strategy    Yes
Table 3. The number and type of image filters used in constructing a set of filters

Generation 0:   Lowpass 11 (3×3, #2); Median 25 (3×3, #3); NULL 0; Erosion 30 (3×3, #1); Highpass 18 (3×3, #3)
Generation 6:   Lowpass 11 (3×3, #2); Median 25 (3×3, #3); Gabor filter 70; Lowpass 11 (3×3, #2); Median 27 (1×3)
Generation 7:   Highpass 19 (3×3, #4); Median 25 (3×3, #3); Gabor filter 70; Lowpass 11 (3×3, #2); Median 27 (1×3)
Generation 8:   Highpass 19 (3×3, #4); Median 25 (3×3, #3); Gabor filter 70; Lowpass 11 (3×3, #2); Dilation 46 (5×5, #4)
Generation 28:  Gaussian 14 (3×3); Median 25 (3×3, #3); Gabor filter 70; Gabor filter 70; Closing 68 (1×3)
Generation 30:  Gaussian 14 (3×3); Median 25 (3×3, #3); Gabor filter 70; Gabor filter 70; Lowpass 12 (5×5)
Generation 35:  Gaussian 14 (3×3); Median 25 (3×3, #3); Gabor filter 70; Gabor filter 70; Dilation 48 (3×3)
Table 3 shows the filters obtained through the evolution. The Gabor filter is included
in all of the filter sequences except the randomly initialized one. These obtained
sequences are divided into a front part and a rear part by the Gabor filter. The Gabor
filter has the effect of ridge amplification using the orientation field, and a
correctly extracted orientation field can maximize the performance of singularity
extraction. Filters of the front part are usually concerned with the orientation
field, while filters of the rear part improve the result image of the Gabor filter.
Finally, a filter sequence is obtained which
consists of a Gaussian filter, a median filter, two Gabor filters, and a dilation. The
input image is smoothed by the Gaussian filter, and impulse noise spikes in the image
are removed by the median filter. The Gabor filter helps to correctly calculate the
orientation field through ridge amplification. The dilation operation is used to
remove small anomalies, such as single-pixel holes in objects and single-pixel-wide
gaps in the image. It also has the effect of thinning the ridges of the image by
widening the valleys. Fig. 6 shows example images obtained after applying the filters.
In the experiments, the error rate of singularity extraction over all individual
filters was investigated, and we obtained 5 good filters: the median 3×3 rectangle
mask filter, the closing 3×3 X mask operator, the Gaussian 3×3 mask filter, the Gabor
filter, and the closing 3×3 diamond mask operator. The heuristic filter is composed of
these 5 filters, and the heuristic filter II has the reverse order of the previous
one. Fig. 7 shows the comparison of the various filters.
Filter            Filter type                                                            Error rate
Original          NULL                                                                   18.2%
Individual        Closing Diamond 3×3                                                    16.6%
Gabor             Gabor filter                                                           16.7%
Heuristic         Median Rectangle 3×3; Closing X 3×3; Gaussian 3×3; Gabor filter;
                  Closing Diamond 3×3                                                    18.8%
Heuristic II      Closing Diamond 3×3; Gabor filter; Gaussian 3×3; Closing X 3×3;
                  Median Rectangle 3×3                                                   14.7%
Proposed method   Gaussian 3×3; Median X 3×3; Gabor filter; Gabor filter; Dilation 1×3   13.9%
The error rate of the original images was 18.2%, whereas the individual filter and the
Gabor filter produced 16.6% and 16.7% error rates, respectively. The heuristic II
filter yielded a 14.7% error rate, while the proposed method obtained an error rate of
13.9%. The proposed method thus shows better performance in singularity extraction
than the filters designed heuristically (Table 4).
The fingerprint image in Fig. 8 has one core and one delta. Fig. 8(a) shows the
original image, in which no singularity is detected by the extraction algorithm. The
individual filter, a closing operation, eliminates single-pixel dark spots from the
image and smoothes it, but the extraction algorithm detects a spurious singularity
because of the unclear orientation field on the upper right-hand side (Fig. 8(b)). The
image is not enhanced well with the Gabor filter because the orientation field is not
good (Fig. 8(c)). With the heuristic filter, the extraction algorithm misses some
singularities and detects spurious ones because of the many blank spaces (Fig. 8(d)).
The extraction algorithm with the proposed method detects the singularities with no
missing or spurious points (Fig. 8(f)).
5 Conclusions
In future work, we would like to apply the proposed method to other fingerprint image
databases. By changing the fitness function of the genetic algorithm, methods for
better identification and classification performance of fingerprints will also be
investigated. Since the proposed method does not need any expert knowledge, we can
also apply it to various fields of image processing.
Acknowledgements
This work was supported by the Korea Science and Engineering Foundation (KOSEF)
through the Biometrics Engineering Research Center (BERC) at Yonsei University.
References
[1] N. Ratha, and R. Bolle, Automatic Fingerprint Recognition Systems, Springer-Verlag,
New York, 2004.
[2] A. K. Jain, L. Hong, and R. Bolle, On-line fingerprint verification, IEEE Transactions
on Pattern Analysis and Machine Intelligence, vol. 19, no. 4, pp. 302-314, 1997.
[3] A. M. Bazen, and S. H. Gerez, Systematic methods for the computation of the direc-
tional fields and singular points of fingerprints, IEEE Transactions on Pattern Analysis
and Machine Intelligence, vol. 24, no. 7, pp. 905-919, 2002.
[4] A. M. Bazen, and S. H. Gerez, Segmentation of fingerprint images, In Proc.
ProRISC2001 Workshop on Circuits, Systems and Signal Processing, Veldhoven, The
Netherlands, pp. 276-280, 2001.
[5] L. Hong, Y. Wan, and A. Jain, Fingerprint image enhancement: Algorithm and per-
formance evaluation, IEEE Transactions on Pattern Analysis and Machine Intelligence,
vol. 20, no. 8, pp. 777-789, 1998.
[6] J. Yang, L. Liu, T. Jiang, and Y. Fan, A modified Gabor filter design method for finger-
print image enhancement, Pattern Recognition Letters, vol. 24, pp. 1805-1817, 2003.
[7] S. Greenberg, M. Aladjem, and D. Kogan, Fingerprint image enhancement using filter-
ing techniques, Real-Time Imaging, vol. 8, pp. 227-236, 2002.
[8] E. Zhu, J. Yin, and G. Zhang, Fingerprint enhancement using circular Gabor filter, In-
ternational Conference on Image Analysis and Recognition, pp. 750-758, 2004.
[9] D. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addi-
son Wesley, 1989.
[10] L. M. Schmitt, Theory of genetic algorithms, Theoretical Computer Science, vol. 259,
pp. 1-61, 2001.
[11] S. B. Cho, Emotional image and musical information retrieval with interactive genetic
algorithm, Proceedings of the IEEE, vol. 92, no.4, pp. 702-711, 2004.
[12] N. R. Harvey, and S. Marshall, The use of genetic algorithms in morphological filter de-
sign, Signal Processing: Image Communication, vol. 8, pp. 55-71, 1996.
[13] N. Yager, and A. Amin, Fingerprint classification: A review, Pattern Analysis and
Application, vol. 7, pp. 77-93, 2004.
[14] M. M. S. Chong, T. H. Ngee, L. Jun, and R. K. L. Gay, Geometric framework for fin-
gerprint image classification, Pattern Recognition, vol. 30, pp. 1475-1488, 1997.
[15] Q. Zhang, and H. Yan, Fingerprint classification based on extraction and analysis of
singularities and pseudo ridges, Pattern Recognition, vol. 37, pp. 2233-2243, 2004.
1 Introduction
Classification problems are probably among the most studied in the field of computer
science since its beginnings [1]. Classification is a process according to which an
object is attributed to one of a finite set of classes or, in other words, it is
recognized as belonging to a set of equal or similar entities, identified by a name (a
label, in the classification jargon). In the last decades, Evolutionary Algorithms
(EAs) have demonstrated their ability to solve hard nonlinear problems characterized
by very complex search spaces [2]; they have also been used to solve classification
problems.
Genetic Algorithms (GAs), for example, have been widely applied for evolving sets of
rules that predict the class of a given object. According to the GA-based approach
referred to as learning classifier systems (LCS) [3], the individuals in the
population encode one or more prediction rules of the form IF-THEN,
The proposed classification method has been tested on satellite images and the results
obtained have been compared with those obtained by other well-known classification
techniques, including a standard LVQ. The data set at hand has been divided into a
training and a test set. The former set was used to train both our system and the
other classifiers taken into account for the comparison. As regards our system, once a
set of prototypes has been obtained, the classification of an unknown pattern of the
test set is performed by assigning to it the label of the nearest prototype in the
feature space. The experiments performed on the data taken into account have shown
very interesting results and have confirmed the effectiveness of the proposed
approach.
The remainder of the paper is organized as follows: in section 2 a formalization of
data classification is described; section 3 illustrates the standard LVQ algorithm,
while in section 4 the proposed approach is detailed. In section 5 the experimental
results are presented, while section 6 is devoted to the conclusions.
2 Data Classification
Given a data set D = {X_1, ..., X_{N_D}} of N_D patterns, a set of labels is
associated with it:

\Lambda = \{\lambda_1, \ldots, \lambda_{N_D}\}, \quad \lambda_i \in [1, c]

The i-th element λ_i of Λ is said to be the label of the i-th pattern X_i of D. We
will say that the patterns of D can be grouped into c different classes. Moreover,
given the pattern X_i and the label λ_i = j, we will say that X_i belongs to the j-th
class.
Given a data set D = {X_1, ..., X_{N_D}} containing c classes, a classifier is defined
as a function

\phi : D \rightarrow [0, c]

In other words, a classifier assigns a label φ_i ∈ [0, c] to each input pattern X_i.
If φ_i = 0, the corresponding pattern X_i is said to be rejected. This means that the
classifier is unable to trace the pattern back to any class.
The pattern X_i is recognized by φ if and only if:

\phi_i = \lambda_i
begin
  initialize the reference vectors m_1, ..., m_k;
  initialize the learning rate a(0);
  while stop condition is false do
    for i = 1 to Ntr do
      find j so that ||x_i - m_j|| is minimal;
      update the vector m_j as follows:
        if λ_i = C_j then
          m_j(new) = m_j(old) + a(x_i - m_j(old))
        end if
        if λ_i ≠ C_j then
          m_j(new) = m_j(old) - a(x_i - m_j(old))
        end if
    end for
  end while
end
Fig. 1. The algorithm used in order to determine the position of the reference vectors
in the feature space. Ntr is the number of patterns in the training set, while λ_i and
C_j respectively represent the labels of the pattern x_i and of the winner vector m_j.
This algorithm is often referred to in the literature as winner take all (WTA). Note
that usually the stop condition specifies a certain number of iterations (epochs in
the LVQ jargon).
training set of labeled patterns (see figure 1). Note that, usually, the number of
reference vectors for each of the classes has to be provided by the user.
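For illustration, the WTA update of Fig. 1 can be written in a few lines of Python. The linearly decaying learning rate and the function signature are assumptions; only the pull/push update rule follows the figure.

import numpy as np

def train_lvq1(train_x, train_y, ref_vectors, ref_labels, alpha=0.1, epochs=50):
    """Winner-take-all LVQ training as in Fig. 1; the learning-rate schedule
    (linear decay over the epochs) is a simplifying assumption."""
    ref = ref_vectors.copy()
    for epoch in range(epochs):
        a = alpha * (1.0 - epoch / epochs)                        # decaying learning rate
        for x, label in zip(train_x, train_y):
            j = int(np.argmin(np.linalg.norm(ref - x, axis=1)))   # winner vector
            if ref_labels[j] == label:
                ref[j] += a * (x - ref[j])                        # pull winner towards the pattern
            else:
                ref[j] -= a * (x - ref[j])                        # push winner away
    return ref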
4 Prototype Generation
As mentioned above, the prototypes to be used for classification are represented by
points in the feature space. The method proposed in this paper for finding a good set
of prototypes is based on a particular class of genetic algorithms [2], namely the
Breeder Genetic Algorithms (BGA) [14], in which the individuals are encoded as
real-valued vectors. In our case, an individual consists of a variable-length list of
feature vectors, each one representing a prototype (see Figure 2).
The system, according to the EA paradigm, starts by generating a population of P
individuals. The number of prototypes (in the following referred to as the length) of
these initial individuals is randomly assigned in the range [Nmin, Nmax]. Each
prototype is initialized by randomly choosing a pattern in the training set.
Afterwards, the fitness of the generated individuals is evaluated. A new population is
generated in two ways: on one side, according to an elitist strategy, the best E
individuals are selected and just copied. On the other side, (P - E)/2 couples of
individuals are selected by using a selection mechanism. The crossover operator is
then applied to each of the selected couples, according to a chosen probability factor
pc. The mutation operator is then applied to the individuals according to a
probability factor pm. Finally, the obtained individuals are added to the new
population. The process just described is repeated for Ng generations. The fitness
function, the selection mechanism and the operators employed are described in the
following.
[Figure: Parents (top) and Offspring (bottom) individuals.]
2. Each valid prototype of an individual is labeled with the label most widely
represented in the corresponding cluster.
3. The recognition rate obtained on Dtr is computed and assigned as the fitness value
of the individual.
In order to favor the individuals able to obtain good performance with a smaller
number of prototypes, the fitness of each individual is increased by 0.1/Np, where Np
is the number of prototypes in the individual. Moreover, the individuals having a
number of prototypes outside the interval [Nmin, Nmax] are killed, i.e. marked in such
a way that they are not chosen by the selection mechanism.
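A compact sketch of this fitness evaluation is given below, assuming integer class labels and NumPy arrays. Labelling each prototype by the majority class of its cluster and the 0.1/Np bonus follow the description above; everything else (names, the handling of empty clusters, using -inf to mark killed individuals) is an illustrative choice.

import numpy as np

def prototype_fitness(prototypes, train_x, train_y, n_min=2, n_max=200):
    """Fitness of an individual (a variable-length list of prototypes): each
    prototype is labelled by the majority class of its cluster, then the
    recognition rate on the training set plus 0.1/Np is returned; individuals
    outside [n_min, n_max] are killed (here: given -inf fitness)."""
    n_p = len(prototypes)
    if not (n_min <= n_p <= n_max):
        return -np.inf
    protos = np.asarray(prototypes, dtype=float)
    # cluster assignment: nearest prototype for every training pattern
    nearest = np.argmin(np.linalg.norm(train_x[:, None, :] - protos[None, :, :],
                                       axis=-1), axis=1)
    labels = np.empty(n_p, dtype=train_y.dtype)
    for j in range(n_p):
        members = train_y[nearest == j]
        # a prototype with an empty cluster is labelled arbitrarily here
        labels[j] = np.bincount(members).argmax() if members.size else 0
    recognition_rate = float((labels[nearest] == train_y).mean())
    return recognition_rate + 0.1 / n_p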
The tournament method has been chosen as the selection mechanism. In tournament
selection, a number T of individuals is chosen randomly from the population and the
best individual from this group is selected as a parent. This process is repeated for
as many individuals as have to be chosen. Such a mechanism makes it possible to
control the loss of diversity and the selection intensity [15].
In the approach presented here two genetic operators have been devised: crossover and
mutation. The crossover operator belongs to the wider class of recombination
operators: it accepts two individuals as input and yields two new individuals as
output. This operator acts at the list level and gives our system the important
feature of automatically discovering the number of prototypes actually needed to
represent the classes defined in the problem at hand. The mutation operator, instead,
manipulates a single individual. Its effect is that of moving the prototypes (i.e.
vectors) of the input individual in the feature space. These operators are detailed in
the following.
begin
  for j = 0 to NF do
    range = rndreal(0.1 * Δ_j)
    if flip(pm) then
      x[j] = x[j] ± range (+ or - with equal probability);
    end if
  end for
end
Fig. 4. The mutation operator applied to each of the prototypes, i.e. reference
vectors, in an individual. NF is the number of features of the patterns in the data
analyzed. Δ_j is the range of the j-th feature computed on the training set, while pm
represents the probability of mutation of each single feature value in the prototype.
Crossover. Given two individuals I1 and I2, whose lengths are respectively l1 and l2,
the crossover is applied in the following way: the first individual is split into two
parts by randomly choosing an integer t1 in the interval [1, l1]. The obtained lists
of vectors I1' and I1'' will have lengths t1 and l1 - t1 respectively. Analogously, by
randomly choosing an integer t2 in the interval [1, l2], two lists of prototypes I2'
and I2'', respectively of lengths t2 and l2 - t2, are obtained from I2. At this stage,
in order to obtain a new individual, the lists I1' and I2'' are merged. This operation
yields a new individual of length t1 + l2 - t2. The same operation is applied to the
remaining lists I2' and I1'' and a new individual of length t2 + l1 - t1 is obtained.
The number of swapped prototypes depends on the integers t1 and t2. An example of
application of this operator is given in figure 3. As mentioned above, the implemented
crossover operator allows one to obtain offspring individuals whose length may be
quite different from that of the parents. As a consequence, during the evolution
process, individuals made of a variable number of prototypes can be evolved.
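The crossover just described amounts to splicing two randomly cut prototype lists; a minimal sketch (with invented names) is:

import random

def crossover(parent1, parent2):
    """Variable-length crossover on lists of prototypes: each parent is split at
    a random point and the four sub-lists are recombined, so the offspring may
    contain a different number of prototypes than either parent."""
    t1 = random.randint(1, len(parent1))
    t2 = random.randint(1, len(parent2))
    child1 = parent1[:t1] + parent2[t2:]     # length t1 + l2 - t2
    child2 = parent2[:t2] + parent1[t1:]     # length t2 + l1 - t1
    return child1, child2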
Mutation. Given an individual I, the mutation operator is independently applied to
each prototype of I. In figure 4 the algorithm used to modify each of the prototypes
in an individual is shown.
5 Experimental Results
In order to ascertain its effectiveness, the proposed approach has been tested on data
extracted from two Landsat 6-band multispectral images. The first one¹ is 2030×1167
pixels large and was taken in order to distinguish between forest and non-forest
areas, while the second one² is 1000×1000 pixels large and is related to land cover
mapping for desertification studies. To these images a segmentation method has been
applied in order to obtain regions formed by the same type of pixels [16]. For each of
the regions provided by the segmentation, a set of features has been extracted,
related to its geometrical characteristics and to its spectral
¹ The image has been provided courtesy of JRC.
² This image has been provided courtesy of ACS spa as part of the Desert Watch project.
data. For each region the extracted features have been used to build up a data record.
From the first image a data set made up of 2500 items, i.e. regions, has been derived;
each item may belong to one of two classes, forest or non-forest. From the second
image 7600 items have been extracted and the items may belong to 7 classes,
representing various land cover types: various kinds of vegetation or water. Although
the segmentation process used extracts more than 10 features for each of the regions
identified, for both the images analyzed only six features have been considered.
Preliminary trials have been performed to set the basic evolutionary parameters
reported in Table 1. This set of parameters has been used for all the experiments
reported in the following. Since our approach is stochastic, as are all the EC-based
algorithms, 20 runs have been performed for each data set taken into account. The
reported results are those obtained using the individual having the highest fitness
among those obtained during the 20 performed runs.
The results obtained by our method on the data described above have been compared with
those obtained by three other classification algorithms: nearest neighbor (NN), k-NN
and a standard LVQ. These algorithms are detailed in the following:
LVQ. The LVQ used for the comparison of our results is an improved version of that
described in section 3; it is called Frequency Sensitive Competitive Learning (FSCL)
[17] and is often used to compare the performances of other algorithms.
Nearest Neighbor. Let Dtr be a training set of Ntr labeled patterns represented by
feature vectors. A nearest neighbor (NN) classifier recognizes an unknown pattern x by
computing the Euclidean distance between x and each of the patterns in Dtr. Then x is
recognized as belonging to the same class as the nearest pattern in Dtr. It has been
shown that a NN classifier does not guarantee the minimum possible error, i.e. the
Bayes rate [1].
Table 2. Means and standard deviations of recognition rate on the test set for the
forest cover dataset
Table 3. The recognition rates on the test set obtained for the land cover dataset
Fig. 5. Recognition rate on training and test sets for the land cover data set during
the best run
system. However, in our opinion, the huge difference in the number of necessary
prototypes used compensates for this little difference in the recognition rate.
In Table 3 the results obtained on the land cover data set are shown. In this case the
original data set has been randomly split into two sets, respectively used as training
set and test set. For this data set the total number of executed runs has been 20. In
Table 3 the best results obtained and the number of prototypes employed are reported.
Our results are significantly better than those obtained by the NN classifier and
slightly better than that obtained by the k-NN (in this case k has been set equal to
6). Only the LVQ classifier has obtained a performance slightly better than ours, but
such performance has been obtained with a total number of 700 prototypes, while our
performance has been obtained using only 136 prototypes. Also in this case, the
difference in the number of prototypes compensates for the little difference in the
performance achieved.
In a learning process, in most cases, when the maximum performance is achieved on the
training set, the generalization power, i.e. the ability of obtaining similar
performance on unknown data (the test set), may significantly decrease. In order to
investigate this aspect for our system, the recognition rates on training and test set
have been taken into account for the different data sets considered. In Figure 5 these
recognition rates, evaluated every 50 generations, in the best run for the desert data
set, are displayed. It can be observed from the figure that, in the experiments
carried out, the recognition rate increases with the number of generations both for
the training set and for the test set. The best recognition rates occur in both cases
near generation 400. Moreover, the
fact that the difference between the two recognition rates does not tend to increase
when that on the training set reaches its maximum demonstrates the good generalization
power of our system.
A new method that uses the evolutionary computation paradigm has been devised for
generating prototypes for an LVQ-based classifier. The patterns belonging to the
different classes, represented as vectors in a feature space, are represented by
prototypes obtained by evolving an initial population of randomly selected feature
vectors. The devised approach does not require any a priori knowledge about the actual
number of prototypes needed to represent the classes defined in the problem at hand.
The method has been tested on satellite images and the results have been compared with
those obtained by other classification methods. The obtained results and the
comparisons performed have confirmed the effectiveness of the approach and outlined
the good generalization power of the proposed method.
The results could be further improved by applying the mutation operator in a way that
takes into account the performances obtained by the single prototypes. In practice,
the probability of application of the mutation operator to a single prototype should
be computed as a function of its performance. Specifically, the higher the recognition
rate obtained by the prototype, the lower should be the probability of applying the
mutation to it. In this way, the search for prototypes becomes more effective, since
the probability of modifying good prototypes is much lower than that of modifying bad
prototypes, i.e. those performing worse in recognizing patterns belonging to the same
class.
Acknowledgments
The authors gratefully acknowledge the support of the Joint Research Centre
(JRC) that supplied the forest cover data and the ACS spa which supplied the
land mapping data as part of the Desert Watch project.
References
1. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. John Wiley & Sons, Inc. (2001)
2. Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley (1989)
3. Lanzi, P.L., Stolzmann, W., Wilson, S.W., eds.: Learning Classifier Systems: From Foundations to Applications. Volume 1813 of Lecture Notes in Artificial Intelligence. Springer-Verlag, Berlin, Germany (2000)
4. Giordana, A., Neri, F.: Search-intensive concept induction. Evolutionary Computation 3 (1995) 375–416
1 Introduction
the sense that the system is not given an a priori labelling of patterns; instead it
establishes the classes itself based on the statistical regularities of the patterns.
As far as we know, there exist no papers in the literature in which DE is directly
employed as a tool for supervised classification of multiclass database instances.
Just one paper [8] analyzes the adaptation of DE and of other techniques in the fuzzy
modelling context for the classification problem, while in [9] DE is used for proper
weighting of similarity measures in the classification of high-dimensional and
large-scale data. Moreover, a few papers deal with DE applied to unsupervised
classification, for example on hard clustering problems [10] and on images [11].
Therefore, in this paper we aim to evaluate DE efficiency in performing a
centroid-based supervised classification by taking into account a database composed of
hand-segmented parts of outdoor images, collected in a seven-class database. In the
following, the term centroid simply means class representative and not necessarily the
average point of a cluster in the multidimensional space defined by the database
dimensions. Our idea is to use DE to find the positions of the class centroids in the
search space such that for any class the average distance of instances belonging to
that class from the relative class centroid is minimized. The error percentage for
classification on the testing set is computed on the resulting best individual.
Moreover, the results are compared against those achieved by ten well-known
classification techniques.
The paper structure is as follows: Section 2 describes the DE basic scheme, while
Section 3 illustrates the application of our system based on DE and centroids to the
classification problem. Section 4 reports on the database faced, the results achieved
by our tool and the comparison against ten typical classification techniques. Finally
Section 5 contains our conclusions and future works.
2 Differential Evolution
3 DE Applied to Classification
3.1 Encoding
We have chosen to face the classification task by using a tool in which DE is coupled
with the centroids mechanism (we shall hereinafter refer to it as the DEC system).
Specifically, given a database with C classes and N attributes, DEC should find the
optimal positions of the C centroids in the N-dimensional space, i.e. it should
determine for any centroid its N coordinates, each of which can take on, in general,
real values. With these premises, the i-th individual of the population is encoded as
follows:

(p_i^1, \ldots, p_i^C)   (2)

where the position of the j-th centroid is constituted by N real numbers representing
its N coordinates in the problem space:

p_i^j = \{p_{1,i}^j, \ldots, p_{N,i}^j\}   (3)
Algorithm 1. DE Algorithm
begin
  randomly initialize population
  evaluate fitness of all individuals
  while (maximal number of generations g is not reached) do
  begin
    for i = 1 to n do
    begin
      choose three integer numbers r1, r2 and r3 in [1, n]
        differing from each other and different from i
      choose an integer number k in [1, m]
      for j = 1 to m do
      begin
        choose a random real number ρ in [0.0, 1.0]
        if ((ρ < CR) OR (j = k))
          x'_{i,j} = x_{r3,j} + F · (x_{r1,j} - x_{r2,j})
        else
          x'_{i,j} = x_{i,j}
      end
      if Φ(x'_i) < Φ(x_i)
        insert x'_i in the new population
      else
        insert x_i in the new population
    end
  end
end
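For illustration, one generation of the scheme in Algorithm 1 can be sketched as follows. The fitness callback, the array layout and the default CR and F values (taken from the setting reported later in the paper) are assumptions.

import numpy as np

rng = np.random.default_rng(0)

def de_generation(pop, fitness, CR=0.01, F=0.01):
    """One DE generation as in Algorithm 1: for each individual build a trial
    vector from three distinct partners, apply binomial crossover with rate CR
    and differential weight F, then keep the better of trial and target
    (lower fitness is better)."""
    n, m = pop.shape
    new_pop = pop.copy()
    for i in range(n):
        r1, r2, r3 = rng.choice([j for j in range(n) if j != i], size=3, replace=False)
        k = rng.integers(m)                                  # this index always comes from the donor
        trial = pop[i].copy()
        for j in range(m):
            if rng.random() < CR or j == k:
                trial[j] = pop[r3, j] + F * (pop[r1, j] - pop[r2, j])
        if fitness(trial) < fitness(pop[i]):                 # greedy replacement
            new_pop[i] = trial
    return new_pop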
3.2 Fitness
Following the classical approach to supervised classification, in our case too the
database is divided into two sets, a training one and a testing one. The automatic
tool learns on the former, and its performance is evaluated on the latter.
Our fitness function is computed as the sum, over all the training set instances, of
the Euclidean distance in the N-dimensional space between the generic instance x_j and
the centroid of the class CL_known(x_j) it belongs to according to the database
(p_i^{CL_known(x_j)}). This sum is divided by D_Train, which is the number of
instances composing the training set. In symbols, the fitness of the i-th individual
is given by:

\Phi(i) = \frac{1}{D_{Train}} \sum_{j=1}^{D_{Train}} d\left( x_j, p_i^{CL_{known}(x_j)} \right)   (4)
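Equation (4) can be evaluated directly once an individual is decoded into a C × N array of centroids; the sketch below assumes integer class labels indexing the centroid rows and is otherwise illustrative.

import numpy as np

def dec_fitness(individual, train_x, train_labels):
    """Fitness of equation (4): mean Euclidean distance between every training
    instance and the centroid (encoded in the individual) of its known class.

    individual: array of shape (C, N), one centroid per class;
    train_labels: integer class index of each training instance (0-based
    indexing is an implementation choice made here)."""
    centroids = np.asarray(individual, dtype=float)
    assigned = centroids[np.asarray(train_labels)]            # centroid of the known class
    distances = np.linalg.norm(train_x - assigned, axis=1)
    return float(distances.mean())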
Fig. 1. One of the images used to create the database (taken from [13])
19 attributes, some represented by integer values and some by real ones. All the
attributes are listed in Table 1 together with their type and their minimal and
maximal values. There are no missing values.
Since DEC can handle real values while some attributes are of integer type, care has
been taken in dealing with this issue. Namely, attributes with integer values have
been converted to real values, and the reverse conversion with suitable rounding is
carried out on the obtained solutions.
Some database features are displayed in Fig. 2. Its upper part shows the relationship
between attributes 2 and 11, and reveals that in this case the instances are well
grouped according to their class, so we might hypothesize that the database could be
easily divided into classes without making many errors. Unfortunately this conclusion
is not true, and a careful analysis of the database shows that in the majority of the
cases the relationship between attributes is much more difficult to deal with. As an
example, the lower part of Figure 2 plots attributes 1 and 6: in this case instances
belonging to different classes are really mixed, and any classification tool based on
centroids might have serious problems in correctly classifying instances.
the meta-techniques Bagging [19], among the tree-based ones J48 [20] and Naive Bayes
Tree (NBTree) [21], among the rule-based ones PART [22] and Ripple Down Rule (Ridor)
[23] and, among the others, the Voting Feature Interval (VFI) [24].
The parameter values used for any technique are those set as default in WEKA.
On the basis of a preliminary tuning phase carried out on this and other databases,
DEC parameters have been chosen as follows: n = 200, g = 500, CR = 0.01 and F = 0.01.
It is interesting to note that the values for CR and F are much lower than the ones
classically used according to the literature, which are usually higher than 0.5. A
hypothesis about the reason for this may be that any chromosome in the population
consists in this case of N · C = 19 · 7 = 133 components, so the search space is very
large and high values for CR and F would change too many alleles and create
individuals too different from the parents. This would drive the search far from the
promising regions already encountered, thus creating worse individuals than those
present in the current population. As a consequence, given the elitist kind of
replacement strategy, only very few new individuals would be able to enter the next
generation. The hypothesized scenario seems confirmed by the evolution shown by the
system for high values of CR and F: in this case the best individual fitness decrease
is not continuous, rather it takes place in steps, each of which lasts tens of
generations. This kind of evolution appears to be similar to that typical of a random
search with elitism.
The results of the DEC technique are averaged over 20 runs, differing from one another
in the starting seed provided in input to the random number generator
only. For the other techniques, instead, some (MLP, Bagging, Ridor, PART and J48)
depend on a starting seed, so for them too 20 runs have been carried out by varying
this value. Other techniques (Bayes Net, KStar, VFI) do not depend on any starting
seed, so 20 runs have been executed as a function of a parameter typical of the
technique (alpha for Bayes Net, globalBlend for KStar and bias for VFI). NBTree and
IB1, finally, depend neither on an initial seed nor on any parameter, so only one run
has been performed for them.
DEC execution time is around 27 seconds per run on a personal computer with a 1.6 GHz
Centrino processor. These times are comparable with those of the other techniques,
which range from 2–3 seconds up to about 1 minute for the MLP.
Table 2 shows the results achieved by the 11 techniques on the database. Namely, for
any technique the average value of %err and the related standard deviation σ are
given. Of course σ is meaningless for NBTree and IB1.
Technique    %err     σ
DEC          7.46     0.97
Bayes Net    12.85    1.09
MLP ANN      7.57     1.05
IB1          10.71    -
KStar        7.42     1.51
Bagging      10.35    2.17
J48          10.71    0.00
NBTree       11.42    -
PART         10.71    0.00
Ridor        9.64     2.35
VFI          21.21    0.33
As can be observed from the values in Table 2, DEC is the second best technique in
terms of %err, very close to KStar, which is the best, and closely followed by MLP.
All the other techniques are quite far from these three. The standard deviation for
DEC is not too high, meaning that, independently of the different initial populations,
the final classifications achieved have similar correctness. Some techniques like
Bagging and Ridor, instead, show very different final values of %err, and thus
sensitivity to different initial conditions. On the contrary, other techniques like
J48 and PART are able to achieve the same final value of %err in all the executed
runs.
Thus, our idea of exploiting DE to find the positions of centroids has proven
effective in facing the Image database.
From an evolutionary point of view, in Fig. 3 we report the behavior of a typical run
(the execution shown is run number 1 carried out on the database). Its top part shows
the evolution in terms of best individual fitness and average fitness in the
population as a function of the number of generations. DEC shows a first phase of
about 125 generations in which the fitness decrease is strong and almost linear,
starting from 0.82 for the best and 0.92 for the average, and reaching about 0.46 for
the best and 0.58 for the average. A second phase follows, lasting until about
generation 350, in which the decrease is slower and the two values tend to become
closer, until they reach 0.23 and 0.26 respectively. From then on the decrease in
fitness is slower and slower, about linear again but with a much lower slope, and
those two values become more and more similar. Finally, at generation 500 the two
values are 0.201 and 0.208 respectively.
The bottom part of Fig. 3, instead, reports the behavior of %err as a function of the
generations. Namely, for any generation its average value, the lowest error value in
the generation %err_be and the error value of the individual with the best fitness
%err_bf are reported. This figure shows that the percentage of classification error on
the testing set actually decreases as the distance-based fitness values diminish, thus
confirming the hypothesis underlying our approach. It should be remarked here that
%err_bf does not, in general, coincide with %err_be, and is usually greater than the
latter. This is due to the fact that our fitness does not take %err into account, so
evolution is blind with respect to it, as it should be, and it does not know which
individual has the best performance on the testing set.
What is described above for a specific run is actually true for all the runs carried
out, and is probably a consequence of the good parameter setting chosen. This choice
on the one hand allows a fast decrease in the first part of the run and, on the other
hand, avoids the evolution getting stuck in premature convergence as the generations
go by, as is evidenced by the fact that the best and average
Fig. 3. Typical behavior of the fitness (top) and %err (bottom) as a function of the
number of generations
fitness values remain different enough during the whole evolution. Preliminary
experiments not reported here showed, instead, that using high values for CR and F
leads the best and average fitness values to rapidly become very close.
best individual. The experimental results have proven that the tool is successful in
tackling the task and is very competitive in terms of error percentage on the testing
set when compared with ten other classification tools widely used in the literature.
In fact, only KStar has shown slightly better performance. Execution times are of the
same order of magnitude as those of the ten techniques used. The results seem to imply
that our method, based on the simple concept of centroid, can be fruitfully exploited
on the Image database. It may be hypothesized that DE coupled with centroids might be
suitably used in general to face the classification of instances in databases. This
shall be one of our future issues of investigation. Future works will aim to shed
light on the effectiveness of our system in this field, and on its limitations as
well. To this aim, we plan to endow DE with niching, aiming to investigate whether
this helps in further improving performance.
References
1. Price K, Storn R (1997) Differential evolution. Dr. Dobb's Journal 22(4):18–24.
2. Storn R, Price K (1997) Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization 11(4):341–359. Kluwer Academic Publishers.
3. Back T (1996) Evolutionary algorithms in theory and practice: evolution strategies, evolutionary programming, genetic algorithms. Oxford Univ. Press.
4. Eiben A E, Smith J E (2003) Introduction to evolutionary computing. Springer.
5. Fayyad U M, Piatetsky-Shapiro G, Smyth P, Uthurusamy R (1996) Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press.
6. Duda R O, Hart P E, Stork D G (2001) Pattern Classification. Wiley-Interscience.
7. Weiss S M, Kulikowski C A (1991) Computer Systems that Learn, Classification and Prediction Methods from Statistics, Neural Networks, Machine Learning and Expert Systems. Morgan Kaufmann, San Mateo, CA.
8. Gomez-Skarmeta A F, Valdes M, Jimenez F, Marin-Blazquez J G (2001) Approximative fuzzy rules approaches for classification with hybrid-GA techniques. Information Sciences 136(1-4):193–214.
9. Luukka P, Sampo J (2004) Weighted Similarity Classifier Using Differential Evolution and Genetic Algorithm in Weight Optimization. Journal of Advanced Computational Intelligence and Intelligent Informatics 8(6):591–598.
10. Paterlini S, Krink T (2004) High Performance Clustering with Differential Evolution. In: Proceedings of the Sixth Congress on Evolutionary Computation (CEC-2004), vol. 2, pp. 2004–2011. IEEE Press, Piscataway NJ.
11. Omran M, Engelbrecht A P, Salman A (2005) Differential Evolution Methods for Unsupervised Image Classification. In: Proceedings of the IEEE Congress on Evolutionary Computation (CEC-2005). IEEE Press, Piscataway NJ.
12. Blake C L, Merz C J (1998) UCI repository of machine learning databases, University of California, Irvine. http://www.ics.uci.edu/mlearn/MLRepository.html.
13. Piater J H, Riseman E M, Utgoff P E (1999) Interactively training pixel classifiers. International Journal of Pattern Recognition and Artificial Intelligence 13(2):171–194. World Scientific.
14. Witten I H, Frank E (2000) Data mining: practical machine learning tools and techniques with Java implementations. Morgan Kaufmann, San Francisco.
15. Jensen F (1996) An introduction to Bayesian networks. UCL Press and Springer-Verlag.
1 Introduction
Many optimization problems are in practice difficult to solve with standard numerical
methods (like gradient-based strategies), as they incorporate different types of
discrete parameters and confront the algorithms with a complex geometry (rugged
surfaces, discontinuities). Moreover, the high dimensionality of these problems makes
it almost impossible to find optimal settings through manual experimentation.
Therefore, robust automatic optimization strategies are needed to tackle such
problems.
In this paper we present an algorithm, namely the Mixed-Integer Evolution Strategy
(MI-ES), that can handle difficult mixed-integer parameter optimization problems. In
particular, we aim to optimize feature detectors in the field of Intravascular
Ultrasound (IVUS) image analysis.
IVUS images show the inside of coronary or other arteries and are acquired with an
ultrasound catheter positioned inside the vessel. An example of an IVUS
image with several detected features can be seen in Figure 1. IVUS images are
difficult to interpret, which causes manual segmentation to be highly sensitive to
intra- and inter-observer variability [5]. In addition, manual segmentation of the
large number of IVUS images per patient is very time consuming. Therefore an automatic
system is needed. However, feature detectors consist of large numbers of parameters
that are hard to optimize manually and may differ for different interpretation
contexts. Moreover, these parameters are subject to change when something changes in
the image acquisition process.
Fig. 1. An IntraVascular UltraSound (IVUS) image with detected features. The black
circle in the middle is where the ultrasound imaging device (catheter) was located.
The dark area surrounding the catheter is called the lumen, which is the part of the
artery where the blood flows. Above the catheter calcified plaque is detected, which
blocks the ultrasound signal causing a dark shadow. Between the inside border of the
vessel and the lumen there is some soft plaque, which does not block the ultrasound
signal. The dark area left of the catheter is a sidebranch.
[Figure: the multi-agent image interpretation architecture. Lumen, vessel, shadow, sidebranch and calcified-plaque agents run on a Soar-based agent platform and interact with each other; through perception and action they are coupled to an image processing platform that receives the input images and produces the processed results.]
Although the multi-agent system has been shown to offer lumen and vessel detection comparable to human experts [1], it is designed for symbolic reasoning, not numerical optimization. Further, it is almost impossible for a human expert to
completely specify how an agent should adjust its feature detection parameters in each and every possible interpretation context. As a result, an agent has control knowledge only for a limited number of contexts and a limited set of feature detector parameters.
In addition, this knowledge has to be updated whenever something changes in the image acquisition pipeline. Therefore, it would be much better if such knowledge could be acquired by learning the optimal parameters for different interpretation contexts automatically.
z_i ∈ [z_i^min, z_i^max] ⊆ ℤ, i = 1, . . . , n_z
d_i ∈ D_i = {d_i,1, . . . , d_i,|D_i|}, i = 1, . . . , n_d
¹ In most cases a maximal number of generations is taken as the termination criterion.
Algorithm 2 summarizes the mutation procedure. For the local and global step-size learning rates τ_l and τ_g we use the recommended parameter settings τ_l = 1/√(2√n_r) and τ_g = 1/√(2 n_r) (cf. [8]).
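As an illustration of the mutation machinery discussed here, the following is a deliberately simplified sketch of a mixed-integer mutation with self-adaptive step sizes; the single step size per parameter group, the rounded-Gaussian integer steps, and the fixed resampling probability for nominal parameters are our own simplifications for illustration, not the authors' Algorithm 2.

    import math
    import random

    def mies_mutate(reals, ints, nominals, sigma_r, sigma_i, domains, n_r):
        """Very simplified mixed-integer mutation sketch (not the authors' Algorithm 2).
        reals/ints/nominals : the three parameter groups of one individual
        sigma_r, sigma_i    : step sizes for the continuous and integer parts
        domains             : allowed value sets for the nominal parameters
        n_r                 : number of parameters, used for the learning rates"""
        tau_l = 1.0 / math.sqrt(2.0 * math.sqrt(n_r))   # local learning rate
        tau_g = 1.0 / math.sqrt(2.0 * n_r)              # global learning rate
        g = tau_g * random.gauss(0.0, 1.0)              # one global draw per individual

        # log-normal self-adaptation of the step sizes
        sigma_r *= math.exp(g + tau_l * random.gauss(0.0, 1.0))
        sigma_i = max(1.0, sigma_i * math.exp(g + tau_l * random.gauss(0.0, 1.0)))

        # continuous parameters: Gaussian perturbation
        reals = [z + sigma_r * random.gauss(0.0, 1.0) for z in reals]
        # integer parameters: rounded Gaussian steps (the MI-ES uses geometrically
        # distributed steps; rounding is a simplification here)
        ints = [z + int(round(sigma_i * random.gauss(0.0, 1.0))) for z in ints]
        # nominal discrete parameters: resample uniformly with a small probability
        nominals = [random.choice(D) if random.random() < 0.1 else d
                    for d, D in zip(nominals, domains)]
        return reals, ints, nominals, sigma_r, sigma_i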
Fig. 4. Surface plots of the barrier function for two variables. All other variables were kept constant at a value of zero; two integer values were varied in the range from 0 to 20.
[Plots: objective function value (log scale) versus number of evaluations for the (15+100)-, (4+28)- and (3+10)-strategies on the two test problems.]
Fig. 5. Averaged results on the sphere and the barrier functions. For all results the
median and the quartiles of 20 runs are displayed.
Table 2. Performance of the best found MI-ES parameter solutions when trained on one of the five datasets (parameter solution i was trained on dataset i). All parameter solutions and the (default) expert parameters are applied to all datasets. The average difference (fitness) and standard deviation w.r.t. expert-drawn contours are given.
Acknowledgements
This research is supported by the Netherlands Organisation for Scientific Research (NWO) and the Technology Foundation STW. We would like to thank
Andreas Fischbach and Anne Kunert for the experiments on the sphere and
barrier problems and the anonymous reviewers for their useful comments.
References
1. E.G.P. Bovenkamp, J. Dijkstra, J.G. Bosch, and J.H.C. Reiber. Multi-agent segmentation of IVUS images. Pattern Recognition, 37(4):647–663, April 2004.
2. M. Emmerich, M. Schütz, B. Gross and M. Grötzner: Mixed-Integer Evolution Strategy for Chemical Plant Optimization. In I. C. Parmee, editor, Evolutionary Design and Manufacture (ACDM 2000), 2000, pp. 55–67, Springer NY
3. M. Emmerich, M. Grötzner and M. Schütz: Design of Graph-based Evolutionary Algorithms: A case study for Chemical Process Networks. Evolutionary Computation, 9(3), 2001, pp. 329–354, MIT Press
4. F. Hoffmeister and J. Sprave. Problem independent handling of constraints by use of metric penalty functions. In L. J. Fogel, P. J. Angeline, and Th. Bäck, editors, Evolutionary Programming V - Proc. Fifth Annual Conf. Evolutionary Programming (EP96), pages 289–294. The MIT Press, 1996.
5. G. Koning, J. Dijkstra, C. von Birgelen, J.C. Tuinenburg, J. Brunette, J-C. Tardif, P.W. Oemrawsingh, C. Sieling, and S. Melsa. Advanced contour detection for three-dimensional intracoronary ultrasound: validation in vitro and in vivo. The International Journal of Cardiovascular Imaging, (18):235–248, 2002.
6. A. Newell. Unified Theories of Cognition. Number ISBN 0-674-92101-1. Harvard
University Press, Cambridge, Massachusetts, 1990.
7. G. Rudolph, An Evolutionary Algorithm for Integer Programming, In Davidor et al.: Proc. PPSN III, Jerusalem, Springer, Berlin, 1994, 139–148.
8. H.-P. Schwefel: Evolution and Optimum Seeking, Sixth Generation Computing Se-
ries, John Wiley, NY, 1995
The Honeybee Search Algorithm for
Three-Dimensional Reconstruction
Proyecto EvoVisión,
Departamento de Ciencias de la Computación, División de Física Aplicada,
Centro de Investigación Científica y de Estudios Superiores de Ensenada,
Km. 107 carretera Tijuana-Ensenada, Ensenada, 22860, B.C., Mexico
{olague, puente}@cicese.mx
http://cienciascomp.cicese.mx/Pagina-Olague.htm
1 Introduction
Three-dimensional reconstruction has always been a fundamental research topic in computer vision and photogrammetry. Today, image and vision computing tasks have gained relevance in the evolutionary computing community. This paper proposes a bioinspired approach to tackle the problem of sparse and quasi-dense reconstruction using the honeybee search behavior as a model. This work is also inspired by the work of Louchet [12, 1, 13], in which an individual evolution strategy was applied to obtain a three-dimensional model of the scene using stereo-vision techniques. The main characteristic of that work was the application of the Parisian approach to the evolution of a population of 3D points, called flies, in order to concentrate those points on the object surfaces of the scene. For more about the Parisian approach we recommend [7] and [2]. One of the drawbacks of the approach of Louchet was the lack of a paradigm to provide those 3D points with intelligent capabilities. Indeed, a high number of outliers were produced with that technique. We decided to explore the honeybee search behavior in order to develop an intelligent algorithmic process. Honeybees are considered to perform one of the most complex communication tasks in the animal world. Indeed, concepts of memory, attention, recognition, understanding, interpretation, agreement, decision-making, and knowledge, as well as questions about cognition and awareness, have appeared regularly in the honeybee literature. In this way, honeybees are considered to achieve mental
Currently, most scientists in the honeybee behavioral community agree that the communication system of the bees constitutes a language, within the limits of insect capacities [3]. The honeybee dance language has been used by researchers to create machine vision systems [19, 20], as well as for robotics tasks [11]. All these works attempt to provide knowledge based on the study of the honeybee. However, none of these works have used the adaptive behavior of the honeybee swarm. In this way, our work is also related to the ant colony optimization meta-heuristic and the more general field called swarm intelligence [5, 6]. However, our work is also strongly related to evolutionary computing, as we will explain later. This work is part of our own effort to build new algorithms based on some basic principles taken from the observation of a particular natural phenomenon [16, 18]. Honeybees use a sophisticated communication system that enables them to share information about the location and nature of resources. If a sugar solution is placed outdoors, a long time might elapse before the bees find the food. Soon after this first visit, however, bees begin swarming around the feeder.
performed using what is called the dance language as a means of recruitment.
The dance language refers to patterned repetitive movements performed by bees
that serve to communicate to their nestmates the location of food sources or nest
sites. In this way, the dance is a code that conveys the direction, distance, and
desirability of the flower patch, or other resource, discovered. The waggle dance of honeybees can be thought of as a miniaturized reenactment of the flight from the hive to the food or resource. Some honeybee scientists have correlated the distance to the site with the speed of the dance. As the distance of the flight to the food becomes longer, the duration of the waggle portion of the dance becomes longer. However, the detailed nature of distance communication has been difficult to determine, because both the rate of circling and the length of the waggle run correlate with distance information. Moreover, a question arises, and one finding is that it is not distance per se that the bees indicate, but rather the effort needed to arrive at the location. What is really important is that honeybees use the dance's symbolically encoded information to locate resources. Thus, honeybees use both dancing and odors to identify the location of resources, as well as the desirability of a resource. The desirability is expressed in the dance's liveliness and enthusiasm: the richer the source, the livelier the dance, which can last many minutes, even hours. The dances are deployed to meet various colony needs: they are changed to monitor shifting environmental conditions, responsive to communication with hivemates, and switched on the basis of superior information from other dancers. Hence, these features suggest that the dance is a tool used by the bees, rather than a behavioral pattern rigidly emitted. When a honeybee discovers
a rich patch, she returns and seeks out her hivemates in a specific location near the hive entrance called the dance floor. She performs the dance on the vertical comb in the dark hive, surrounded by numerous potential recruits. The dancer pauses for antennal contact with her followers, and to transfer some of the nectar she has harvested to them. The communicative nature of the dance is apparent in that dances are never performed without an audience. While the dance is mostly used to indicate the location of flowers, it is also used for pollen, for water when the hive is overheating, for waxy materials when the comb needs repair, and for new living quarters when part of the colony must relocate. The angle that a bee flies during the flight to the resource, relative to the sun azimuth (the horizontal component of the direction toward the sun), is mirrored in the angle on the comb at which the waggle portion of the dance is performed. If the resource is to be found directly toward the sun, a bee will dance straight upward. If the resource is directly away from the sun, the bee will dance straight downward. If the resource is at 45° to the right of the sun, then the dance is performed with the waggle run at 45° to the right of the vertical, and so forth. Honeybees make a transition from round dances for food near the nest to waggle dances at a greater distance. In fact, the bees perform the round dance as the waggle dance being performed on the same spot, first in one direction and then in the other. The bees trace out a figure-of-eight with its two loops more or less closely superimposed upon one another. In this way, the waggle dance is reduced, in its minimal measure, to a single point.
Fig. 1. The honey bee search process is composed of three main activities: exploration,
recruitment and harvest
In this section we give details about the algorithm that we are proposing to ob-
tain information about the three-dimensional world. Normally, the reconstruc-
tion of the three-dimensional world is achieved using calibrated and uncalibrated
approaches in which several geometric relationships between the scene and the
images are computed from point correspondences. The projection matrix mod-
els the transformation from the scene to the image, and this could be thought of as a direct approach. On the other hand, the transformation from the images to the scene is realized by a process known as triangulation, and this could be imagined as an inverse approach. Obviously, to triangulate a 3D point it is necessary to use two 2D points obtained from two images separated at least by a translation. We would like to state that errors in the calculation could produce misleading results. Therefore it is necessary to apply the best possible algorithm in the calculation of the projection matrix. The problem in this work is posed as a search process in which the 3D points are searched for using the direct approach. In this way, the use of the epipolar geometry computation is avoided. This idea represents a straightforward approach in which a 3D point with coordinates (X, Y, Z) in the Euclidean world is projected into two 2D points with
dinates (X, Y, Z) on the Euclidean world is projected into two 2D points with
coordinates (xl , yl ) for the left camera coordinate system and (xr , yr ) for the
right camera coordinate system. A measure of similarity is computed with the
Zero Normalized Cross-Correlation (ZNCC) and the image gradient to decide if
both image points represent the same 3D point. We apply an evolutionary algorithm similar to evolution strategies (μ + λ) in which mutation and crossover are applied as the main search operators.
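As a rough illustration of the similarity measure mentioned above, the zero-mean normalized cross-correlation between two equally sized image patches can be computed as follows (a generic NumPy sketch; patch extraction and the gradient test are omitted, and the function name is ours):

    import numpy as np

    def zncc(patch_left, patch_right, eps=1e-9):
        """Zero Normalized Cross-Correlation between two equally sized patches.
        Returns a value in [-1, 1]; values near 1 indicate that the two image
        points are likely projections of the same 3D point."""
        a = patch_left.astype(np.float64) - patch_left.mean()
        b = patch_right.astype(np.float64) - patch_right.mean()
        return float((a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + eps))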
In this work, we follow the approach proposed by Boumaza in which the new population is created independently by the addition of three different processes, see Figure 3. This process is used by the exploration and harvest stages in the
honeybee search algorithm, see Figure 2. The exploration stage starts creating
Now, we can proceed to reduce the search space with the following relationship:
f = 0.5 · (1 − u) + 1 · u ,   σ_i = σ_i · f .   (1)
where u represents the degree of desirability that a place holds according to the distance d_i/d_max. The value of f lies in the interval [0.5, 1], where 0.5 is related to the highest distance, while 1 is related to the closest 3D point.
The next stage is to harvest the source patch of each explorer using a similar algorithm with two cycles. The first cycle is dedicated to visiting each place that was selected by the explorer. In this way, the foragers that have been selected by the explorer start a new search process around the point where the explorer is located in order to exploit this location. Hence, the exploration and exploitation steps are achieved by the explorers and foragers respectively. As we can observe, each group of foragers sequentially exploits all places. Note that the number of foragers that have been assigned to each explorer is variable according to the fitness function. It is possible that not all explorers have foragers assigned to harvest their place location. In order to know how many foragers are assigned to each explorer, we calculate the proportion of foragers being assigned to the explorers using the proportional fitness
p_i = fitness_i / Σ_{j=1..N} fitness_j .
Thus, the number of foragers assigned to each explorer is computed using the following factor
r_i = p_i · λ ,   (2)
where λ is the total size of the population. The second cycle is similar to the exploration stage. Here, the fitness function computation uses, besides the ZNCC, the homogeneity of the texture, without gradient computation. The homogeneity is computed using the Gray Level Co-occurrence Matrix because it has been proved reliable in image classification and segmentation for content-based image retrieval [10]. Also, the size of the search space is obviously smaller with respect to the exploration stage, where the whole space is considered. However, the number of bees could be even bigger with respect to the exploration stage because the total number of foragers is much bigger than the total number of explorers. Here, we use 200 explorers and 2000 foragers. Next we explain the main search operators.
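A minimal sketch of the proportional allocation of foragers to explorers is given below; the truncation to an integer and the variable names are our own reading of Equation 2.

    def allocate_foragers(explorer_fitness, num_foragers):
        """Assign foragers to explorers proportionally to explorer fitness (Eq. 2 sketch).
        explorer_fitness : list of fitness values, one per explorer
        num_foragers     : total forager population size (lambda)"""
        total = sum(explorer_fitness) or 1.0
        proportions = [f / total for f in explorer_fitness]          # p_i
        return [int(p * num_foragers) for p in proportions]          # r_i = p_i * lambda, truncated

    # Toy example: four explorers, 2000 foragers in total.
    print(allocate_foragers([0.9, 0.3, 0.0, 0.6], num_foragers=2000))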
two different solutions. Accordingly, the generation of new solutions within the evolutionary algorithm can be stated as follows:
where d(i,j) is the distance between the individuals i and j, and σ_share is the threshold that controls the ratio of sharing. The above function is applied to each individual to obtain a niche count as follows: n_i = Σ_{j=1..N} Sh(d_i,j). Then the shared fitness is calculated with the following expression: fitness_i′ = fitness_i / n_i.
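The sharing function Sh itself is not reproduced above; assuming the common triangular form Sh(d) = 1 − d/σ_share for d < σ_share and 0 otherwise, the niche count and shared fitness could be computed as in this sketch:

    def shared_fitness(fitness, positions, sigma_share, dist):
        """Fitness sharing sketch: divide each fitness by its niche count.
        fitness     : list of raw fitness values
        positions   : list of 3D points (one per individual)
        sigma_share : sharing threshold (e.g. 100 mm in the experiments)
        dist        : distance function between two positions"""
        shared = []
        for i, fi in enumerate(fitness):
            niche = sum(max(0.0, 1.0 - dist(positions[i], p) / sigma_share)
                        for p in positions)                  # n_i = sum_j Sh(d_ij), always >= 1
            shared.append(fi / niche)                        # fitness_i' = fitness_i / n_i
        return shared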
Fig. 4. These figures show the results of applying the honeybee search algorithm. The first two images show the first stereo pair with the projection of the artificial honeybees, while the second row shows the VRML model, to appreciate the spatial coherence. The third and fourth rows show the results for a real person.
the document, see Equations 1 and 2. The sharing uses the parameter σ_share = 100 mm. Note that the objects in both images are placed more or less at the same distance from the stereo rig. The parameters of the evolutionary operators of mutation and crossover are as follows: mutation σ_m = 25 and crossover σ_x = 2. Note that the last two parameters describe how the evolutionary operations are applied, while the rates of mutation and crossover specify how many individuals are generated with those operations.
The advantage of using the honeybee search algorithm is the robustness
against outliers. We can appreciate in the VRML images of Figure 4 that all 3D
points are grouped coherently with the goal of reconstructing compact patches.
This is due to the intelligent process described in this paper in which some artificial honeybees (explorers) guide the search process to obtain an improved sparse reconstruction. The explorers guide the foragers using texture and correlation information during the whole process. Similar to the natural process, the goal is achieved using a communication system that we have adapted to the classical evolutionary algorithm. It is reasonable to think that the honeybee search algorithm could be applied in other contexts.
Acknowledgments
References
1 Introduction
outdoor motion estimation is shadows [4]-[6], which, attached to the moving people, make the system deform the real target. Moreover, shadows might interfere [7] in subsequent stages of the tracking system, so it would be desirable to remove them in a previous phase. The purpose of this study is to improve the performance of this segmentation stage by adding a new block so that the system detects more compact people-shaped blobs and eliminates, if possible, the shadows, so that the total performance of the whole system increases. The parameters that regulate this new block will be searched and tuned by an Evolution Strategy (ES), which has proved to be valid for this type of problem in works like [1]. The main goal of this work is based upon the idea that most morphological image analysis tasks can be reframed as image filtering problems and that ES can be used to discover optimal filters which solve such problems. This paper also introduces the fitness function that assesses the performance of the segmentation block [8, 9]. Finally, we must test the improvement on the complete system, for which we need an evaluation function. Although there is a large literature and previous work on metrics for performance evaluation [1], [10]-[12], this paper shows an original proposal based on a minimal ground truth record which is able to evaluate a large number of video samples to obtain significant statistical results.
This paper attempts to address these points. First, Section 2 presents a study of the segmentation stage in our video surveillance system. Section 3 details the main problems that we face in tracking people and the solutions adopted. Then, the evaluation function to assess the global performance of the system is presented in Section 4. The details of the experiments and the final conclusions are given in Section 6.
2 Segmentation Stage
Fig. 1. A generic video processing framework for automated visual surveillance system
of almost every vision system since it provides a focus of attention and simplifies the processing in subsequent analysis steps. As we have said above, due to dynamic changes in natural scenes such as sudden illumination changes, shadows and weather changes, motion detection is a difficult task to perform reliably. Frequently used techniques for moving object detection are background subtraction, statistical methods, temporal differencing and optical flow. Most of the moving object detection techniques are pixel based [16, 17]. Background subtraction techniques attempt to detect moving regions by subtracting the current image pixel-by-pixel from a reference background image that is created by averaging images over time during an initialization period. The pixels whose difference exceeds a threshold are classified as foreground. Although background subtraction techniques perform well at extracting most of the relevant pixels of moving regions, they are usually sensitive to dynamic changes when, for instance, repetitive motions (tree leaves moving on a windy day, see Figure 2) or sudden illumination changes occur.
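As a rough illustration of the scheme just described, a minimal NumPy sketch of background subtraction with an averaged reference image and a fixed threshold follows; the function names and the threshold value are our own placeholders.

    import numpy as np

    def build_background(init_frames):
        """Reference background: average of the frames of an initialization period."""
        return np.mean(np.stack(init_frames).astype(np.float64), axis=0)

    def foreground_mask(frame, background, threshold=15):
        """Pixels whose absolute difference from the background exceeds the
        threshold are classified as foreground (moving regions)."""
        diff = np.abs(frame.astype(np.float64) - background)
        return diff > threshold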
Fig. 2. Different segmentation results obtained in different conditions. The first row shows the excellent segmentation results on a calm day. However, in the second row, due to the tree leaves on a windy day, we observe brightness changes almost everywhere in the image. Thus, the segmentation stage obtains worse performance.
Fig. 3. Instances of the segmentation stage. Although the results are good enough (third column), notice that in the third row the object detected is rather difficult to track.
as possible, e.g. by removing special types of noise, but rather to make the pedestrian segmentation more visible and easier to process in the classification stage (see Figure 1).
Fig. 4. Zoomed pedestrian views from Figure 3. It is clear that we can improve the segmentation stage: the images contain unconnected and missing areas.
on a comparison between the corresponding pixel in the input image and its
neighbors. By choosing the size and shape of the neighborhood, a morphological
operation can be tuned to be sensitive to specific shapes in the input image. In our case, an erosion operator has been chosen as the first post-processing step in order to remove the noise. Then, we apply a dilation operator to improve the size and shape of the pedestrian.
Now, our problem is concerned with the selection of the size of the suitable structuring element and the number of iterations of the erosion and dilation operations. We define the rectangular size of the structuring elements and the number of iterations of the erosion and dilation processes by the following parameters: HORIZONTAL-SIZE-ERODE, VERTICAL-SIZE-ERODE, HORIZONTAL-SIZE-DILATE, VERTICAL-SIZE-DILATE, ITERATIONS-NUMBER-ERODE and ITERATIONS-NUMBER-DILATE. Besides, we have to establish another parameter involved in the segmentation stage: the THRESHOLD in Equation 1. The choice of the values of these parameters makes a big difference in the performance of the system. Thus, in the next section, we show how to use Evolution Strategies in order to optimize these parameters.
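For illustration only, the erode-then-dilate post-processing with rectangular structuring elements could be written with OpenCV as in the sketch below; the function name and default values are placeholders echoing the parameter names above, not the system's actual code.

    import cv2

    def postprocess(mask,
                    erode_size=(1, 4), erode_iters=2,     # (horizontal, vertical) sizes
                    dilate_size=(1, 4), dilate_iters=2):
        """Erode to remove noise, then dilate to restore the pedestrian's size and shape."""
        erode_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, erode_size)
        dilate_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, dilate_size)
        cleaned = cv2.erode(mask, erode_kernel, iterations=erode_iters)
        return cv2.dilate(cleaned, dilate_kernel, iterations=dilate_iters)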
been working with videos where there is only one pedestrian, and therefore we expect to find a small number of blobs in our ideal segmentation stage.
Let Im and Im′ be the images before and after the morphological post-processing, respectively. We define Im(x, y) and Im′(x, y) as true if and only if the pixel (x, y) belongs to a moving object in the respective image. We define the Density ratio, D(B), of a blob, B, as:
blob, B, as:
1
D(B) = Card{Im(x, y) Im(x, y)}; (x, y) B. (3)
n
where n is the number of pixels in the blob B and Card stands for the cardinality (i.e. number of pixels) of a set. The ∧ (and) operator is applied to assess which part of the processed image contains detected pixels in the original image.
Let AR(B) be the Aspect Ratio of a blob, B. A blob is represented by its bounding rectangle. We define AR(B) as:
AR(B) = width(B) / height(B)   (4)
where width and height stand for the bounding rectangle's width and height of a blob, respectively. Since, in our system, pedestrians are the objects that we have to track, in contrast to shadows or noise, we expect to get a small value of the AR(B) ratio for every blob.
At last, the fitness function that we have to minimize is:
fitness = NB + Σ_{B∈Im′} AR(B) − Σ_{B∈Im′} D(B)   (5)
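For illustration, Equation 5 could be evaluated for a set of detected blobs as in the following sketch; the blob representation (bounding-box size plus pixel counts) and the sign convention, with D(B) subtracted so that denser blobs lower the fitness, reflect our reading of Equations 3-5 rather than the authors' code.

    def blob_fitness(blobs):
        """fitness = NB + sum AR(B) - sum D(B)  (Eq. 5 sketch, to be minimized).
        Each blob is a dict with keys: width, height, n_pixels, n_detected_in_original."""
        nb = len(blobs)                                                       # number of blobs
        ar = sum(b["width"] / b["height"] for b in blobs)                     # aspect ratios (Eq. 4)
        d = sum(b["n_detected_in_original"] / b["n_pixels"] for b in blobs)   # densities (Eq. 3)
        return nb + ar - d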
4 Evaluation System
The main requirement for surveillance systems is the capability of tracking objects of interest in operational conditions, with satisfactory levels of accuracy and robustness. The difficult task is the definition of an automatic, objective and detailed procedure able to capture the global quality of a given system in order to support design decisions based on performance assessment. There are many studies that evaluate video surveillance systems against the ground truth or with synthetic images. Our contribution is a new methodology to compute detailed evaluations based on a minimal amount of hand-made reference data and a large quantity of samples. The result is a robust assessment based on a statistical analysis of a significant number of video sequences. Thus, our work uses the proposed evaluation system to assess the surveillance system and check the increase in total performance.
we want to track. In this case, we have recorded a set of video sequences of people
walking along a footpath. Our set of samples was divided into two groups: (1)
50 video sequences of people moving from right to left along a footpath, and (2)
50 video sequences of people moving from left to right along a footpath.
Thus, the subsequent assessment was separated into two steps and we ob-
tained two sets of results for each one of the video sequences.
The function f(x,y) that approximates the object's trajectory was very simple in both cases. It was a straight line that was considered the ground truth for the system.
Fig. 5. Video shot samples from the two sets of sequences and the function f(x,y) that
approximates the trajectory of each pedestrian
Fig. 6. Segmentation results before (from (a) to (c)) and after (from (d) to (f)) the
morphological post-processing
Transversal Error. It is defined as the distance between the center of the track and the segment which is considered as ground truth at that moment.
Continuity Faults. This metric checks whether a current track existed in the previous moment or not. If the track did not exist, it means that this track was lost by the tracking system and recovered in a subsequent frame. This behavior must be counted as a continuity fault. This continuity metric is a counter where one unit is added each time a continuity fault occurs.
Changes of Direction. This metric marks when a track changes its direction. This metric is also a counter where one unit is added each time a change of direction occurs.
Our general procedure for processing the video sequences was as follows:
1. Take a set of 5 random videos from the two video sequence groups.
2. Use the evolution strategies for adjusting the parameters of the morphological operators added to the segmentation stage. We implemented an ES with a population of 10+10 individuals; this population is the minimum that assures the same result as if we had taken a higher number of individuals. The mutation factor was σ = 0.5 and the initial seed was fixed at 100. (A sketch of such a loop is given after this list.)
3. Repeat the experiment with at least three different seeds.
4. If the results are similar, fix the parameters of the morphological algorithms for use in all videos.
5. Take one video sequence set and the parameters obtained by the evolution strategy. Make the surveillance system work and collect all the people's tracks for each frame of each video sequence.
6. Evaluate these tracks and compare the results (with and without morphological algorithms in the segmentation stage).
7. Repeat the process from step 5 for the second set of video sequences.
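As referenced in step 2 above, a minimal sketch of a (10+10) Evolution Strategy loop over the seven segmentation parameters is given below; the parameter bounds, the integer mutation, and the evaluate() function (assumed to run the segmentation and return the fitness of Equation 5) are illustrative assumptions, not the authors' implementation.

    import random

    BOUNDS = {"h_erode": (1, 10), "v_erode": (1, 10), "h_dilate": (1, 10),
              "v_dilate": (1, 10), "it_erode": (1, 5), "it_dilate": (1, 5),
              "threshold": (5, 60)}   # assumed parameter ranges, not taken from the paper

    def mutate(parent):
        """Integer mutation: add a rounded Gaussian step and clip to the bounds."""
        child = {}
        for name, (lo, hi) in BOUNDS.items():
            step = int(round(random.gauss(0.0, 1.0)))
            child[name] = min(hi, max(lo, parent[name] + step))
        return child

    def es_10_plus_10(evaluate, generations=50, seed=100):
        """(10+10)-ES: 10 parents create 10 mutants; the best 10 of the union survive."""
        random.seed(seed)
        population = [{n: random.randint(lo, hi) for n, (lo, hi) in BOUNDS.items()}
                      for _ in range(10)]
        for _ in range(generations):
            offspring = [mutate(random.choice(population)) for _ in range(10)]
            population = sorted(population + offspring, key=evaluate)[:10]  # minimize Eq. 5
        return population[0]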
Table 1. Optimization results. Notice that the structuring element shape favours tall and thin objects, in accordance with the pedestrians' shape.
HORIZONTAL-SIZE-ERODE 1
VERTICAL-SIZE-ERODE 4
HORIZONTAL-SIZE-DILATE 1
VERTICAL-SIZE-DILATE 4
ITERATIONS-NUMBER-ERODE 2
ITERATIONS-NUMBER-DILATE 2
THRESHOLD 15
Fig. 7. Metrics for people walking from right to left before and after the morphological
process (left and right column respectively)
shadows and noise were removed so that subsequent stages of the surveillance system created more appropriate tracks according to the parts of interest. That is, the results had a real correspondence between the people we were interested in and the resulting tracks of the tracking system. This directly affected the track size, which was smaller as a consequence of the shadow elimination. This effect is displayed in Table 2.
Finally, the effect on the whole surveillance system is shown in Figure 7. In order to have a more detailed idea of the system performance, the area under study is divided into 10 zones. Each zone is defined as a fixed number of pixels of the x-axis, 10% of the horizontal size of the image. The absolute area and the transversal error show the mean, variance and maximum values for each of these two metrics.
All the metrics presented a remarkable improvement on the behavior of the
total surveillance system. The absolute area decreased its mean value from 7033
to 3057.7 (see Table 2 and Figure 7(a) and 7(b)) due to the better adjustment
of the tracks to the pedestrian shape. Second, the transversal error improved
from a mean value of 10.5 to 7.8, which means that the gravity center of the people's tracks is closer to the ground truth function f(x, y). Moreover, the last figures show that the number of losses of the tracks and the changes of direction decreased by a factor of 2.
As a final conclusion, we are able to confirm that the improvement in the segmentation stage provides more compact and accurate blobs to the subsequent blocks of the video surveillance system, so that the performance of the surveillance system does increase.
References
1. Perez, O., García, J., Berlanga, A., and Molina, J.M.: Evolving Parameters of Surveillance Video System for Non-Overfitted Learning. Proc. 7th European Workshop on Evolutionary Computation in Image Analysis and Signal Processing, EvoIASP 2005. Lausanne, Switzerland (2005).
2. García, J., Besada, J. A., Molina, J. M., Portillo, J.: Fuzzy data association for image-based tracking in dense scenarios, IEEE International Conference on Fuzzy Systems. Honolulu, Hawaii (2002)
3. Friedman, N. and Russell, S.: Image segmentation in video sequences: A probabilistic approach, in Proceedings of the Thirteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI 97), Morgan Kaufmann Publishers, Inc., (San Francisco, CA) (1997) 175–181.
4. Rosin, P.L., and Ellis, T.: Image Difference Threshold Strategies and Shadow Detection, in the 6th BMVC 1995 conf. proc., Birmingham, UK, (1995) 347–356.
5. Jiang, C.: Shadow identification, CVGIP: Image Understanding, 59(2) (1994) 213–225.
6. Prati, A., Mikic, I., Trivedi, M.M., Cucchiara, R.: Detecting Moving Shadows: Algorithms and Evaluation, IEEE Trans. PAMI 25(7) (2003) 918–923.
7. Bevilacqua, A.: Effective Shadow Detection in Traffic Monitoring Applications. WSCG 2003, 11(1)
8. Zhang, Y.J.: Evaluation and comparison of different segmentation algorithms, Pattern Recognition Letters 18 (1997) 963–974.
9. Chabrier, S., Emile, B., Laurent, H., Rosenberger, C., March, P.: Unsupervised Evaluation of Image Segmentation Application to Multi-spectral Images. 17th International Conference on Pattern Recognition (ICPR04) 1 (2004) 576–579.
10. Erdem, E., Sankur, B., Tekalp, A.M.: Metrics for performance evaluation of video object segmentation and tracking without ground-truth. ICIP 2 (2001) 69–72.
11. Pokrajac, D. and Latecki, L.J.: Spatiotemporal Blocks-Based Moving Objects Identification and Tracking. IEEE Int. Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (VS-PETS). Nice, France. (2003).
12. Black, J., Ellis, T., and Rosin, P.: A Novel Method for Video Tracking Performance Evaluation, Joint IEEE Int. Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (VS-PETS). Nice, France. (2003).
13. Aggarwal, J.K. and Cai, Q.: Human motion analysis: A review. Computer Vision and Image Understanding, 73(3) (1999) 428–440.
14. Wang, L., Hu, W., and Tan, T.: Recent developments in human motion analysis. Pattern Recognition, 36(3) (2003) 585–601.
15. Haritaoglu, D.I., Harwood, D., and Davis, L.: W4: Real-Time Surveillance of People and Their Activities, IEEE Trans. Pattern Analysis and Machine Intelligence 22(8) (2000) 809–830.
16. Remagnino, P., Jones, G.A., Paragios, N., and Regazzoni, C.S.: Video-Based Surveillance Systems. Kluwer Academic Publishers, 2002.
17. Wren, C., Azarbayejani, A., Darrell, T., and Pentland, A.P.: Pfinder: Real-time Tracking of the Human Body, IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI) 19(7) (1997) 780–785.
18. Stauffer, C., and Grimson, W.: Adaptive background mixture models for real-time tracking. In Proc. of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, (1999) 246–252.
19. Lipton, A.J., Fujiyoshi, H., and Patil, R.S.: Moving target classification and tracking from real-time video. In Proc. of Workshop Applications of Computer Vision, (1998) 129–136.
20. Cohen, I., and Medioni, G.: Detecting and Tracking Moving Objects in Video from an Airborne Observer, Proc. IEEE Image Understanding Workshop, (1998) 217–222.
21. Kim, E.Y., Park, S.H., Hwang, S., and Kim, H.J.: Video Sequence Segmentation Using Genetic Algorithms. Pattern Recognition Letters, 23(7) (2002) 843–863.
22. Hwang, S., Kim, E.Y., Park, S.H., and Kim, H.J.: Object Extraction and Tracking Using Genetic Algorithms, in Proc. IEEE Signal Processing Society ICIP 2 (2001) 383–386.
An Adaptive Stochastic Collision Detection Between
Deformable Objects Using Particle Swarm Optimization
1 Introduction
to dynamic changes and has great capability in solving the problem described above
for stochastic collision detection.
Most collision detection algorithms rely heavily on data structures that can be more or less pre-computed and updated during every simulation step. In our algorithm, we make no assumption about the input model, which can be without topology, have changing topology, or even be a polygon soup. Besides, the algorithm need not store additional data structures such as bounding volume hierarchies, so the memory cost is low. It does not make any assumptions about object motion or temporal coherence between successive frames either. The model primitives may undergo any motion, and vertices or edges can be inserted or deleted.
The remainder of the paper is organized as follows. We survey some related work on collision detection for deformable objects in Sec. 2. Sec. 3 presents the unified PSO algorithm. Sec. 4 gives an overview of our approach, and we give more details of our algorithm for handling deformable models in Sec. 5. In Sec. 6, we give a precision and efficiency evaluation of the algorithm and discuss some of its limitations.
2 Related Work
Numerous collision detection methods have been extensively studied, such as spatial partitioning methods, bounding volume hierarchies, image-space techniques, GPU-based techniques, distance fields, or combinations of them. Usually these methods have been demonstrated to work efficiently in different kinds of environments for rigid body simulations. But when we consider the problem of deforming bodies, they are not that useful, as they rely heavily on pre-computed data and data structures, or they depend on certain body characteristics, for example, bodies that must be decomposed into convex pieces.
detection method for deformable objects has been proposed by Smith [7]. At every
time step, the AABB of all objects is calculated. When two overlapping AABBs are
found, object faces are first pruned against their overlap region. Remaining faces from
all such overlap regions are used to build a world face octree, which is traversed to
find faces located in the same voxels. A data structure called the BucketTree has also
been proposed [8], which is an octree data structure with buckets as leaves where
geometrical primitives can be placed. Another approach is suggested by van den
Bergen [9], which is also used in the collision detection library called SOLID [10].
Initially, AABB trees are built for every model in its own local coordinate system.
The AABBs in the trees are then transformed as the models are moved or rotated in the scene. This transformation causes the model's locally defined AABBs to become OBBs in world space. When a model is deformed, an update of the affected nodes in the trees has to be done. In the literature, there are also some other algorithms for flexible objects. For a more detailed review, please refer to the survey paper [1].
Recently, inexact methods have become a focus in collision detection research. Stochastic methods select random pairs of colliding features as an initial guess of the potential intersecting regions, in this way trading off accuracy for computation time. When the object moves or deforms, the method exploits temporal coherence in the sense that if a pair of features is close enough at one time step, it may still be interesting in the next one [1]. [11] uses probabilistic principles to estimate the
PSO is a new optimization technique originating from artificial life and evolutionary computation. Different from most commonly used population-based evolutionary algorithms, such as genetic algorithms, evolutionary programming, evolution strategies and genetic programming, the PSO algorithm is motivated by the simulation of social behavior. The technique involves simulating social behavior among individuals (particles) flying through a multidimensional search space and evaluating particles' positions relative to a goal (fitness) at every iteration. Particles in a local neighborhood share memories of their best positions, and use these memories to adjust their own velocities, and thus subsequent positions [4].
The original formulae developed by Kennedy and Eberhart [3] were improved by Shi and Eberhart [4]. Particle i is represented as Xi = (xi1, xi2, ..., xiD) in D dimensions. Each particle maintains a memory of its previous best position Pi = (pi1, pi2, ..., piD) and a velocity along each dimension Vi = (vi1, vi2, ..., viD). In each iteration step, the P vector of the particle with the best fitness in the local neighborhood, designated g, and the P vector of the current particle are combined to adjust the velocity along each dimension, and that velocity is then used to compute a new position for the particle. An inertia parameter ω is used to increase the overall performance of PSO [5]. These formulae are:
v_id = ω · v_id + φ_1 · rand1() · (p_id − x_id) + φ_2 · rand2() · (p_gd − x_id)   (1)
x_id = x_id + v_id   (2)
where the constants φ_1 and φ_2 determine the relative influence of the social and cognitive components, and are usually both set to the same value to give each component equal weight as the cognitive and social learning rates [4]. rand1() and rand2() are two random functions in the range [0, 1].
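The velocity and position updates (1) and (2) can be written down directly; the following sketch uses illustrative values ω = 0.73 and φ1 = φ2 = 1.5, which are common choices but not taken from this paper.

    import random

    def pso_step(x, v, pbest, gbest, omega=0.73, phi1=1.5, phi2=1.5):
        """One PSO update for a single particle in D dimensions (Eqs. 1 and 2)."""
        new_x, new_v = [], []
        for d in range(len(x)):
            vd = (omega * v[d]
                  + phi1 * random.random() * (pbest[d] - x[d])
                  + phi2 * random.random() * (gbest[d] - x[d]))
            new_v.append(vd)
            new_x.append(x[d] + vd)
        return new_x, new_v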
The PSO algorithm has the ability to find optimal or near-optimal solutions in large search spaces efficiently, which is very suitable for stochastic collision detection. Furthermore, the PSO algorithm also has the advantage that it is easy
for it to overcome the local optimization problem and to handle constrained conditions. PSO can also be implemented with ease, and few parameters need to be tuned. In the next section, we will introduce our notation and give an overview of our approach.
4 Algorithm Overview
where a and b represent two vertices, and each vertex has an (x, y, z) coordinate in three-dimensional object space.
In order to reduce computation, we use the square of the particle's fitness. With the above design, collision detection is converted into searching through a discrete binary space to find those particles (optimal solutions) whose fitness is below a given proximity threshold.
after iterations. But if the optimal solution is a set, such as collisions between
sawtoothed models, or a collision cluster, how can we find them as many as possible?
Here, wed better look back to the settings of search space. At beginning we set it
without taking primitives topology constraints into account, so continuous primitives
in object space may distribute discretely in search space. But that makes our method
more robust for various types of models. Here, we make use of one outside collided
pairs set to record particles those falling below a given proximity threshold in each
iteration. With the iteration number increasing, more places will be searched and
more optima will be found.
Iterations stop when the maximum iteration number NI is attained. The maximum number of iterations within a time step is determined by Formula (5), which allows users to balance precision (i.e. completeness) and reaction speed (i.e. a small time step).
NI = T / (N_P · C_P)   (5)
T: the time step; N_P: the population size of the swarm; C_P: the fitness computation cost of one particle.
In this section, we give more details of our algorithm for handling collision detection between deformable models.
During the simulation, models might deform in every time step. Deformations of models are in most cases of two types: arbitrary vertex repositioning or splitting. For the former, a model might undergo a complete change of shape from one time step to the next by moving the relative positions of all of its vertices. But during such deformations, the mesh connectivity stays the same, i.e. the mesh is not torn up in any way. For the latter, a model's geometric primitives might change, such as increasing or decreasing in number, or splitting the body into new separated pieces [13]. Our method efficiently handles these two kinds of deformation.
For our algorithm, the deformation of models means a variation of the search space between time steps. This variation can be both or either of the following: a position change of the goal or a size change of the search space.
In the view of the whole simulation, collision detection may be regarded as a consecutive PSO iteration process from the first time step to the last one. Different from the normal PSO, however, the search space may undergo a change due to model deformation after a fixed number of iterations (a fixed iteration number is assigned to each time step). This is a dynamic environment problem.
PSO has been proven to be very effective for dynamic problems. In previous work, resetting the particles' memory has been used to respond to the changes [14]. Carlisle and Dozier investigated using PSO to track a single peak varying spatially [5]. Hu and Eberhart [15] suggested monitoring the environment for a change and updating the gbest and pbest values of particles when a change is detected.
As we all know, the PSO algorithm has a mechanism to adapt itself to environment changes. The swarm searches for optima in the solution space and typically shrinks to a small region; if the changes occur in this area, PSO will probably find the new optimum automatically without any modification [15]. If the changes are wide, we must adopt some strategies to respond to the changes effectively. So at the beginning of each time step (changes might have taken place), a particle update process is added, in which the fitness of all the particles is evaluated and a portion of the best of them is reserved. Their pbest values are replaced by their current X vectors. Thus the benefit of their previous experience is retained in their current location, and they are forced to redefine their relationship to the goal. The other particles are randomly allocated to new positions, which ensures that more different places in the search space will be searched; in the case that the whole population has already converged to a small area, it might otherwise not be easy to jump out and follow the changes. In Section 6, we will test different reserved portions and compare the results.
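A minimal sketch of the particle update process just described is given below; the particle representation, the use of a continuous search space, and the reserved fraction of 0.5 are illustrative assumptions (the paper itself varies the reserved portion in its experiments).

    import random

    def reset_swarm(particles, fitness_fn, bounds, reserve_fraction=0.5):
        """Re-evaluate all particles after a possible deformation; keep the best
        fraction (their pbest is reset to the current position) and re-randomize the rest.
        Each particle is a dict with keys 'x', 'v', 'pbest', 'pbest_fit'."""
        ranked = sorted(particles, key=lambda p: fitness_fn(p["x"]))
        n_keep = int(len(ranked) * reserve_fraction)
        for p in ranked[:n_keep]:                     # reserved particles
            p["pbest"] = list(p["x"])                 # forget stale memory, keep location
            p["pbest_fit"] = fitness_fn(p["x"])
        for p in ranked[n_keep:]:                     # the others restart elsewhere
            p["x"] = [random.uniform(lo, hi) for lo, hi in bounds]
            p["pbest"] = list(p["x"])
            p["pbest_fit"] = fitness_fn(p["x"])
        return ranked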
results depend on the shapes of the models, their relative orientation and the deformations applied. All the tests were done on a Pentium IV with a 2.4 GHz CPU and 1 GB of memory.
For the initial experiment, we elected to explore the effects of taking different samples in a static scene of two collided hands (shown in Fig. 1). We sampled five groups of primitives, which constituted five search spaces for PSO. It is evident that with more sample pairs, the precision is better, but the search cost is higher. For example, if we want a 20% detection ratio, it is better to take group 2 or 3 than group 4 or 5. If we want a 30% detection ratio, group 3 is the best. If we want a 60% detection ratio, group 4 is the best. If we want more than a 90% detection ratio, group 5 is the only choice (see Table 1). So with different precision demands, we can take different sampling strategies to get the best result.
We varied the population from 5 to 100 in a time step and compared the success rates (shown in Fig. 2). We would expect that, in general, more particles would search more space, and solutions would then be found sooner; we found this to be generally true in terms of performance, but not in terms of cost. Because of the time-critical setting, population size and iteration number restrict each other, and when the population increases, each iteration step represents a greater cost, as more particles call upon the evaluation function. A population size from 20 to 30 appears to be a good choice. It is small enough to be efficient, yet large enough to produce reliable results.
In the second experiment, two continuously deforming snakes were moved slowly towards each other during 50 simulation time steps (shown in Fig. 3). We test our
Table 1. Comparison of the detection ratios of five sample groups for two collided hands
Fig. 1. Scene 1: two collided hand models, each of 26373 primitives; the collided pairs are white
Fig. 2. Detection ratio with varied population size for two collided hands
Fig. 3. Scene 2: simulation of two deforming snakes (each of 22236 primitives) during 50 time steps; the collided pairs are white
and the bounding volumes and spheres become rather large, and the update process costs a lot of time. Our algorithm obtains a speedup due to the small number of elementary tests and the low update cost. Therefore, as the results in Fig. 6 show, we achieve up to 2-3 times overall speedup. Our algorithm has a limitation in that it cannot find all the collided pairs in most cases. But Kimmerle showed in [12] that a detection ratio below 100% is sufficient to ensure a stable simulation. Even for a very challenging cloth simulation, a response ratio as low as 20% can be enough to get good visual results. So if the required precision of collision detection is not too high, our algorithm leads to a more robust collision detection system.
Fig. 4. Comparison of three kinds of swarm update strategies when the deformation is not drastic
Fig. 5. Comparison of three kinds of swarm update strategies when the deformation is drastic
7 Conclusion
This paper has proposed a new method for stochastic collision detection between deformable objects using the PSO algorithm. Our algorithm samples primitive pairs within the models to construct a discrete binary search space for PSO, by which the user can balance performance and detection quality. In order to handle the deformation of models in object space, a particle update process is added at the beginning of every time step, which handles the dynamic environment problem in the search space caused by deformation. This approach provides a more comprehensive way to trade off accuracy for computation time. Moreover, the swarm can also handle temporal coherence and efficiently search through the very large primitive-pair search space. Finally, we give a precision and efficiency evaluation of the algorithm and find that it is a reasonable choice for stochastic collision detection between deformable models.
Acknowledgments
This work was supported by NSFC (No. 60573182). We would like to thank Yan Zhang for discussion on the validation proof and the paper format, and other members of the Image & Graphics group at JLU for providing much useful feedback on our algorithm.
References
1. Teschner, M., Kimmerle, S., Heidelberger, B.: Collision detection for deformable objects. Eurographics, State of the Art Report (2005).
2. Uno, S., Slater, M.: The sensitivity of presence to collision response. In Proc. of IEEE Virtual Reality Annual International Symposium (VRAIS) (1997).
1 Introduction
Stress is a form of prominence in spoken language. Usually, stress is seen as
a property of a syllable or of the vowel nucleus of the syllable. There are two
types of stress in English. Lexical stress refers to the relative prominences of
syllables in individual words. Rhythmic stress refers to the relative prominences
of syllables in longer stretches of speech than an isolated word. When words are
used in utterances, their lexical stress may be altered to reflect the rhythmic (as
well as semantic) structure of the utterance.
As English becomes more and more important as a communication tool for people from all countries, there is an ever-increasing demand for good quality teaching of English as a Second Language (ESL). Learning English well requires lots of practice and a great deal of individualised feedback to identify and correct errors. Providing this individualised feedback from ESL teachers is very expensive; therefore computer software that could help ESL learners to speak like a native speaker is highly desirable. Properly placing rhythmic stress is one of the important steps in teaching ESL students good speech production. Thus being able to automatically detect the rhythmic stress patterns in students' speech becomes a really important functionality in this kind of computer software.
There are a number of prosodic (sometimes referred to as suprasegmental)
features that relate to stress. Thus the perception of a syllable as stressed or
unstressed may depend on its relative duration, its amplitude and its pitch.
Duration is simply how long the syllable lasts. Amplitude relates to the perceived
loudness of the syllable, and is a measure of its energy. Pitch is the perceptual
correlate of the fundamental frequency (F0 ) of the sound signal, i.e. the rate of
vibration of the vocal folds during voiced segments.
A further correlate of stress is the quality of the vowel in a syllable. Vowels
are split into full vowels and reduced vowels in terms of quality, based on the configuration of the tongue, jaw, and lips [1]. Full vowels tend to be more peripheral, and appear in both stressed syllables and unstressed syllables [2]. Reduced
vowels, including /@/ and /I/ in New Zealand English, tend to be more central,
and are only associated with unstressed syllables. Therefore, vowel quality is not
a completely reliable indicator of stress [2].
In order to automatically detect rhythmic stress, prosodic features and vowel quality features, as the two main sets of features, have been studied by many researchers using machine learning algorithms.
Waibel [3] used duration, amplitude, pitch, and spectral change to identify rhythmically stressed syllables. A Bayesian classifier, assuming multivariate Gaussian distributions, was adopted and 85.6% accuracy was achieved. Jenkin and Scordilis [4] used duration, energy, amplitude, and pitch to classify vowels into three levels of stress: primary, secondary, and unstressed. Neural networks, Markov chains, and rule-based approaches were adopted. The best overall performance was 84%, using neural networks. Rule-based systems performed worse, with 75%. Van Kuijk and Boves [5] used duration, energy, and spectral tilt to identify rhythmically stressed vowels in Dutch, a language with similar stress patterns to those of English. A simple Bayesian classifier was adopted, on the grounds that the features can be jointly modelled by an N-dimensional normal distribution. The best overall performance achieved was 68%. Our previous work [6] used duration, amplitude, pitch and vowel quality to identify rhythmically stressed vowels. Decision trees and support vector machines were applied and the best accuracy, 85%, was achieved by support vector machines.
However, the accuracies of automatic stress detection reported in the literature are not high enough to be useful for a commercial system. Automatic rhythmic stress detection remains a challenge for speech recognition.
Genetic programming (GP) has grown very rapidly and has been studied widely in many areas since the early 1990s. Conrads et al. [7] demonstrated that GP could find programs that were able to discriminate certain spoken vowels and consonants without pre-processing speech signals. However, there are only a few studies using GP in the automatic speech recognition and analysis area. Most current research on automatic rhythmic stress detection uses machine learning algorithms other than GP.
1.1 Goals
P1 : the mean pitch value of the vowel normalised by the mean pitch of the
utterance.
P2 : the pitch value at the start point of the vowel normalised by the mean
pitch of the utterance.
P3 : the pitch value at the end point of the vowel normalised by the mean
pitch of the utterance.
P4 : the maximum pitch value of the vowel normalised by the mean pitch of
the utterance.
P5 : the minimum pitch value of the vowel normalised by the mean pitch of
the utterance.
464 H. Xie, M. Zhang, and P. Andreae
P6 : the difference between the normalised maximum and minimum pitch values; a negative value indicates a falling pitch and a positive value indicates a rising pitch.
P7 : the magnitude of P6 , which is always positive.
P8 : the sign of P6 : 1 if the pitch rises over the vowel segment, -1 if it falls, and 0 if it is flat.
P9 : a boolean attribute: 1 if the pitch value at either the start point or the end point of the vowel segment cannot be detected, otherwise -1.
P10 : a boolean attribute: 1 if the vowel segment is too short to compute meaningful mean, minimum, or maximum values, otherwise -1.
A sketch of how these normalised pitch features can be computed is given below.
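The following is a small sketch, under stated assumptions, of how the normalised pitch features P1-P8 could be computed from the F0 values of one vowel segment; the sign convention used for P6 is our own assumption, not necessarily the authors' definition.

    import numpy as np

    def pitch_features(vowel_f0, utterance_mean_f0):
        """Compute P1-P8 style features from the F0 values of one vowel segment,
        normalised by the utterance mean pitch."""
        f0 = np.asarray(vowel_f0, dtype=np.float64) / utterance_mean_f0
        p1, p2, p3 = f0.mean(), f0[0], f0[-1]
        p4, p5 = f0.max(), f0.min()
        # Sign for P6 is an assumption: rising if the maximum occurs after the
        # minimum within the segment, falling otherwise.
        sign = 1.0 if np.argmax(f0) >= np.argmin(f0) else -1.0
        p6 = sign * (p4 - p5)
        p7 = abs(p6)
        p8 = 0.0 if p6 == 0 else float(np.sign(p6))
        return [float(p1), float(p2), float(p3), float(p4), float(p5), p6, p7, p8]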
[Figure: overview of vowel quality feature extraction. The vowel segment (e.g. /e/) is extracted from the utterance, encoded as a parameter vector, and passed to the recognisers.]
F_d = log(S_f − S_e) if S_e < S_f ;  0 if S_e = S_f ;  −log(S_e − S_f) if S_e > S_f .   (2)
Step 6. We also compute a boolean vowel quality feature, T, to deal with cases where the vowel segment is so short that F or R cannot be calculated. If the vowel segment is less than 33 ms, which is the minimum segment duration requirement of the HMM recognisers, then the value of this attribute will be 1. Otherwise, -1. If this value is 1, we set F and R to 0.
2.3 Functions
The function set contains not only the four standard arithmetic functions, but
also several other arithmetic and trigonometric functions and conditional func-
tions, as shown in equation 5.
Each of the first four mathematical operators takes a single argument. The abs function returns the absolute value of the argument. The protected sqrt function returns the square root of the argument. The cos and sin functions return the cosine and sine of the argument respectively. Each of the +, -, *, and / operators takes two arguments. They have their usual meanings except that the / operator is protected division, which returns 0 if its second argument is 0. The three conditional functions each take two arguments. The iflt function returns 1 if the first argument is less than the second one, otherwise 0. The ifpr function returns the second argument if the first argument is positive, otherwise does nothing. The ifnr function returns the second argument if the first argument is negative, otherwise does nothing. Note that there is a redundancy in that the conditional functions could be expressed in terms of each other. There is a trade-off between the increased breadth of the search space resulting from having redundant functions, and the more complex programs (hence a deeper search of the search space) resulting from a minimal set of non-redundant functions. We believe that the smaller programs made possible by the expanded function set more than compensate for the broader search space.
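As an illustration only (the authors used their own GP system, not this code), the protected and conditional functions described above could be realised as ordinary Python functions; the function names follow the text, while the return value chosen for "does nothing" and the behaviour of protected sqrt on negative inputs are assumptions.

```python
import math

# Unary operators
def p_abs(a):  return abs(a)
def p_sqrt(a): return math.sqrt(abs(a))          # protected sqrt; handling of negatives is assumed
def p_cos(a):  return math.cos(a)
def p_sin(a):  return math.sin(a)

# Binary arithmetic operators
def p_add(a, b): return a + b
def p_sub(a, b): return a - b
def p_mul(a, b): return a * b
def p_div(a, b): return 0.0 if b == 0 else a / b  # protected division: 0 when the divisor is 0

# Conditional functions (two arguments each, as in the text)
def iflt(a, b): return 1.0 if a < b else 0.0      # 1 if first < second, else 0
def ifpr(a, b): return b if a > 0 else 0.0        # second argument if first is positive; 0 stands in for "does nothing"
def ifnr(a, b): return b if a < 0 else 0.0        # second argument if first is negative; 0 stands in for "does nothing"
```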
Error rate is used as the fitness function to evaluate programs. The classification error rate of a program is the fraction of fitness cases in the training data set that are incorrectly classified by the program. Rhythmically stressed vowel segments and unstressed vowel segments are treated as equally important, so that neither class is weighted over the other. In our data set, the class stressed is represented by 1 and the class unstressed by -1. If a program's output is greater than or equal to 0, then the output is counted as a class stressed output; otherwise, it is counted as a class unstressed output.
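A minimal sketch of this error-rate fitness, assuming a program is simply a callable mapping a feature vector to a real number (the representation inside the GP system is not specified here):

```python
def error_rate_fitness(program, fitness_cases):
    """program       : callable mapping a feature vector to a real number
       fitness_cases : list of (features, label) pairs, label in {+1, -1}
       returns       : fraction of cases classified incorrectly (lower is better)"""
    errors = 0
    for features, label in fitness_cases:
        predicted = 1 if program(features) >= 0.0 else -1   # output >= 0 counts as "stressed"
        if predicted != label:
            errors += 1
    return errors / len(fitness_cases)
```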
In this GP system the learning process uses the tournament selection mechanism with size four and the crossover, mutation and reproduction operators. It is worth noting that in this GP system the crossover and mutation operators are independent, in that the mutation operator can be applied regardless of whether a tournament winner has also been selected for crossover, so the sum of the crossover rate and the mutation rate may exceed 100%. The parameter values used in this study are shown in Table 1. These values were obtained through prior empirical research. The unusually high mutation rates were found to be the most helpful for this problem.
The learning/evolutionary process is terminated when either of the following criteria is met:
The classification problem has been solved on the training data set, that is, all vowel segments in the training set have been correctly classified, with the fitness of the best program being zero.
3 Experiment Design
The system uses a data set collected by the School of Linguistics and Applied Language Studies at Victoria University of Wellington. The data set contains 60 utterances of ten distinct English sentences produced by six female adult NZ speakers, as part of the NZ Spoken English Database (www.vuw.ac.nz/lals/nzsed). The utterances were hand labelled at the phoneme level, including the time stamps of the start and end of each phoneme segment and the phoneme label. Further, each vowel was labelled as rhythmically stressed or unstressed. There were 703 vowels in the utterances, of which 340 are marked as stressed and 363 as unstressed. Prosodic features and vowel quality features of each vowel segment are calculated from the hand labelled utterances.
Three experiments were conducted, one on each of the three terminal sets. For each terminal set, since the data set was relatively small, 10-fold cross validation was used for training and testing the automatic rhythmic stress detectors. In addition, the training and testing process was repeated ten times, so that 100 runs of the training and testing procedure were made in total for each terminal set. The average classification accuracy of the best program in each experiment is calculated from the outputs of the 100 runs.
In addition, we investigate whether scaling the feature values in the three terminal sets to the range [-1, 1] results in better performance.
We also compare our GP approach with the C4.5 [10] decision tree (DT) system and an SVM system (LIBSVM [11]) on the same set of data. The SVM system uses an RBF kernel and a C parameter of 1.0.
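For readers wanting a comparable baseline, a roughly equivalent setup can be assembled with scikit-learn as sketched below. Note the assumptions: DecisionTreeClassifier is a CART-style learner rather than C4.5, the scaling range and the 10-fold protocol are taken from the text, and the rest of the configuration is illustrative.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

def baseline_accuracies(X, y):
    """X: feature matrix (one row per vowel segment), y: +1/-1 stress labels."""
    models = [
        ("DT", DecisionTreeClassifier()),                               # CART, stands in for C4.5
        ("SVM", SVC(kernel="rbf", C=1.0)),                              # RBF kernel, C = 1.0 as in the text
        ("SVM scaled", make_pipeline(MinMaxScaler((-1, 1)),             # features scaled to [-1, 1]
                                     SVC(kernel="rbf", C=1.0))),
    ]
    for name, clf in models:
        scores = cross_val_score(clf, X, y, cv=10)                      # 10-fold cross validation
        print(f"{name}: {scores.mean():.3f}")
```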
The accuracy of GP was 11.5% and 12.2% higher than that of DT and SVM respectively on unscaled data, and 11.0% and 8.4% higher on scaled data. There is little evidence of any impact of using scaled data on GP and DT. However, there is an improvement of 3.5% when using scaled data for SVM.
Table 2 (Terminal Set I):
           GP    DT    SVM
Unscaled  91.9  80.4  79.7
Scaled    91.6  80.6  83.2
Terminal Set II. Table 3 shows the experimental results for Terminal Set II. GP again achieved the best accuracy, 85.4%. The accuracy of GP was 5.7% and 6.3% higher than that of DT and SVM respectively on unscaled data, and 5.7% and 4.1% higher on scaled data. There is also little evidence of any impact from scaling the data.
Terminal Set III. Table 4 shows the results for Terminal Set III, which combines all the features used in Terminal Set I and Terminal Set II. Again, the best accuracy, 92.6%, was achieved by GP. GP outperformed DT and SVM by 12.1% and 10.8% respectively on unscaled data, and by 12.6% and 10.6% on scaled data. For all three systems, accuracies on scaled data were invariably higher than those on unscaled data, but the differences were very small.
Comparing the results of all three terminal sets, we made the following observations.
On all terminal sets, regardless of whether the data are scaled or unscaled, the accuracy of GP is consistently and significantly higher than that of DT and SVM. This indicates that GP is more effective than DT and SVM on the automatic stress detection problem on our data set.
For all systems, Terminal Set I consistently returns higher accuracies than Terminal Set II. This indicates either that prosodic features are more accurate than vowel quality features, or that vowel quality feature extraction needs to be further improved.
Maximising the coverage of features (using Terminal Set III) resulted in some improvement for GP, but not for DT and SVM. Since Terminal Set III has the most complete set of features, it is likely that not all of them are necessary for detecting stress. Therefore the difference in performance between Terminal Set I and Terminal Set III can be used as an indication of how robust a system is at handling unnecessary and irrelevant features. The best accuracy scores of both DT and SVM dropped on Terminal Set III, while GP's did not; GP is therefore the most robust of the three algorithms at handling unnecessary and irrelevant features on our data set.

Table 3 (Terminal Set II):
           GP    DT    SVM
Unscaled  85.4  79.7  79.1
Scaled    84.6  78.9  80.5

Table 4 (Terminal Set III):
           GP    DT    SVM
Unscaled  92.0  79.9  81.3
Scaled    92.6  80.1  82.0
The top thirty programs in each run were analysed and the average impact of each terminal input on the programs was computed as a percentage, as shown in Tables 5 and 6. The impact of a terminal input refers to the change in the performance of a program when all occurrences of that terminal input are removed from the program.
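A sketch of this impact measure, under the assumption that a program is an expression tree and that "removing" a terminal means replacing each of its occurrences with a neutral constant; both the tree representation and the choice of replacement value are assumptions, not the authors' procedure.

```python
def terminal_impact(program, terminal, evaluate, replacement=0.0):
    """Change in accuracy when every occurrence of `terminal` is removed
    (here: replaced by a neutral constant) from the program tree.

    program  : nested lists, e.g. ['+', 'D5', ['*', 'A1', 0.5]]
    terminal : terminal name, e.g. 'D5'
    evaluate : callable returning the accuracy of a program on the data set
    """
    def strip(node):
        if node == terminal:
            return replacement                      # replace the terminal leaf
        if isinstance(node, list):
            return [strip(child) for child in node] # recurse into sub-expressions
        return node

    return evaluate(program) - evaluate(strip(program))
```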
Table 5 shows the impact of the prosodic features. The patterns of impact are similar on unscaled and scaled data. Three broad bands of impact emerged: high (above 5%, including all duration features), medium (1% to 5%, including the amplitude features), and low (under 1%, including all pitch features), corresponding exactly to the three feature categories: duration, amplitude and pitch. This indicates that duration has a bigger impact than amplitude, while pitch has the smallest impact. On both the unscaled and the scaled data sets, D5 and D3 are ranked first and second, indicating that normalising a duration feature by the average duration of a vowel category is better than normalising by the average duration of a vowel type.

Table 5. Average impact (%) of the prosodic feature terminals:
  Unscaled              Scaled
  Input  Impact(%)      Input  Impact(%)
  D5     27.7           D5     28.2
  D3     27.4           D3     16.5
  D2     15.1           D4     13.4
  D4     13.7           D1     12.9
  D1      8.3           D2      7.8
  A1      2.3           A2      1.4
  A2      1.1           A1      1.4
  P2      1.1           P5      0.6
  P1      0.7           P3      0.6
  P3      0.7           P2      0.6
  P8      0.6           P4      0.5
  P4      0.4           P1      0.4
  P5      0.3           P8      0.4
  P10     0.3           P7      0.2
  P7      0.3           P10     0.2
  P6      0.1           P9      0.2
  P9      0.1           P6      0.1

Table 6. Average impact (%) of the vowel quality feature terminals:
  Unscaled              Scaled
  Input  Impact(%)      Input  Impact(%)
  Rr     31.9           Rd     21.0
  Rd     20.7           Rr     19.9
  Fr      2.8           T       4.5
  T       1.5           Fd      0.76
  Fd      0.34          Fr      0.38
The ranking of duration, amplitude and pitch in terms of impact in this study matches the result in [6]. However, only one experiment was conducted in this study, whereas seven experiments with various combinations of the feature sets were conducted in [6], where DT and SVM were used. This suggests that: 1) GP has a stronger feature selection ability than DT and SVM on this problem; 2) GP can automatically handle a large number of features; and 3) GP can automatically select the features that are important to a particular domain.
As shown in Table 6, Rd and Rr have a much larger impact than Fd, Fr, and T on both unscaled and scaled data. On unscaled data Rr's impact (31.9%) is larger than Rd's impact (20.7%), whereas on scaled data the two features display a similar impact. The results suggest that the reduced vowel quality features are far more useful than the full vowel quality features, regardless of whether differences or ratios are used.
most important features. If vowel quality is used, reduced vowel quality features are more useful than full vowel quality features.
In future work, we will further analyse the GP programs to understand the specific relationships between the feature terminals and the perceived stressed and unstressed vowels, in order to determine whether the generated GP programs can be applied, with or without adaptation, to other kinds of data sets. We are also planning to investigate the possibility of having GP automatically perform higher level normalisations of the prosodic features and calculate vowel quality features directly from acoustic likelihoods, in order to remove the limitation of the manual pre-processing of the features.
Acknowledgment
The data set used in the paper was provided by Dr Paul Warren of the School of
Linguistics and Applied Language Studies at Victoria University of Wellington.
References
1. Ladefoged, P.: Three Areas of Experimental Phonetics. Oxford University Press, London (1967)
2. Ladefoged, P.: A Course in Phonetics. Third edn. Harcourt Brace Jovanovich, New York (1993)
3. Waibel, A.: Recognition of lexical stress in a continuous speech system - a pattern recognition approach. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Tokyo, Japan (1986) 2287-2290
4. Jenkin, K.L., Scordilis, M.S.: Development and comparison of three syllable stress classifiers. In: Proceedings of the International Conference on Spoken Language Processing, Philadelphia, USA (1996) 733-736
5. van Kuijk, D., Boves, L.: Acoustic characteristics of lexical stress in continuous speech. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Volume 3., Munich, Germany (1999) 1655-1658
6. Xie, H., Andreae, P., Zhang, M., Warren, P.: Detecting stress in spoken English using decision trees and support vector machines. Australian Computer Science Communications (Data Mining, CRPIT 32) 26 (2004) 145-150
7. Conrads, M., Nordin, P., Banzhaf, W.: Speech sound discrimination with genetic programming. In: Proceedings of the First European Workshop on Genetic Programming. (1998) 113-129
8. Francone, F.D.: Discipulus owner's manual (2004)
9. Xie, H., Andreae, P., Zhang, M., Warren, P.: Learning models for English speech recognition. Australian Computer Science Communications (Computer Science, CRPIT 26) 26 (2004) 323-330
10. Quinlan, J.: C4.5: Programs for machine learning. Morgan Kaufmann (1993)
11. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf (2003)
12. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge (1992)
13. Dy, J.G., Brodley, C.E.: Feature selection for unsupervised learning. Journal of Machine Learning Research 5 (2004) 845-889
Localisation Fitness in GP for Object Detection
Abstract. This paper describes two new fitness functions in genetic programming for object detection, particularly object localisation problems. Both fitness functions use a weighted F-measure of a genetic program and consider the localisation fitness values of the detected object locations, which are the relative weights of these locations to the target object centres. The first fitness function calculates the weighted localisation fitness of each detected object, then uses the localisation fitness values of all the detected objects to construct the final fitness of a genetic program. The second fitness function calculates the average locations of all the detected object centres and then calculates the weighted localisation fitness value of the averaged position. The two fitness functions are examined and compared with an existing fitness function on three object detection problems of increasing difficulty. The results suggest that almost all the objects of interest in the large images can be successfully detected by all three fitness functions, but the two new fitness functions result in far fewer false alarms and require much less training time.
1 Introduction
As more and more images are captured in electronic form, the need for programs which can detect objects of interest in a database of images is increasing. For example, it may be necessary to detect all tumors in a database of x-ray images, find all cyclones in a database of satellite images, detect a particular face in a database of photographs, or detect all tanks, trucks or helicopters in a set of images [1, 2, 3, 4]. This field is typically called object detection.
Object detection is an important field of research in image analysis and processing for many modern tasks. It is the process of finding the locations of the objects of interest within images, known as object localisation, and determining the types or classes of the objects found, known as object classification. Object localisation is sometimes referred to as object detection, as it also involves distinguishing all the objects of interest from the background. It is noted that object detection is a difficult task, particularly when the objects of interest are irregular and/or the background is highly cluttered.
Genetic programming (GP) is a relatively recent and fast developing approach to automatic programming [5, 6, 7]. In GP, solutions to a problem can be represented in different forms but are usually interpreted as computer programs. Darwinian principles of natural selection and recombination are used to evolve
2 Object Detection
The process for object detection is shown in Figure 1. A raw image is taken and a trained localiser is applied to it, producing a set of points found to be the positions of the objects. A single object could have multiple positions (localisations) found for it; ideally, however, there would be exactly one localisation per object. Regions of the image are then cut out at each of the positions specified. Each of these cutouts is then classified using the trained classifier.
This method treats all objects of multiple classes as a single object of interest class for the purpose of localisation, and the classification stage handles attaching the correct class labels. Compared with the single-stage approach [10, 11],
this has the advantage that training is easier for both stages, as each of the two stages is trained towards a specific goal. The first is tailored to achieving results as close to the object centres as possible (high positional accuracy), while the second is tailored to making all classifications correct (high classification accuracy).
The accuracy of the positions found by the localiser is important, as incorrect results from this first stage will need to be handled by the second stage. Any false alarms from the localisation stage could cause problems with the classification stage, unless it is able to classify these as background. If the positions found are not close enough to the object centres then the classification stage will likely not handle them well.
The object localisation stage is performed by means of a window which sweeps over the whole image, and for each position extracts the features and passes them to the trained localiser. The localiser then determines whether each position is an object or not (i.e. background).
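A minimal sketch of this sweeping-window step is given below; the window size, step size, feature extraction, and the convention that a positive localiser output marks an object are all illustrative assumptions rather than the paper's exact settings.

```python
def localise(image, localiser, extract_features, win=24, step=2):
    """Slide a win x win window over a 2-D grey-level image and keep the positions
    that the trained localiser labels as 'object' (output > 0, by assumption)."""
    positions = []
    height, width = image.shape
    for y in range(0, height - win + 1, step):
        for x in range(0, width - win + 1, step):
            window = image[y:y + win, x:x + win]
            features = extract_features(window)        # e.g. mean/std of the window regions
            if localiser(features) > 0:
                positions.append((x + win // 2, y + win // 2))   # report the window centre
    return positions
```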
[Figure: overview of the GP approach; terminals (features) feed the GP learning/evolutionary process, and the evolved genetic programs are applied to the test set (full images) in the GP testing phase to produce the detection results.]
[Figure: the input window and its regions; the mean and standard deviation of each region (the whole window region and circular regions 2-4) form the features F1-F8.]
functions take two arguments. The if function takes three arguments. The first argument, which can be any expression, constitutes the condition. If the first argument is positive, the if function returns its second argument; otherwise, it returns its third argument. The if function allows a program to contain a different expression in different regions of the feature space, and allows discontinuous programs, rather than insisting on smooth functions.
Construction of the fitness functions is the main focus of this paper, and is described in the next section.
4 Fitness Functions
4.1 Design Considerations
During the evolutionary process for object detection, we expect the evolved genetic programs to detect objects only when the sweeping window is centred over those objects. In the usual (non-ideal) case, however, the evolved genetic programs will also report detections not only when the sweeping window is within a few pixels of the centre of a target object, but also when the sweeping window is centred over various cluttered pieces of background. Clearly, these detections are not the objects we expected, but false alarms.
Different evolved genetic programs typically result in different numbers of false alarms, and such differences should be reflected when these programs are evaluated by the fitness function.
When designing a fitness function for object detection problems, a number of considerations need to be taken into account. At least the following requirements should be considered.
Requirement 1. The fitness function should encourage a greater number of objects to be detected. In the ideal case, all the objects of interest in the large images can be detected.
Requirement 2. The fitness function should prefer a smaller number of false alarms on the background.
Requirement 3. The fitness function should encourage genetic programs to produce detected object positions closer to the centres of the target objects.
Requirement 4. For a single object to be detected, the fitness function should encourage programs to produce fewer detected objects (positions) within a few pixels of the target centre.
Requirement 5. For two programs which produce the same number of detected objects for a single target object, where the objects detected by the first program are closer to the target object centre than those detected by the second program, the fitness function should rank the first program better than the second.
Some typical examples of these requirements are shown in Figure 4. In this figure, the circles are target objects and the squares are large images or regions. A cross (x) represents a detected object. In each of the five cases, the left figure is associated with a better genetic program than the right.
where DR, FAR, and FAA are the detection rate (the number of small objects correctly reported by a detection system as a percentage of the total number of actual objects in the images), the false alarm rate (also called false alarms per object: the number of non-objects incorrectly reported as objects by a detection system as a percentage of the total number of actual objects in the images), and the false alarm area (the number of false alarm pixels which are not object centres but are incorrectly reported as object centres before clustering), respectively, and A, B, C are constant weights which reflect the relative importance of detection rate versus false alarm rate versus false alarm area.
Basically, this fitness function considers requirement 1, and partially considers requirements 2 and 4, but does not take requirements 3 and 5 into account. Although this fitness function performed reasonably well on some problems, it still produced many false alarms and the evolutionary training time was still very long [10]. Since this method uses clustering before calculating the fitness, we refer to it as clustering based fitness, or CBF for short.
To avoid a very large false alarm rate (greater than 100% for difficult problems) in the training process, we use precision and recall, both of which lie in the range [0, 1], to construct the new fitness functions. Precision refers to the number of objects correctly localised/detected by a GP system as a percentage of the total number of objects localised/detected by the system. Recall refers to
\[
WR_1 = \frac{\sum_{i=1}^{N}\frac{\sum_{j=1}^{L_i}\mathrm{localisationFitness}(x_{ij},\,y_{ij})}{L_i}}{N} \qquad (4)
\]
\[
\mathrm{fitness}_{LFWF} = \frac{2 \cdot WP_1 \cdot WR_1}{WP_1 + WR_1} \qquad (5)
\]
where N is the total number of target objects, (x_ij, y_ij) is the position of the j-th localisation of object i, L_i is the number of localisations made to object i, WP_1 and WR_1 are the weighted precision and recall, and fitness_LFWF is the localisation fitness weighted F-measure, which is used as the first new fitness function.
Our second fitness function is designed by combining the idea behind the clustering based fitness function with our first new weighted fitness function. Rather than using the averaged localisation fitness values of all the localisations for each object, as in LFWF, we first calculate the average position of all the localisations for each object and then use the localisation fitness of the averaged position to calculate the weighted precision and recall, as shown in equations 6, 7, and 8. We refer to this fitness function as APWF, as it is based on the average position weighted F-measure.
\[
WP_2 = \frac{\sum_{i=1}^{N}\mathrm{localisationFitness}\!\left(\frac{\sum_{j=1}^{L_i}x_{ij}}{L_i},\,\frac{\sum_{j=1}^{L_i}y_{ij}}{L_i}\right)\cdot L_i}{\sum_{i=1}^{N}L_i} \qquad (6)
\]
\[
WR_2 = \frac{\sum_{i=1}^{N}\mathrm{localisationFitness}\!\left(\frac{\sum_{j=1}^{L_i}x_{ij}}{L_i},\,\frac{\sum_{j=1}^{L_i}y_{ij}}{L_i}\right)}{N} \qquad (7)
\]
\[
\mathrm{fitness}_{APWF} = \frac{2 \cdot WP_2 \cdot WR_2}{WP_2 + WR_2} \qquad (8)
\]
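A direct transcription of the two fitness functions as a sketch is given below. Equation (3) for WP_1 is not reproduced above, so its form (the average localisation fitness over all individual localisations) is an assumption, as is the requirement that every target object has at least one localisation; localisation_fitness is the weighting function discussed in the text.

```python
def lfwf(localisations, localisation_fitness):
    """localisations[i] = list of (x, y) positions made for target object i."""
    N = len(localisations)
    total_locs = sum(len(locs) for locs in localisations)
    wp1 = sum(localisation_fitness(x, y) for locs in localisations
              for (x, y) in locs) / total_locs                          # assumed form of eq. (3)
    wr1 = sum(sum(localisation_fitness(x, y) for (x, y) in locs) / len(locs)
              for locs in localisations) / N                            # eq. (4)
    return 0.0 if wp1 + wr1 == 0 else 2 * wp1 * wr1 / (wp1 + wr1)       # eq. (5)

def apwf(localisations, localisation_fitness):
    """APWF: localisation fitness of the averaged position of each object's localisations."""
    N = len(localisations)
    total_locs = sum(len(locs) for locs in localisations)
    fits = []
    for locs in localisations:
        ax = sum(x for x, _ in locs) / len(locs)                        # average x position
        ay = sum(y for _, y in locs) / len(locs)                        # average y position
        fits.append((localisation_fitness(ax, ay), len(locs)))
    wp2 = sum(f * L for f, L in fits) / total_locs                      # eq. (6)
    wr2 = sum(f for f, _ in fits) / N                                   # eq. (7)
    return 0.0 if wp2 + wr2 == 0 else 2 * wp2 * wr2 / (wp2 + wr2)       # eq. (8)
```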
We used three image data sets of New Zealand 5 and 10 cent coins in the experiments. Examples are shown in Figure 5. The data sets are intended to provide object localisation/detection problems of increasing difficulty. The first data set
(easy) contains images of the tails and heads of 5 and 10 cent coins against an almost uniform background. The second (medium difficulty) contains 10 cent coins against a noisy background, making the task harder. The third data set (hard) contains tails and heads of both 5 and 10 cent coins against a noisy background.
We used 24 images for each data set in our experiments and split them equally into three sets: a training set for learning good genetic programs, a validation set for monitoring the training process to avoid overfitting, and a test set for measuring object detection performance.
To give a fair comparison of the three fitness functions, localisation recall (LR) and precision (LP) were used to measure the final object detection accuracy on the test set. LR is the number of objects with one or more correct localisations within the localisation fitness radius of the target object centres, as a percentage of the total number of target objects, and LP is the number of correct localisations which fall within the localisation radius of the target object centres, as a percentage of the total number of localisations made. In addition, we also report the Extra Localisations (ExtraLocs) for each system, to measure how many extra localisations were made for each object. The training efficiency of the systems is measured by the number of training generations and the CPU (user) time in seconds.
We used a population of 500 genetic programs in each experiment run. The reproduction rate, crossover rate and mutation rate were 5%, 70% and 25%, respectively. The program size was initialised to 4 and could increase to 8 during evolution. The system ran for 50 generations unless it found a solution, in which case the evolution was terminated early. A total of 100 runs were performed for each fitness function on each data set, and average results are presented in the next section.
6 Results
Table 1 shows the results of the GP systems with the three fitness functions. The first line shows that, for the easy data set, the GP system with the existing fitness function, CBF, achieved an average localisation recall of 99.99% and an average
As shown in the figure, the clustering based fitness function CBF resulted in a huge number of extra localisations for all 15 detected objects. The LFWF method, however, resulted in only a small number of extra localisations. Although the APWF method produced more localisations than the LFWF method, it produced far fewer extra localisations than the CBF method. These maps confirm that the two new fitness functions were more effective than the clustering based fitness function on these problems, and that LFWF performed the best.
7 Conclusions
This paper described two new fitness functions in genetic programming for object detection, particularly object localisation problems. Rather than using a clustering process to determine the number of objects detected by the GP systems, the two new fitness functions introduce a weight called localisation fitness to represent the goodness of the detected objects, and use weighted F-measures. The first fitness function, LFWF, calculates the weighted localisation fitness of each detected object, then uses the localisation fitness values of all the detected objects to construct the final fitness measure of a genetic program. The second fitness function, APWF, calculates the average locations of all the detected object centres and then calculates the weighted localisation fitness value of the averaged position. The two fitness functions were examined and compared with an existing fitness function on three object detection problems of increasing difficulty. The results suggest that the two new fitness functions outperform the existing clustering based fitness function in terms of both detection accuracy and training efficiency. The LFWF approach achieved better performance than the APWF approach.
In the future, we will apply the new approach to other object detection problems, including situations with overlapping objects. We will also investigate new ways of further reducing the training time by effectively organising the training examples.
References
1. Gader, P.D., Miramonti, J.R., Won, Y., Coffield, P.: Segmentation free shared weight neural networks for automatic vehicle detection. Neural Networks 8 (1995) 1457-1473
2. Roitblat, H.L., Au, W.W.L., Nachtigall, P.E., Shizumura, R., Moons, G.: Sonar recognition of targets embedded in sediment. Neural Networks 8 (1995) 1263-1273
3. Roth, M.W.: Survey of neural network technology for automatic target recognition. IEEE Transactions on Neural Networks 1 (1990) 28-43
4. Waxman, A.M., Seibert, M.C., Gove, A., Fay, D.A., Bernandon, A.M., Lazott, C., Steele, W.R., Cunningham, R.K.: Neural processing of targets in visible, multispectral IR and SAR imagery. Neural Networks 8 (1995) 1029-1051
5. Banzhaf, W., Nordin, P., Keller, R.E., Francone, F.D.: Genetic Programming: An Introduction on the Automatic Evolution of Computer Programs and its Applications. Morgan Kaufmann Publishers, Heidelberg (1998)
6. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, Mass., London, England (1992)
7. Koza, J.R.: Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press, Cambridge, Mass., London, England (1994)
8. Song, A., Ciesielski, V., Williams, H.: Texture classifiers generated by genetic programming. In: Proceedings of the 2002 Congress on Evolutionary Computation, IEEE Press (2002) 243-248
9. Tackett, W.A.: Genetic programming for feature discovery and image discrimination. In: Proceedings of the 5th International Conference on Genetic Algorithms, Morgan Kaufmann (1993) 303-309
10. Zhang, M., Andreae, P., Pritchard, M.: Pixel statistics and false alarm area in genetic programming for object detection. In: Applications of Evolutionary Computing, LNCS Vol. 2611, Springer-Verlag (2003) 455-466
11. Zhang, M., Ciesielski, V., Andreae, P.: A domain independent window-approach to multiclass object detection using genetic programming. EURASIP Journal on Signal Processing 2003 (2003) 841-859
12. Smart, W., Zhang, M.: Classification strategies for image classification in genetic programming. In: Proceedings of the Image and Vision Computing Conference, Palmerston North, New Zealand (2003) 402-407
13. Howard, D., Roberts, S.C., Brankin, R.: Target detection in SAR imagery by genetic programming. Advances in Engineering Software 30 (1999) 303-311
14. Bhowan, U.: A domain independent approach to multi-class object detection using genetic programming. BSc Honours research project, School of Mathematical and Computing Sciences, Victoria University of Wellington (2003)
Immune Multiobjective Optimization Algorithm
for Unsupervised Feature Selection
1 Introduction
Intuitively, more features describe a given target better and should improve discriminating performance. In practice, however, this is not the case. One may extract many potentially useful features for a learning domain, but some of the features may be redundant or irrelevant, and some may even mislead the learning results. In such cases, removing these noisy features will often lead to better performance.
Feature selection is defined as the process of choosing a subset of the original predictive variables by eliminating redundant features and those with little or no predictive information. In supervised learning it is a popular technique and has been used in various applications, because it can improve the performance of classifiers and reduce or even avoid the curse of dimensionality. According to their evaluation criteria, feature selection algorithms can be divided into filters and wrappers [1]. The former perform feature selection independently of any learning algorithm; in wrappers, on the other hand, the candidate feature subset is evaluated by the classification accuracy.
Compared with supervised learning, investigations of feature selection for unsupervised learning have only appeared gradually in recent years. Unsupervised feature selection algorithms can also be classified into filter and wrapper methods. The distance entropy based algorithm [2] and the method based on feature similarity [3] are examples of filters. Most research on unsupervised feature selection belongs to
wrappers. In [4], mixture models are used to implement feature selection and clustering simultaneously. In [5], a separation criterion is used for feature evaluation, and a minimum description length penalty term is added to the log-likelihood criterion to determine the number of clusters. A multiobjective genetic algorithm and an evolutionary local selection algorithm are used in [6] and [7] respectively.
In this paper, an unsupervised feature selection algorithm based on the wrapper framework is proposed, in which the FCM algorithm is applied to form the clusters based on the selected features. Generally, a feature selection algorithm involves two components: a feature evaluation criterion and a search algorithm. A validity measure for fuzzy clustering is applied to evaluate the quality of the features for the FCM algorithm. In addition, in unsupervised feature selection a suitable feature subset and the pertinent number of clusters must be optimized at the same time, since different feature subsets may lead to different cluster structures. The task therefore presents a multi-criterion optimization problem. Recently, evolutionary computation algorithms have been widely applied in feature selection [8] and feature synthesis [9] [10] to improve performance and reduce the feature dimension. In [11], the Immune Clonal Algorithm (ICA) is applied to search for the optimal feature subset for classification, with the different objectives combined into a single objective function. The main drawback of using ICA here is that it is difficult to explore different trade-offs among the objectives. Therefore the Immune Forgetting Multiobjective Optimization Algorithm (IFMOA) [12] is used to find a set of nondominated solutions which yield the more discriminative features and the pertinent number of clusters.
\[
d_{ij} = \mu_{ij}\,\|X_i - Z_j\| \qquad (1)
\]
where \(\|\cdot\|\) is the usual Euclidean norm. Thus d_ij is just the Euclidean distance between X_i and Z_j weighted by the fuzzy membership of data point i belonging to cluster j.
For each cluster j, the sum of the squares of the fuzzy deviations of the data points is called the variation of cluster j, denoted \(\sigma_j^2\) and defined as
\[
\sigma_j^2 = \sum_{i=1}^{n} (d_{ij})^2 = \sum_{i=1}^{n} \mu_{ij}^2\,\|X_i - Z_j\|^2 \qquad (2)
\]
The total variation of the data set is
\[
\sigma = \sum_{j=1}^{c} \sigma_j^2 \qquad (3)
\]
Consequently, the compactness of the fuzzy c-partition of the data set is defined as the ratio of the total variation to the size of the data set, \(\sigma/n\). The smaller this ratio is, the more compact the clusters are. The quantity \(\sigma_j^2/n_j\) is called the compactness of cluster j, where \(n_j = \sum_{i=1}^{n}\mu_{ij}\) is the fuzzy cardinality of cluster j.
On the other hand, the separation of the fuzzy c-partition is defined as:
\[
d_{\min} = \min_{i,j=1,\ldots,c,\; i \neq j} \|Z_i - Z_j\|^2 \qquad (4)
\]
d_min measures the minimum distance between cluster centroids. A larger d_min indicates that the clusters are well separated.
The compactness and separation validity function, or Xie-Beni index, is then defined as
\[
XB = \frac{\sigma}{n\, d_{\min}} \qquad (5)
\]
A smaller XB indicates a partition in which the clusters are overall compact and well separated from each other. Thus, our goal in this paper is to find the feature subset for FCM with the smallest XB.
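A sketch of the XB index of equations (2)-(5), computed from an FCM membership matrix and cluster centroids; the array shapes and variable names are assumptions.

```python
import numpy as np

def xie_beni(X, U, Z):
    """X : (n, d) data matrix
       U : (n, c) fuzzy membership matrix (mu_ij)
       Z : (c, d) cluster centroids
       Returns XB = sigma / (n * d_min)."""
    n, c = U.shape
    # total variation sigma = sum_j sum_i mu_ij^2 * ||X_i - Z_j||^2   (eqs. 2-3)
    sq_dist = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)      # (n, c) squared distances
    sigma = (U ** 2 * sq_dist).sum()
    # minimum squared distance between distinct centroids              (eq. 4)
    centroid_d = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    d_min = centroid_d[~np.eye(c, dtype=bool)].min()
    return sigma / (n * d_min)                                         # eq. 5
```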
It is noted that, in the FCM algorithm, the objective function is
\[
J_q = \sum_{i=1}^{n}\sum_{j=1}^{c} \mu_{ij}^{q}\,\|X_i - Z_j\|^2 \qquad (6)
\]
In the literature, various criteria are used for finding the number of clusters, for example the Bayesian Information Criterion [14], minimum message length in [4], and a minimum description length penalty in [5]. Instead of estimating the number of clusters, c, we obtain it by encoding it as part of the individual solution along with the feature selection. Each individual in the Immune Multiobjective Optimization Algorithm thus represents a feature subset and a number of clusters.
A validity measure for clustering on the given feature subset is needed first. From Section 2, a smaller XB indicates a partition in which the clusters are overall compact and well discriminated from each other. So we minimize the following objective function.
f1 = XB . (7)
However, as Xie and Beni pointed out in [13], the criterion above prefers a larger number of clusters. In order to compensate for this bias towards the number of clusters, we also minimize the second objective
f2 = c (8)
where c is the number of selected clusters.
The third objective is to find the feature subset with the minimum number of selected features d when the other criteria are equal.
f3 = d . (9)
Then IFMOA, which is capable of maintaining a diverse population of solutions and of finding globally Pareto optimal solutions, is used to deal with the optimization problem with the above three objectives.
Three problems must be solved in applying IFMOA to unsupervised feature selection: the encoding strategy, the construction of the affinity function to evaluate the performance of each individual, and the determination of the immune operators and parameters.
4.3.1 Encoding
Decimal encoding is used and the length of an antibody is D+1, where D is the number of features. The initial antibody population A(0) is generated randomly with values between 0 and 1. Each of the N_p (population size) antibodies comprises two parts: a feature saliency for each feature, and the encoding of the number of clusters. Let (av_1, av_2, ..., av_D, av_{D+1}) denote an antibody, where av_1, ..., av_D encode the feature saliency values of the associated features and av_{D+1} encodes the number of clusters.
Features with large saliencies are kept to constitute a suitable feature subset, so optimizing the feature saliency values naturally leads to feature selection. The number of clusters is obtained by the following decoding: c = round(c_min + (c_max - c_min) av_{D+1}), where c_min is the minimal number of clusters, c_min = 2 since one cluster for a data set is meaningless, and c_max is the preestablished maximal number of clusters; round(a) rounds a to the nearest integer.
During evolution, a feature saliency value is likely to become small if the corresponding input feature is unimportant for the clustering. Consequently, when a feature saliency value becomes small enough, the corresponding feature can be removed without sacrificing clustering performance. In our work, all the feature saliency values av_i, i = 1, 2, ..., D denoted by each individual are first normalised so that \(\sum_{i=1}^{D} av_i = 1\). The threshold is set to 1/D, and features whose feature saliency values are lower than the threshold are discarded.
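A sketch of this decoding step (normalise the saliencies, drop features below the 1/D threshold, and map the last gene to a cluster number); the variable names and default bounds are assumptions.

```python
import numpy as np

def decode_antibody(antibody, c_min=2, c_max=8):
    """antibody : sequence of D+1 values in [0, 1]
       returns  : (indices of selected features, number of clusters c)"""
    antibody = np.asarray(antibody, dtype=float)
    saliency = antibody[:-1]
    saliency = saliency / saliency.sum()            # normalise so the saliencies sum to 1
    D = len(saliency)
    selected = np.where(saliency >= 1.0 / D)[0]     # keep features at or above the 1/D threshold
    c = int(round(c_min + (c_max - c_min) * antibody[-1]))   # decode the number of clusters
    return selected, c
```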
applied to A(k) in order to preserve the information of the original antibody population. Thus the population after affinity maturation is composed of the mutated population Z(k) and the original population A(k). Finally, clonal selection is used to update the antibody population. In this case, preference is given to the individual with the higher affinity among the original antibody and its q_i copies after mutation. CSO reproduces antibodies and selects their improved progenies, so each individual is optimized locally and the newcomers yield a broader exploration of the search space.
Updating the Antibody Archive
In each iteration of IFMOA, the antibody archive stores the nondominated solutions of the current combined population. Such an external population is the elitist mechanism most commonly used in multiobjective optimization, and it allows us to move towards the true Pareto front of the multiobjective problem. We restrict the antibody archive to a fixed size N_f, and let P(k+1) be its actual size. If P(k+1) < N_f, we copy the (N_f - P(k+1)) dominated individuals having the best affinity from the current combined population.
To evaluate the performance of the proposed method, a synthetic data set is used first, which consists of 400 data points and 6 features. On the first two significant features, the points form four well defined clusters, as shown in the first panel of Fig. 3; these points are generated from a pseudo-Gaussian distribution. Features 3 and 4 are generated in a similar way to feature 2. Features 5 and 6 are independently sampled from a uniform distribution. Fig. 3 illustrates this data set by projecting the points onto two-dimensional feature subspaces.
In this experiment, the length of the individual is 9: the first 8 decimal numbers encode the features, while the remaining one encodes the number of clusters. In IFMOA, the population size is 20, the probability of mutation is 1/9, and the clonal forgetting proportion T% is set to 0.08. The antibody archive size is N_f = 100 and the clonal scale is n_c = 5. The termination criterion is triggered when the maximum number of generations, 50, is reached. The maximal number of clusters is set to 8.
The second experiment tests the algorithm on the real data sets shown in Table 2, taken from the UCI machine learning repository (http://www.ics.uci.edu/~mlearn/MLSummary.html). All the data sets are normalized beforehand.
In this experiment, each data set was first randomly divided into two halves: one for feature selection, the other for testing. According to the features and the number of clusters selected on the former set by the proposed algorithm, each data point in the test set is assigned to the cluster that most likely generated it. We evaluate the results by interpreting the components as clusters and comparing them with the ground truth labels. Each time, a final solution is selected among all the Pareto solutions in terms of the XB index. In IFMOA, except for the length of the individual and the mutation probability, which depend on the dimension of the data set, the values of the other parameters are the same as those in experiment 1. The method is denoted as FSFCM-IFMOA in Table 3.
For each data set, we run the proposed method 10 times and record the average error rate and the standard deviation on the test set. For comparison, we combine the three objectives of Section 4.1 into one objective function by a randomly weighted summation, and the feature subset and the number of clusters are obtained by minimizing this objective function using the Immune Clonal Algorithm (ICA) [15]. The function is defined as \(F = \lambda_1 f_1 + \lambda_2 f_2 + \lambda_3 f_3\), in which each \(\lambda_i\) is randomly generated at each evaluation and \(\sum_{i=1}^{3}\lambda_i = 1\). In ICA, the encoding strategy and the parameters, including the population size, the probability of mutation, the clonal scale and the termination criterion, are set the same as those in IFMOA. FSFCM-ICA is used to denote this method, and the average results of 10 runs are recorded in Table 3. Furthermore, we also ran FCM 10 times using all the features with the fixed number of clusters.
Table 3. Comparison of the error rates and the standard deviations of data sets with and without
feature selection
From the results, we find that the proposed method improves the performance in terms of error rate when compared with using all the features and a fixed number of clusters. The improvement is more significant for the Ionosphere data set. However, the results also show that the dimension of the selected feature subset is high. This may be because the final solution of each run is selected among all the Pareto solutions in terms of the XB index, since performance is our preference. In addition, the value of the threshold for the feature saliencies also influences the performance of feature selection when the decimal encoding strategy is used and the number of selected features is unknown. It is also shown that the error rates of FSFCM-ICA with random weighting are lower than those of the proposed method in our experiments.
6 Conclusion
In this paper, we proposed an unsupervised feature selection algorithm based on the Immune Forgetting Multiobjective Optimization Algorithm. Unsupervised feature selection is treated as a combinatorial optimization problem of minimizing the number of features used and the criterion measuring the clustering performance. At the same time, different feature subsets lead to different cluster structures, that is, the number of clusters depends on the feature subset considered. Instead of combining these evaluations into one objective function, we make use of the multiobjective immune clonal algorithm with a forgetting strategy to find the more discriminative features for clustering and the pertinent number of clusters. The results of experiments on synthetic and real data sets show the potential of the method. These are only preliminary results, and the application of the method to the segmentation of remote sensing images is the focus of our further study.
Acknowledgements. The authors would like to thank the anonymous reviewers for
valuable suggestions that helped to greatly improve this paper. This work was
supported by the National Grand Fundamental Research 973 Program of China under
Grant No.2001CB309403 and the Chinese National Natural Science Foundation
under Grant No. 60372045, 60372050, 60133010 and 60472084.
References
1. Kohavi, R., John, G.H.: Wrappers for Feature Subset Selection. Artificial Intelligence Journal 97 (1997) 273-324
2. Dash, M., Choi, K., Scheuermann, P., Liu, H.: Feature Selection for Clustering - A Filter Solution. In: Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM'02). (2002) 115-122
3. Mitra, P., Murthy, C.A.: Unsupervised Feature Selection Using Feature Similarity. IEEE Trans. Pattern Analysis and Machine Intelligence 24 (2002) 301-312
4. Law, M.H.C., Figueiredo, M.A.T., Jain, A.K.: Simultaneous Feature Selection and Clustering Using Mixture Models. IEEE Trans. Pattern Analysis and Machine Intelligence 26 (2004) 1154-1166
5. Dy, J.G., Brodley, C.E., Kak, A., Broderick, L.S., Aisen, A.M.: Unsupervised Feature Selection Applied to Content-Based Retrieval of Lung Images. IEEE Trans. Pattern Analysis and Machine Intelligence 25 (2003) 373-378
6. Morita, M., Sabourin, R., Bortolozzi, F., Suen, C.Y.: Unsupervised Feature Selection Using Multi-Objective Genetic Algorithms for Handwritten Word Recognition. In: Proceedings of the Seventh International Conference on Document Analysis and Recognition. (2003) 666-670
7. Kim, Y.S., Street, W.N., Menczer, F.: Feature Selection in Unsupervised Learning via Evolutionary Search. In: Proc. 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. (2000) 365-369
8. Yang, J., Honavar, V.: Feature Subset Selection Using a Genetic Algorithm. IEEE Trans. on Intelligent Systems 13 (1998) 44-49
9. Li, R., Bhanu, B., Dong, A.: Coevolutionary Feature Synthesized EM Algorithm for Image Retrieval. In: Proceedings of the Application of Computer Multimedia. (2005) 696-705
10. Lin, Y., Bhanu, B.: Evolutionary Feature Synthesis for Object Recognition. IEEE Trans. on Systems, Man, and Cybernetics, Part C 35 (2005) 156-171
11. Zhang, X.R., Shan, T., Jiao, L.C.: SAR Image Classification Based on Immune Clonal Feature Selection. In: Campilho, A.C., Kamel, M.S. (eds.): Proceedings of Image Analysis and Recognition. Lecture Notes in Computer Science, Vol. 3212. Springer-Verlag, Berlin Heidelberg New York (2004) 504-511
12. Lu, B., Jiao, L.C., Du, H.F., Gong, M.G.: IFMOA: Immune Forgetting Multiobjective Optimization Algorithm. In: Wang, L., Chen, K., Ong, Y.S. (eds.): Proceedings of the First International Conference on Advances in Natural Computation. Lecture Notes in Computer Science, Vol. 3612. Springer-Verlag, Berlin Heidelberg New York (2005) 399-408
13. Xie, X.L., Beni, G.: A Validity Measure for Fuzzy Clustering. IEEE Trans. Pattern Analysis and Machine Intelligence 13 (1991) 841-847
14. Fraley, C., Raftery, A.E.: How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis. The Computer Journal 41 (1998) 578-588
15. Du, H.F., Jiao, L.C., Wang, S.A.: Clonal Operator and Antibody Clone Algorithms. In: Proceedings of the First International Conference on Machine Learning and Cybernetics, Beijing (2002) 506-510
Classifying and Counting Vehicles
in Traffic Control Applications
Abstract. This paper presents a machine learning system to handle traffic control applications. The input of the system is a set of image sequences coming from a fixed camera. The system can be divided into two main subsystems: the first, based on Artificial Neural Networks, classifies the typology of vehicles moving within a limited image area for each frame of the sequence; the second, based on Genetic Algorithms, takes as input the frame-by-frame classifications and reconstructs the global traffic scenario by counting the number of vehicles of each typology. This task is particularly hard when the frame rate is low. The results obtained by our system are reliable even for very low frame rates (i.e. four frames per second). Our system is currently used by a company for real-time traffic control.
1 Introduction
The problem of counting and classifying, in an automatic, reliable and efficient way, the vehicles that travel along an urban street or a motorway is becoming more and more important for maintaining traffic safety and fluency (see for instance [7, 1, 8, 3, 13]). Companies are beginning to employ systems based on filmed sequences, since they appear to be more flexible, more maintainable and less expensive than systems based on technological supports, such as sensors or other analogous devices that have to be physically installed in many different places on the street. However, while some companies use less sophisticated devices to contain costs, which produce image sequences at a lower frame rate, analyzing low frame rate sequences by means of computer systems can be very difficult. Several commercial systems for traffic surveillance based on filmed sequences exist (see among others CVCS, by Alvarado Manufacturing [9], CCATS by Traficon and AUTOSCOPE by Econolite). However, these systems seem to be based on high frame rate image sequences. In [2], Eikvil and Huseby pointed out the main limitations of these systems and proposed a new system based on Hidden Markov Models (HMM). As pointed out in [12, 13], Eikvil's and Huseby's system has excellent performance when applied to high frame rate image sequences, but it fails to correctly reconstruct the traffic scenario in the presence of low frame rate ones. A good discussion of the reasons why a system based on HMMs performs poorly on low frame rate image sequences can be found in [12]. In this paper, we present
a system for traffic control which is able to work in real time on low frame rate image sequences. This work is part of a joint research project between the University of Milano-Bicocca and the Comerson company (http://www.comerson.com). The main goal is recognizing and counting the different types of vehicles that have traveled along a street in a given time frame. In order to calculate statistics about the size of the vehicles traveling, and thus propose strategies to increase traffic safety and efficiency, we distinguish three types of vehicles: cars, trucks and motorbikes. Our system is composed of two modules. The first, called the Vehicle Classifier System, is based on a standard Feed Forward Neural Network (NN) trained with the Backpropagation learning rule [5, 11]. The second, called the Vehicle Counting System, is based on Genetic Algorithms (GAs) [6, 4]. This paper is organized as follows: in Section 2, the Vehicle Classifier System is presented and some experimental results are discussed. Section 3 describes the Vehicle Counting System, for which experimental results are also presented. Finally, Section 4 offers our conclusions and hints for future work.
This system takes as input a sequence of images (or frames). These images are interpreted as matrices of pixels, where every pixel can take values over the range [0, 255], corresponding to the various tonalities of gray (0 = white, 255 = black). The analysis of a sequence of consecutive images (or frames) allows us to extract the background with the same technique as the one used in [2]. Subsequently, a number of virtual sensors (or detectors) of rectangular shape, arranged in a grid structure, are placed in some limited regions of the street. For every detector S_i composing the grid (1 <= i <= 36 in the figure), let s(i,k) be the sum of the squared pixel-by-pixel differences between frame k (the running image) and the background. The training set that is supplied as input to the classifier can then be represented as a set of sequences (s(1,j), s(2,j), ..., s(36,j), C_j), with 1 <= j <= N, where N is the number of frames in the image sequence, and, for each j, the value of C_j is interpreted as the class corresponding to the pattern (s(1,j), s(2,j), ..., s(36,j)). This class can be C, T, M, e or m if a car, a truck, a motorbike, no vehicle, or more than one vehicle is present on the grid of detectors, respectively. This system demands a training phase in which a human being supplies the vehicle class contained in the grid of detectors at each instant. Here, we present some experiments on three different street scenarios (named Noranco, Bollate and Saragozza). Results obtained by k-fold cross-validation are reported in Table 1 (see for instance [10] for a definition of the cci, p and r measures and for a description of the k-fold cross-validation method).
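A sketch of the per-frame detector responses s(i, k) described above (the sum of squared pixel-by-pixel differences between the running frame and the background inside each rectangular detector); the grid layout and the classifier interface are assumptions, not the exact setup of the system.

```python
import numpy as np

def detector_responses(frame, background, detectors):
    """frame, background : 2-D uint8 arrays of grey levels
       detectors         : list of (x, y, w, h) rectangles forming the grid of virtual sensors
       returns           : vector [s(1,k), ..., s(36,k)] for this frame k"""
    diff = frame.astype(float) - background.astype(float)   # cast to avoid uint8 wrap-around
    return np.array([(diff[y:y + h, x:x + w] ** 2).sum()
                     for (x, y, w, h) in detectors])

# A frame would then be classified by the trained network, e.g.
#   label = nn.predict(detector_responses(frame, background, detectors))
# with label in {'C', 'T', 'M', 'e', 'm'}.
```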
The output produced by the Vehicle Classifier System is a sequence G_1, G_2, ..., G_n, where, for each i in [1, n], G_i belongs to {C, T, M, m, e}. A sequence of identical symbols, such as (C,C,C,C,C,C), may represent: (1) the presence of the same car on the grid for six consecutive frames, or (2) the presence of a number of different cars in the different frames. The second of these events can happen because of the low frame frequency of our image sequences. The task of counting the vehicles strongly depends on identifying which of these events happened. The Vehicle Counting System can be partitioned into three phases: preprocessing, GA, and statistics (and model) generation. These phases are described below.

Table 1. Results of the Vehicle Classifier System on three different image sequences. Each row of the table corresponds to a different sequence. Names of the sequences are reported in column 1. Column 2 shows the cumulative correctly classified instances (cci) for all the vehicles. Columns 3 to 12 report precision (p) and recall (r) for the classes Car, Truck, Motorbike, Empty and Mix respectively.
Genetic Algorithm. Let V = [V_0, V_1, ..., V_{k-1}] be the output of the preprocessing phase. The GA associates V with a population of individuals of the form I = [N_0, N_1, ..., N_{k-1}], such that: (1) for each i in [0, k-1], N_i is the number of different vehicles of type V_i that have passed in the i-th subsequence; (2) for each i in [0, k-1], 1 <= N_i <= L_i, where L_i is the length of the substring of the S sequence pointed to by V_i. The fitness of GA individuals is calculated by supervising a portion of the image sequence, i.e. by manually counting how many cars, trucks and motorbikes pass through the grid of detectors. Let X_j be the number of cars, Y_j the number of trucks and Z_j the number of motorbikes that really passed through the grid (obtained by means of this supervision) during a given subsequence j. The fitness of an individual I = [N_0, N_1, ..., N_{k-1}] is:
\[
f(I) = \sum_{j=1}^{h} \left( \frac{1}{X_j}\Bigl|X_j - \sum_{i:V[i]=C} N_i\Bigr| + \frac{1}{Y_j}\Bigl|Y_j - \sum_{i:V[i]=T} N_i\Bigr| + \frac{1}{Z_j}\Bigl|Z_j - \sum_{i:V[i]=M} N_i\Bigr| \right)
\]
where h is the total number of subsequences the image sequence has been partitioned into.
Normalization of each member of the sum is done in order to give the same importance to each vehicle type, independently of the number of instances of that vehicle type which have been observed. The standard crossover operator can be used, while mutation works by replacing the old value of N_i with a uniformly generated random number in [1, L_i].
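A sketch of this fitness for a single supervised subsequence (h = 1 is assumed, as is the reconstruction of the formula and the data layout; the real system's indexing may differ).

```python
def counting_fitness(individual, V, truth):
    """individual : [N_0, ..., N_{k-1}], counts proposed for each run in V
       V          : vehicle type of each run, e.g. ['C', 'T', 'C', 'M', ...]
       truth      : manually counted totals, e.g. {'C': X_j, 'T': Y_j, 'M': Z_j}
       Lower fitness is better."""
    f = 0.0
    for vtype, real in truth.items():
        if real == 0:
            continue                                   # skip types with no ground-truth vehicles
        counted = sum(n for n, v in zip(individual, V) if v == vtype)
        f += abs(real - counted) / real                # normalised per vehicle type
    return f
```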
Statistics and model generation. Let S be the sequence produced by the Vehicle Classifier System, V the one produced by the preprocessing phase and W the one returned by the GA. The model of our system is generated from the output of the GA, by considering the rates of all the counts assigned to runs of the same number of identical symbols. For instance, let V contain a C symbol at positions 2, 5 and 11, let each of these three C symbols in V point to a run of 4 consecutive C symbols in S, and let no other C symbol in V point to a run of 4 consecutive C symbols in S. Finally, let W contain the number 2 at positions 2 and 11 and the number 1 at position 5. Then a run of 4 consecutive C symbols will be considered by our system as 2 cars with probability 2/3 and as 1 car with probability 1/3.
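A sketch of the frequency model built in this phase, turning the counts chosen by the GA into empirical probabilities per (symbol, run length) pair; the data structures are assumptions.

```python
from collections import defaultdict, Counter

def build_model(runs, W):
    """runs : list of (symbol, run_length) pairs, one per entry of V
       W    : list of vehicle counts chosen by the GA, aligned with runs
       returns model[(symbol, run_length)] = {count: probability}"""
    tallies = defaultdict(Counter)
    for (symbol, length), count in zip(runs, W):
        tallies[(symbol, length)][count] += 1
    model = {}
    for key, counter in tallies.items():
        total = sum(counter.values())
        model[key] = {count: c / total for count, c in counter.items()}
    return model

# Example from the text: three runs of four consecutive C symbols counted as 2, 1 and 2 cars
# gives model[('C', 4)] == {2: 2/3, 1: 1/3}.
```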
Experimental Results. The performance of the Vehicle Counting System has been
tested on the same street scenarios as the ones described in section 2. The learning phase
has always been done on a three minutes long image sequence of the same scenario as
the one used during the tests. In that phase, the GA runs have been executed with the fol-
lowing parameters: population size: 100, crossover rate: 95%, mutation rate: 0.001%,
tournament selection of size 10, chromosome length depending on the size of the V
sequence produced by the preprocessing phase. Results are reported in Table 2, where the true values of the numbers of cars, trucks and motorbikes that passed through the street (values counted by hand) are shown next to the values counted by the Vehicle Counting System.
Table 2. Results of the Vehicle Counting System for the same image sequences as in section 2. Sequences have been partitioned into two subsequences. Both subsequences and the whole sequences have been tested. Each line is related to one of these subsequences. Column 2 reports the set of frames corresponding to each subsequence. Columns 3 to 8 report the real number and the counted number of vehicles for the three typologies considered here (cars, trucks and motorbikes).
Each image sequence has been divided into two subsequences of frames. The frame numbers corresponding to each subsequence are shown in column two. For each image sequence, experiments have been performed on the whole sequence and on the two subsequences. As the table shows, the numbers of cars, trucks and motorbikes counted by the Vehicle Counting System approximate the true values very well in all the cases considered.
A new system for classifying and counting vehicles in real time has been presented in this paper. It is based on the analysis of low frame rate image sequences of a street scenario over a given time frame, taken by a fixed camera. It is composed of two subsystems: one, based on feed-forward neural networks, classifies the typology of the moving vehicles; the other, based on Genetic Algorithms (GAs), counts them. The results shown in this paper are encouraging. Future work includes studying how the preprocessing performance can affect the reliability of the whole system.
A Neural Evolutionary Classification Method for
Brain-Wave Analysis
1 Introduction
The initial population is seeded with random networks initialized with different hidden layer sizes, using two exponential distributions to determine the number of hidden layers and neurons for each individual, and a normal distribution to determine the weights and bias values. For all weight matrices and biases we also define matrices of variance, which will be applied in conjunction with evolution strategies in order to perturb network weights and biases. Variance matrices are initialized with all ones. In both cases, unlike other approaches such as [6], the maximum size and number of the hidden layers is neither pre-determined nor bounded, even though the fitness function may penalize large networks. Each individual is encoded in a structure in which we maintain basic information about the string codification of the topology and the weight and bias matrices. This kind of structure is defined, together with all the algorithm parameters, in [1]. The values of all these parameters are affected by the genetic operators during evolution, in order to perform incremental (adding hidden neurons or hidden layers) and decremental (pruning hidden neurons or hidden layers) learning.
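A minimal sketch of such an evolution-strategy-style perturbation of one weight (or bias) matrix is given below; the log-normal variance update rule and the parameter tau are assumptions, since the text only states that per-weight variance matrices, initialized to ones, are used together with evolution strategies to perturb weights and biases.

import numpy as np

def perturb_layer(weights, variances, tau=0.1, rng=np.random.default_rng()):
    # Self-adapt the per-weight variances (assumed log-normal rule), then add a
    # Gaussian offset to each weight scaled by its own standard deviation.
    new_var = variances * np.exp(tau * rng.standard_normal(variances.shape))
    new_weights = weights + np.sqrt(new_var) * rng.standard_normal(weights.shape)
    return new_weights, new_var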
2.2 Fitness
As indicated in [1], the fitness is proportional to the value of the mse and to the cost of the considered network. It is defined as

f = \lambda k c + (1 - \lambda)\,\mathrm{mse}, (1)

where the network cost c depends on N_hn, the number of hidden neurons, and N_syn, the number of synapses.
The mse depends on the activation function, which calculates all the output values for each single layer of the neural network. In this work we use the sigmoid transfer function. To be more precise, two fitness values are actually calculated for each individual: the fitness f, used by the selection operator, and a test fitness f̂. f̂ is calculated according to Equation 1 by using the mse over the test set. When BP is used, i.e., if bp = 1, f = f̂; otherwise (bp = 0), f is calculated according to Equation 1 by using the mse over the training and test sets together.
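The following Python sketch illustrates one possible reading of this fitness computation; the cost model (a weighted sum of N_hn and N_syn), the constants lam, k, alpha and beta, and the simple averaging used to pool training and test mse are all assumptions, not the authors' definitions.

def combined_fitness(mse, n_hidden_neurons, n_synapses,
                     lam=0.5, k=1e-4, alpha=1.0, beta=1.0):
    # Convex combination of an assumed network cost and the mean squared error.
    cost = alpha * n_hidden_neurons + beta * n_synapses
    return lam * k * cost + (1.0 - lam) * mse

def selection_and_test_fitness(mse_train, mse_test, n_hidden_neurons, n_synapses, bp):
    # Two fitness values per individual: the test fitness f_hat always uses the
    # mse over the test set; the selection fitness f equals f_hat when
    # backpropagation is used (bp = 1), otherwise it uses the mse over training
    # and test sets together (approximated here by a simple average).
    f_hat = combined_fitness(mse_test, n_hidden_neurons, n_synapses)
    if bp:
        f = f_hat
    else:
        f = combined_fitness((mse_train + mse_test) / 2.0, n_hidden_neurons, n_synapses)
    return f, f_hat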
All three topology mutation operators are designed so as to minimize their impact on the behavior of the network; in other words, they are designed to be as little disruptive (and as neutral) as possible.
The dataset provided by Beverina and colleagues consists of 700 negative cases
and 295 positive cases. The features are based on wavelets, morphological criteria and power in different time windows, for a total of 78 real-valued input attributes
and 1 binary output attribute, indicating the class (positive or negative) of the
relevant case. In order to create a balanced dataset of the same cardinality as the
one used by Beverina and colleagues, for each run of the evolutionary algorithm
we extract 218 positive cases from the 295 positive cases of the original set, and
218 negative cases from the 700 negative cases of the original set, to create a 436-case training dataset; for each run, we also create a 40-case test set by randomly
extracting 20 positive cases and 20 negative cases from the remainder of the
original dataset, so that there is no overlap between the training and the test
sets. This is the same protocol followed by Beverina and colleagues. For each run
of the evolutionary algorithm we allow up to 25,000 network evaluations (i.e.,
simulations of the network on the whole training set), including those performed
by the backpropagation algorithm. 100 runs of the neuro-genetic approach with parameters set to their default values were executed with bp = 0 and bp = 1,
i.e., both without and with backpropagation. The results obtained are presented
in Table 1.
Table 1. Error rates of the best solutions found by the neuro-genetic approach with
and without the use of backpropagation, averaged over 100 runs.
           training                                         test
bp   false pos. (avg / stdev)   false neg. (avg / stdev)   false pos. (avg / stdev)   false neg. (avg / stdev)
0    93.28 / 38.668             86.14 / 38.289             7.62 / 3.9817              7.39 / 3.9026
1    29.42 / 14.329             36.47 / 12.716             1.96 / 1.4697              2.07 / 1.4924
Due to the way the training set and the test set are used, it is not surprising
that error rates on the test sets look better than error rates on the training
sets. That happens because, in the case of bp = 1, the performance of a network
on the test set is used to calculate its tness, which is used by the evolutionary
algorithm to perform selection. Therefore, it is only networks whose performance
on the test set is better than average which are selected for reproduction. The
best solution has been found by the algorithm using backpropagation and is
a multi-layer perceptron with one hidden layer with 4 neurons, which gives 22
false positives and 29 false negatives on the training set, while it commits no classification error on the test set. The results obtained by the neuro-genetic approach, without any specific tuning of the parameters, appear promising. To provide a reference, the average number of false positives obtained by Beverina and colleagues with support vector machines is 9.62 on the training set and 3.26 on the test set, whereas the average number of false negatives is 21.34 on the training set and 4.45 on the test set [4].
References
1. A. Azzini, M. Lazzaroni, and G.B. Tettamanzi. A neuro-genetic approach to neural network design. In F. Sartori, S. Manzoni, and M. Palmonari, editors, AI*IA 2005 Workshop on Evolutionary Computation. AI*IA, Italian Association for Artificial Intelligence, September 20, 2005.
2. F. Beverina, G. Palmas, S. Silvoni, F. Piccione, and S. Giove. User adaptive BCIs: SSVEP and P300 based interfaces. PsychNology Journal, 1(4):331-354, 2003.
3. E. Donchin, K.M. Spencer, and R. Wijesinghe. The mental prosthesis: assessing the speed of a P300-based brain-computer interface. IEEE Transactions on Rehabilitation Engineering, 8(2):174-179, June 2000.
4. Giorgio Palmas. Personal communication, November 2005.
5. D. E. Rumelhart, J. L. McClelland, and R. J. Williams. Learning representations by back-propagating errors. Nature, 323:533-536, 1986.
6. X. Yao and Y. Liu. Towards designing artificial neural networks by evolution. Applied Mathematics and Computation, 91(1):83-90, 1998.
Differential Evolution Applied to a Multimodal
Information Theoretic Optimization Problem
Abstract. This paper discusses the problems raised by the optimization of a mu-
tual information-based objective function, in the context of a multimodal speaker
detection. As no approximation is used, this function is highly nonlinear and
plagued by numerous local minima. Three different optimization methods are
compared. The Differential Evolution algorithm is deemed to be the best for the
problem at hand and, consequently, is used to perform the speaker detection.
1 Introduction
This paper addresses the optimization of an information theoretic-based function for
the detection of the current speaker in audio-video (AV) sequences. A single camera
and microphone are used so that the detection relies on the evaluation of the AV syn-
chronism. As in [1], the information contained in each modality is fused at the feature
level, optimizing the audio features with respect to the video ones. The objective func-
tion is based on mutual information (MI) and is highly nonlinear, with no analytical
formulation of its gradient, unless approximations are introduced. The local Powell's method [2] was tried in a first set of experiments, and the conclusion was that a global approach was better suited. To this end, two Evolutionary Algorithms (EAs) - the Genetic Algorithm in Continuous Space [3] and Differential Evolution [4] - have been applied and their performances compared and analyzed.
After a brief introduction to the optimization problem, the three previously men-
tioned optimization methods are applied to the problem at hand. A detailed discussion
follows the experiments presented in the last part of the paper and suggests that DE is
the best choice for solving the given problem.
can be used as such a classifier. This MI classifier must be provided with suitable features to perform well. Let F_A and F_V be the features extracted from A and V, and let e = I(F_A, F_V)/H(F_A, F_V) ∈ [0, 1] be the features' efficiency coefficient [1] (I and H standing respectively for Shannon's MI and entropy). Then the following inequality can be stated [1]:

P_e \le 1 - \frac{e(F_A, F_V) + 1}{\log 2}. (1)

By minimizing the right-hand side of inequality (1), we expect to recover in each
modality the information that originates from the common source S while discarding
the source-independent information present in each signal. This would lead to a more
accurate classification. For more details, see [1].
Application to speaker detection. The video features are the magnitude of the optical flow estimated over T frames in the mouth regions (rectangular regions including the lips and the chin). T − 1 video feature vectors F_{V,t} (t = 1, ..., T − 1) are obtained, each element of these vectors being an observation of the random variable (r.v.) F_V.
The speech signal is represented as a set of T − 1 vectors C_t, each containing P Mel-Frequency Cepstral Coefficients (MFCCs), discarding the first coefficient.
Audio feature optimization. As mentioned, the goal is to construct better features for classification. The focus is now on the audio features F_{A,t}(α), associated to the r.v. F_A, that are built as the linear combination of the P MFCCs, F_{A,t}(α) = \sum_{i=1}^{P} α(i) C_t(i), t = 1, ..., T − 1. The weights α are to be optimized with respect to the Efficiency Coefficient Criterion (ECC):

α_opt = \arg\max_{α} \{ e(F_V, F_A(α)) \}. (2)
A ΔECC criterion is introduced to perform only one optimization for the two mouths and to take into account the discrepancy between the marginal densities of the video features in each region. If F_V^{M1} and F_V^{M2} denote the r.v. associated to regions M1 and M2 respectively, then the optimization problem becomes:

α_opt = \arg\max_{α} [ e(F_V^{M1}, F_A(α)) - e(F_V^{M2}, F_A(α)) ]^2. (3)
This MI-based optimization criterion requires the availability of the joint probability density function (pdf) of F_A and F_V as well as of their marginal distributions. To avoid any restrictive assumption, they are estimated using Parzen windowing.
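The following Python sketch illustrates the construction of F_A(α) and the evaluation of the ΔECC objective; the histogram-based MI estimate is only a stand-in for the Parzen-window estimation used in the paper, and the pairing of audio and video observations is deliberately simplified. All names are illustrative.

import numpy as np

def audio_feature(alpha, C):
    # F_A,t(alpha) = sum_i alpha[i] * C_t[i]; C has shape (T-1, P).
    return C @ alpha

def efficiency_coefficient(fa, fv, bins=16):
    # e = I(F_A, F_V) / H(F_A, F_V), estimated here from a joint histogram of
    # paired 1D samples (the paper uses Parzen windowing instead).
    joint, _, _ = np.histogram2d(fa, fv, bins=bins)
    p = joint / joint.sum()
    nz = p > 0
    h_joint = -np.sum(p[nz] * np.log(p[nz]))
    px, py = p.sum(axis=1), p.sum(axis=0)
    h_x = -np.sum(px[px > 0] * np.log(px[px > 0]))
    h_y = -np.sum(py[py > 0] * np.log(py[py > 0]))
    mi = h_x + h_y - h_joint
    return mi / h_joint if h_joint > 0 else 0.0

def delta_ecc(alpha, C, fv_m1, fv_m2):
    # Objective of Eq. (3): squared difference of the efficiency coefficients
    # obtained with the video features of the two mouth regions (to maximise).
    fa = audio_feature(alpha, C)
    return (efficiency_coefficient(fa, fv_m1) - efficiency_coefficient(fa, fv_m2)) ** 2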
3 Optimization Method
Definition of the optimization problem. The extraction of the optimized audio features requires finding the vector α ∈ R^P that minimizes −ΔECC (Eq. (3)). To restrict the set of possible solutions, additional constraints have been introduced on the weights:

0 ≤ α(i) ≤ 1, i = 1, 2, ..., P, (4)

\sum_{i=1}^{P} α(i) = 1. (5)
The optimization problem is highly nonlinear and no analytical gradient is available. Indeed, the MI-based objective function is a priori non-convex and is very likely to present a rugged surface. Moreover, it is difficult to obtain an analytical form of its gradient due to the unknown form of the pdf of the extracted audio features. The use of Parzen windows to estimate the pdf reduces the risk of getting trapped in a local minimum by smoothing the cost
function. Because a trade-off has to be found between smoothness and accuracy of
the distribution estimates, the smoothing parameter is iteratively adapted. Thus, the
optimization problem is solved using a multi-resolution scheme (see [5]).
The deterministic Powell's method [2] has been used in a first set of experiments [6].
However, the objective function still exhibited too many local optima for a local opti-
mization method to perform well. A global optimization strategy fulfilling the following
requirements turned out to be preferable:
1. Efficiency for highly nonlinear problems without requiring the objective function
to be differentiable or even continuous over the search space;
2. Efficiency with cost functions that present a shallow, rough error surface;
3. Ability to deal with real-valued parameters;
4. Efficiency in handling the two constraints defined by Eqs. (4, 5);
Genetic Algorithm in Continuous Space (GACS). An evolutionary approach such as
GACS answers the first three requirements while presenting flexibility and simplicity
of use in a challenging context. The adaptation developed in [3] efficiently deals with a finite solution domain by relating the genetic operators to the constraints on the solution parameters. The crossover operator is defined such that the child chromosome is guaranteed to lie in the acceptance domain (defined by Eq. (4)) provided its parents are valid. The mutation is performed by perturbing a randomly selected chromosome element with a zero-mean Gaussian perturbation whose variance σ is defined as a certain fraction of the acceptance domain. The mutation is rejected if the mutated gene
lies outside its acceptance domain. To satisfy the constraint defined by Eq. (5), the new
population is normalized. Notice that the initial chromosomes are regularly placed in
the acceptance domain according to a user-defined number of quantization levels Q [7].
This ensures a better initial exploration of the search space than a random initialization.
This extension of GACS leads to better results than Powell's method. However,
the mutation operator appears to be ineffective. The solutions are indeed very close to
the search space limits. A high number of mutations are then rejected, resulting in a
loss of the population diversity and in a premature convergence of the algorithm. The
perturbation should adapt to the population evolution and should lead to a better explo-
ration of the search space.
DE/rand/1/bin algorithm [8]. The initial population however is generated as done with
GACS. The validity of each perturbed vector is verified before starting the decision pro-
cess. If the element j of a child vector i does not belong to the acceptance domain, it is
replaced by the mean between its pre-mutation value and the bound that is violated [8].
This scheme is more efficient than the simple rejection adopted with GACS. Indeed, it allows the search to asymptotically approach the search space bounds. To handle the second constraint (Eq. (5)), a simple normalization is performed on each child vector, as was done with GACS.
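A minimal Python sketch of one DE/rand/1/bin generation with this constraint handling follows; the greedy one-to-one replacement rule and all names are illustrative, and the objective is assumed to be the quantity to be minimized (e.g. −ΔECC).

import numpy as np

def de_step(pop, objective, F=0.5, CR=1.0, rng=np.random.default_rng()):
    # One DE/rand/1/bin generation. `pop` has shape (NP, P) and satisfies the
    # constraints of Eqs. (4)-(5); `objective` is minimised.
    n, p = pop.shape
    new_pop = pop.copy()
    for i in range(n):
        r1, r2, r3 = rng.choice([j for j in range(n) if j != i], size=3, replace=False)
        mutant = pop[r1] + F * (pop[r2] - pop[r3])
        cross = rng.random(p) < CR
        cross[rng.integers(p)] = True                  # at least one element crosses
        child = np.where(cross, mutant, pop[i])
        # out-of-range repair: mean of the pre-mutation value and the violated bound
        low, high = child < 0.0, child > 1.0
        child[low] = 0.5 * (pop[i][low] + 0.0)
        child[high] = 0.5 * (pop[i][high] + 1.0)
        child /= child.sum()                           # normalisation, Eq. (5)
        if objective(child) < objective(pop[i]):       # greedy selection
            new_pop[i] = child
    return new_pop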
4 Results
Comparison of the optimization methods. All the test sequences are 4 seconds long,
PAL standard (T = 100); 12 MFCCs are computed using 23.22ms Hamming windows.
The three different optimization methods are tested on a single speaker sequence using
−ECC (Eq. (2)) as the objective function. σ is fixed to 10% of the acceptance domain for GACS, while the scaling factor F and the crossover probability CR required by the DE algorithm [8] are fixed to 0.5 and 1 respectively¹. Both algorithms are run for 400 generations on a population of 125 vectors. 33 runs were then performed with the GACS and DE methods, whereas different initial solution guesses were tried for Powell's method.
Table 1 summarizes the results. Obviously, a much better result is obtained using the global optimization schemes instead of the local one.
Table 1. Values of the −ECC cost function for 33 runs under the same conditions, on the same AV sequence
DE is the algorithm that reaches
the best solution in a more stable way. Indeed, the standard deviation of the solutions
is much smaller in the case of DE than in the case of the other two methods, giving
us more confidence in the results. While the high variation of the solutions found with
Powells method is not a surprise (as it is very sensitive to initial conditions), the in-
stability of the GACS solution seems intriguing. However, this is less surprising when we analyze the evolution of the algorithm towards the solution: the degeneration of the population, combined with the less systematic exploration of the solution space (especially the boundaries), makes the GACS solutions very different from run to run. Both
the generation of the perturbation increment using the population itself instead of a pre-
defined probability density and the handling of the out-of-range values allow the DE
algorithm to achieve outstanding performance in the context of our problem.
Audiovisual speaker detection results. Five home-grown sequences with two individuals (only one speaking at a time) are now used. The DE optimization method is
¹ The implementation of the DE algorithm is based on Storn's public domain software [9].
used to project the MFCCs on a new 1D subspace as defined in Sec. 2, using ΔECC as the optimization criterion. The measure of the MI between the resulting audio feature vector F_A^opt and the video features of each mouth region allows the mouth to be classified as speaking (highest value of MI) or non-speaking (lowest value of MI). The normalized difference of MI is always in favor of the active speaker, i.e. the correct speaking mouth region is always indicated (see Table 2).
Table 2. Normalized difference ΔI between the MI of the speaking and the non-speaking mouth regions, using the audio features optimized with the ΔECC cost function

Sequence   1        2        3        4       5
ΔI         84.23%   86.27%   95.55%   80.9%   76.15%
5 Conclusions
One central issue in the context of the multimodal speaker detection method described
here is the optimization of an objective function based on MI. Since no approximation is made, neither of the pdf of the features (which are estimated from the samples) nor of the cost function, the optimization problem turns out to be quite challenging. The performances and limits of three optimization methods, the local Powell's method and the global GACS and DE, have been compared, showing that the intrinsic properties of
the DE algorithm make it the best choice for the problem tackled here. As a result, the
method is able to detect the current speaker on the five test sequences.
References
1. Butz, T., Thiran, J.P.: From error probability to information theoretic (multi-modal) signal
processing. Signal Process. 85 (2005) 875902
2. Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical Recipes in C. 2nd
edn. Cambridge University Press (1992)
3. Schroeter, P., Vesin, J.M., Langenberger, T., Meuli, R.: Robust parameter estimation of in-
tensity distributions for brain magnetic resonance images. IEEE Trans. Med. Imaging 17(2)
(1998) 172186
4. Storn, R., Price, K.: Differential evolution - a simple and efficient adaptive scheme for global
optimization over continuous spaces. J. Global Optim. 11 (1997) 341359
5. Besson, P., Popovici, V., Vesin, J., Kunt, M.: Extraction of audio features specific to speech
using information theory and differential evolution. EPFL-ITS Tech. Rep. 2005-018, EPFL,
Lausanne, Switzerland (2005)
6. Besson, P., Kunt, M., Butz, T., Thiran, J.P.: A multimodal approach to extract optimized audio
features for speaker detection. In: Proc. EUSIPCO. (2005)
7. Leung, Y.W., Wang, Y.: An orthogonal genetic algorithm with quantization for global numer-
ical optimization. IEEE Trans. Evo. Comp. 5(1) (2001) 4153
8. Price, K.V.: 6: An Introduction to Differential Evolution. In: New Ideas in Optimization.
McGraw-Hill (1999) 79108
9. Storn, R.: Differential evolution homepage [online]. (Available: http://www.icsi.berkeley.edu/
storn/code.html)
Artificial Life Models in Lung CTs
Abstract. With the present paper we introduce a new Computer Assisted De-
tection method for Lung Cancer in CT images. The algorithm is based on several sub-modules: 3D Region Growing, Active Contour and Shape Models, and Centre of Maximal Balls, but the core of our approach consists of biological models of ants known as Artificial Life models. In the first step of the algorithm, images undergo a 3D region growing procedure to identify the rib cage; then Active Contour Models are used in order to build a confined area for the incoming ants, which are deployed to make a clean and accurate reconstruction of the bronchial and vascular tree, which is removed from the image just before checking for nodules.
1 Introduction
The use of chest Computed Tomography (CT) has led to a better identification of lung cancer as well as the definition of its type, also reducing the number of benign nodules that are removed. The output of such an exam is a huge amount of data (a series of 2D images) that in the end the radiologist must investigate. With the present paper we discuss a new CAD system that makes use of Artificial Life models and other supporting techniques. A flowchart would look as follows: region growing is used to reconstruct the rib cage, and Active Contour models are used to bound the rib cage. Ants are released in this newly created confined volume to reconstruct the bronchial and the vascular tree, and finally, after the trees' removal, the hunt for nodules begins. In the next section some related research work is introduced. The following section describes the Ant System, and the paper closes with some conclusions and the discussion of future work.
2 Artificial Life
Artificial Life is the study of man-made systems which exhibit behaviors characteris-
tic of natural living systems. In nature, ants use stigmergic communication (indirect communication through modifications of the environment).
Among the pioneers of the field was Dorigo, who treated the Ant Colony Optimization problem [2] and showed how a group of ants can successfully find a close-to-optimal solution for the Traveling Salesman Problem [3], Graph Coloring, the Quadratic Assignment Problem [4] or Job Shop scheduling.
Another pioneer of collective intelligence was James Kennedy who, in 1995, proposed Particle Swarm Optimization (PSO) [5]. More related to the approach we pursue is the work by Chialvo and Millonas [6]. Based on this paper, Ramos and Almeida [7] developed an extended model where ants are deployed in a digital habitat (an image), such that the insects are able to move within it and to perceive it. In [8] Mazouzi and Batouche introduce a MAS (Multi-Agent System) for 3D object recognition. Other work can also be found in [9], where a cognitive MAS for the surveillance of dynamic scenes is discussed.
2.2 Materials
The input of the algorithm is a set of lung CT images, composed of a series of 2D files in DICOM format. The 2D image size is generally 512x512 pixels with a 16-bit depth. The series is reconstructed as a 3D matrix with a voxel size in the third dimension that depends on the number of slices contained. The database for all the tests is provided by the MAGIC-5 Collaboration (add ref.). The images are being taken in the framework of the Regione Toscana (reference?) lung cancer screening program.
3 Ants
Ants are to be created and released in a confined 3-D world. Different algorithms
could be developed. In the present paper we discuss the Wander-approach in which
ants are randomly released in the habitat. While they wander about according to some
well defined rules, they leave behind different quantities of pheromone. In the end a
map of pheromone will be created that will represent the image that we are interested
in: the reconstruction of the bronchial and vascular tree. Two kinds of ant individuals
are present. The queen creates and manages all the ants; it does not move and does not perceive the habitat. The reconstructor is aware of the habitat and lives in it. A third type, the shaper, which will try to recognize the nodules, will soon be implemented.
The approach is based on an idea introduced by Ramos and Almeida in [7]. Let's suppose that at time t an ant is in voxel k. At time t+1 the ant is supposed to choose as its next destination one of the 26 neighbours of k. This is done as follows: for each neighbour i the following probability is calculated:
P_{ik} = \frac{W(\sigma_i)\, w(\Delta_i)}{\sum_{j/k} W(\sigma_j)\, w(\Delta_j)} (1)

where

W(\sigma) = \left( 1 + \frac{\sigma}{1 + \delta\sigma} \right)^{\beta} (2)

is the function that depends on the pheromone density σ, β being the osmotropotaxic sensitivity and 1/δ the sensory capacity, while w(Δ_i) is the probabilistic directional bias. The directional bias is a probability that each voxel receives when taking
into account the initial direction of the ant (Fig. 1).
Fig. 1. One of the possible cases of directional bias. The lighter the colour of the voxel, the higher the probability associated to that voxel.
As its next destination the ant will choose the voxel with the highest P_{ik}, and when leaving a voxel k it will leave behind a certain amount of pheromone T:

T = η + p Δh (3)

where η is the preset amount of pheromone, p is a constant and Δh is the function that relates the amount of pheromone to the information provided by the image. The deposited pheromone evaporates at a certain rate and does not diffuse into neighbouring voxels. For this paper (if not specified otherwise) Δh is:

Δh = |I_i − I_k| (4)
We used this gradient rule because the difference in intensities between two voxels situated on different sides of the border of a branch implies a large pheromone quantity being left behind by the visiting ants. In this case the ant that is moving from voxel k to voxel i leaves behind an amount of pheromone equal to the gradient computed between the two voxels.
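The following Python sketch evaluates the transition probabilities of Eq. (1) and the pheromone deposit of Eqs. (3)-(4); the values chosen for β, δ, η and p are illustrative, and the directional bias w is taken uniform for brevity, whereas the paper weights each neighbour according to the ant's previous direction.

def transition_probabilities(pheromone, neighbours, beta=3.5, delta=0.2):
    # P_ik of Eq. (1) over the 26 neighbours of the current voxel k;
    # pheromone[i] is the pheromone density sigma_i at neighbour i.
    def W(sigma):                                   # Eq. (2)
        return (1.0 + sigma / (1.0 + delta * sigma)) ** beta
    scores = [W(pheromone[i]) for i in neighbours]  # uniform directional bias w
    total = sum(scores)
    return [s / total for s in scores]

def pheromone_deposit(intensity, k, i, eta=0.07, p=1.0):
    # T = eta + p * delta_h with the gradient rule delta_h = |I_i - I_k|.
    return eta + p * abs(intensity[i] - intensity[k])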
Since a 2D image is a 3D image with just one slice, the first tests were performed on 2D images. We started with different structures that we considered to be difficult, like
Fig. 2. First image: original structures extracted from a real CT. Second: the map of pheromones after 50 cycles. Third: the positions of the ants after 50 cycles. Fourth: original CT slice. Fifth: pheromone map for the CT slice.
branching points, round structures, intersection of branches, lung fissures, etc. In the
examples from Fig. 2, Δh was as described in (4).
The following step was to launch a colony of ants in a full 2D CT image. As the image is 512x512 pixels, we created a colony of 80,000 ants and started the algorithm.
The output is represented in Fig. 3.
At this point we began 3D trials by choosing different Δh; in some cases the rules that we used made the ants non-discriminative, and the resulting reconstruction can contain the whole lung, as one can see in Figure 3 (in this case Δh = k · I_i).
Going to smaller structures, we tried regular artificial objects that the ants, using the gradient rule as Δh, were able to fully reconstruct. This can be seen in Fig. 3.
Fig. 3. Left: zealous ants (reconstructing too much). Rest of the images: artificial branching point after 1 and 50 cycles, artificial sphere after 1 and 50 cycles.
Fig. 4. Ants at work, depositing pheromone, in the first and the second images. Third to fifth: reconstruction of a 3D lung branch after 1, 50 and 100 cycles respectively.
In Figure 4 (first 2 images) one can see the results of intermediate steps, when try-
ing to reconstruct a small part of the vascular tree. After some testing one obtains the
last 3 images in Fig. 4, which represent a part of the bronchial tree in the right lung.
As one can see the reconstruction is not clean enough and further testing still needs to
be done.
4 Conclusions
At the core of our algorithm stand Artificial Life models (virtual ants or, better, virtual termites, if the existence of different types of individuals is taken into account). One of our goals is to study and understand the collective behaviour of the ants in 3-D environments by making them capable of perceiving the environment and of extracting useful information from it. We expect the ants to behave differently when gaining a new degree of freedom, which should reveal new emergent behaviours of the colony, based on the ways nature governs and controls our world.
Another important aspect of the work is the creation of the CAD for lung CT. Once finished, the reconstruction of the trees will be compared with already existing image processing algorithms such as 3-D region growing, and the best performer will be included in the CAD (or a merging solution could be chosen). We still have to test the algorithm on full CT images. This work will be done once the algorithm has been fully tested, with very good results, on smaller images.
References
1. Grasse P: Termitologia, Tome II. Fondation des Sociétés. Construction. Paris; Masson, 1984.
2. Colorni A, Dorigo M, Maniezzo V: Distributed Optimization by ant colonies. Proc. ECAL'91, Paris, France, pp 134-142, Elsevier Publishing, 1991.
3. Colorni A, Dorigo M, Maniezzo V: The ant system: optimization by a colony of cooperating agents. IEEE Trans. Syst., Man & Cybern. Part B, vol 26, no. 1, pp 1-13, 1996.
4. Colorni A, Dorigo M, Maniezzo V: The Ant System applied to the quadratic assignment problem. Technical Report 94/28 of IRIDIA, Université Libre de Bruxelles, 1994.
5. Eberhart R C, Kennedy J : A New Optimizer Using Particles Swarm Theory, Proc. Sixth
International Symposium on Micro Machine and Human Science (Nagoya, Japan), IEEE
Service Center, Piscataway, NJ, 39-43, 1995
6. Chialvo D, Millonas M: How Swarms Build Cognitive Maps. In Luc Steels (Ed.), The
Biology and Technology of Intelligent Autonomous Agents, (144) pp. 439-450, NATO ASI
Series, 1995.
7. Ramos V, Almeida F: Artificial Ant Colonies in Digital Image Habitats -A Mass Behav
iour Effect Study on Pattern Recognition, Proceedings of ANTS2000 -2nd International
Workshop on Ant Algorithms (From Ant Colonies to Artificial Ants),
8. Mazouzi S, Batouche M: A self-adaptive multi-agent system for 3D object recognition
9. Remagnino P, Tan T, Baker K : Multi-agent visual surveillance of dynamic scenes, Image
and vision computing, Elsevier Ed., vol. 16, pp 529-532, 1998.
Learning High-Level Visual Concepts Using Attributed
Primitives and Genetic Programming
Krzysztof Krawiec
type P and produce an output of type P. Operators that implement basic set algebra, like set union, intersection, or difference, belong to this category. Parametric selectors expect three child nodes of types P, A, and a numeric scalar, and produce an output of type P. For instance, the operator LessThan applied to child nodes (P, Orientation, 70) filters out all VPs from P for which the value of the attribute Orientation is less than 70. Non-parametric aggregators expect two child nodes of types P and A, and produce a numeric scalar output. Operators of this type implement mostly statistical descriptors, e.g., the operator Mean applied to child nodes (P, Coordinate_X) computes the mean value of coordinate x for all VPs in P. Parametric aggregators expect three child nodes of types P, A, and a numeric scalar, and produce a numeric scalar output. For instance, the operator CentralMoment applied to child nodes (P, Coordinate_Y, 2) computes the central moment of order 2 of coordinate y for all VPs in P. Table 1 presents the complete list of GP operators used in the computational experiments.
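The following Python sketch illustrates the behaviour of one selector and two aggregators on a small set of attributed visual primitives; the dictionary representation of VPs and the keep/discard convention of the selector (returning the VPs that satisfy the predicate) are assumptions made for the sketch.

example_vps = [
    {"Coordinate_X": 12.0, "Coordinate_Y": 40.0, "Orientation": 65.0, "Intensity": 0.8},
    {"Coordinate_X": 70.0, "Coordinate_Y": 42.0, "Orientation": 88.0, "Intensity": 0.3},
]

def selector_less_than(vps, attribute, threshold):
    # Parametric selector (P, A, scalar) -> P, e.g. LessThan(P, Orientation, 70).
    return [vp for vp in vps if vp[attribute] < threshold]

def aggregator_mean(vps, attribute):
    # Non-parametric aggregator (P, A) -> scalar, e.g. Mean(P, Coordinate_X).
    return sum(vp[attribute] for vp in vps) / len(vps) if vps else 0.0

def aggregator_central_moment(vps, attribute, order):
    # Parametric aggregator (P, A, scalar) -> scalar: central moment of given order.
    if not vps:
        return 0.0
    m = aggregator_mean(vps, attribute)
    return sum((vp[attribute] - m) ** order for vp in vps) / len(vps)

# Example composition of a selector and an aggregator:
print(aggregator_mean(selector_less_than(example_vps, "Orientation", 70), "Coordinate_X"))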
3 Experimental Results
For experimental verification, we chose the task of locating computer screens in 38
images taken from the aug1_static_atb_office_bldg400 folder of the MIT-CSAIL
Database of Objects and Scenes [14]. The images exhibit significant variability in brightness and proximity of the monitor case, the state of the screen (on/off), lighting conditions, and scene contents (see the example in Fig. 1). To avoid bias toward the center of the scene, where the screens are most often to be found, we created extra examples by cropping each training example from all sides, which led to (1+4)·38 = 190 examples. The original images have been converted to grayscale and downscaled to 128×96 pixels. Next, based on the preprocessed image, we create its VPR using the procedure described in Section 3. For the resulting training set of images I, the number of VPs varied from 103 to 145 per image (122.2 on average).
The experiment consisted in evolving a population of 1000 individuals for 50 gen-
erations, using a minimized fitness function f defined as follows:
f(s) = \frac{1}{|I|} \sum_{z \in I} \sum_{p \in s(z)} d(p), (1)
where s(z) denotes the set of visual primitives produced by an individual s given
image z as an input, and d(p) denotes the Euclidean distance between the VP p and the
actual screen position. Most of GP-related parameters are set to their defaults used in
ECJ library [6]. This includes: algorithm type (generational), mutation probability
Fig. 2. An input image (left) and the result produced by the best evolved individual (right)
(SelectorGreaterThan Coordinate_Y
(SelectorLessThan (+
(SelectorGreaterThan (-
(SelectorLessThan (% (% 0.9664 -0.5619) 0.8692)
(SelectorGreaterThan -0.8356722)
VPR (Mean
Coordinate_Y (SelectorGreaterThan
(exp 0.96645296)) VPR
Orientation Intensity
(exp 0.96645296)) (exp 0.96645296))
Coordinate_X Coordinate_Y)))
(cos Intensity
(Median (exp
(SelectorGreaterThan (-
VPR (-
Intensity (cos
0.18945523) (Median VPR Orientation))
Orientation))) -0.8356722)
(continued in right column) -0.8356722)))
Fig. 3. The textual representation of the best individual found during the evolutionary run
(0.1), mutation type (one-point), crossover probability 0.9, maximum tree depth
allowed for individuals modified by genetic operators: 7, number of retries if the
resulting individual does not meet this limit: 3. The procedure selecting the tree nodes
for mutation or crossover selects internal tree nodes with probability 0.9, and tree
leaves with probability 0.1. The software uses the ECJ [6] and JAI [12] libraries.
Figure 2 shows the result of recognition performed by the best individual found
during the evolutionary run. In the VP image part, the closed polygon shows the
actual location of the recognized object (computer screen). The short thick segments
depict the VPs selected by the individual, whereas the thin segments correspond to
those VPs which were originally present in the VPR of the input image, but have been
filtered out by the individual. In Fig. 3, we present the code of the best GP individual
evolved in the evolutionary run.
4 Conclusions
The major feature of the proposed approach is the abstraction from raw raster images,
which enables the learning process to develop advanced visual concepts that may
Acknowledgments
This research has been supported by KBN research grant 3 T11C 050 26.
References
1. Bhanu, B., Lin, Y., Krawiec, K.: Evolutionary Synthesis of Pattern Recognition Systems.
Springer-Verlag, Berlin Heidelberg New York (2005)
2. Gabor, D.: Theory of Communication, J. Inst. of Electrical Engineers 93 (1946) 429457
3. Johnson, M.P., Maes, P., Darrell, T.: Evolving visual routines. In: Brooks, R.A., Maes, P.
(eds.): Artificial Life IV: proceedings of the fourth international workshop on the synthesis
and simulation of living systems. MIT Press, Cambridge, MA (1994) 373390
4. Koza, J.R., Andre, D., Bennett III, F.H., Keane, M.A.: Genetic Programming III: Darwin-
ian Invention and Problem Solving. Morgan Kaufman, San Francisco, CA (1999)
5. Krawiec, K., Bhanu, B.: Visual Learning by Coevolutionary Feature Synthesis. IEEE
Trans. on Systems, Man, and Cybernetics, Part B: Cybernetics 35 (2005) 409425
6. Luke, S.: ECJ Evolutionary Computation System. http://www.cs.umd.edu/projects/
plus/ec/ecj/ (2002)
7. Maloof, M.A., Langley, P., Binford, T.O., Nevatia, R., Sage, S.: Improved rooftop detec-
tion in aerial images with machine learning. Machine Learning 53 (2003) 157191
8. Marek, A., Smart, W.D., and Martin, M.C.: Learning Visual Feature Detectors for Obsta-
cle Avoidance using Genetic Programming. In: Late Breaking Papers at the Genetic and
Evolutionary Computation Conference (GECCO-2002), New York (2002) 330336
9. Marr, D.: Vision. W.H. Freeman, San Francisco, CA (1982)
10. Rizki, M., Zmuda, M., Tamburino, L.: Evolving pattern recognition systems. IEEE Trans-
actions on Evolutionary Computation 6 (2002) 594609
11. Song, A., Ciesielski, V.: Fast texture segmentation using genetic programming. In: Sarker,
R., et al. (eds.): Proceedings of the 2003 Congress on Evolutionary Computation
CEC2003. IEEE Press, Canberra (2003) 21262133
12. Sun Microsystems Inc.: Java Advanced Imaging API Specification. Version 1.2 (2001)
13. Teller, A., Veloso, M.M.: PADO: A new learning architecture for object recognition. In:
Ikeuchi, K., Veloso, M. (eds.): Symbolic Visual Learning. Oxford Press, New York (1997)
77112
14. Torralba, A., Murphy, K.M., Freeman, W.T.: MIT-CSAIL Computer vision annotated im-
age library. http://web.mit.edu/torralba/www/database.html (2004)
Evolutionary Denoising Based on an Estimation
of Holder Exponents with Oscillations
1 Introduction
In the past years many different signal and image denoising techniques have been proposed, some of them even being based on artificial evolution [1, 2]. The basic notations are the following. One observes a signal or an image Y which is some combination F(X, B) of the signal of interest X and a noise B. Making various assumptions on the noise, the structure of X and the function F, one then tries to obtain an estimate X̂ of the original signal, optimal in some sense. We consider denoising as equivalent to increasing the Holder function α_Y (see section 2 for definitions) of the observations. Indeed, it is generally true that the local regularity of the noisy observations is smaller than that of the original image, so that in any case, α_X̂ should be greater than α_Y.
In this paper, section 2 recalls some basic facts about Holder regularity analysis. We describe in section 3 how oscillations are used to provide an estimator of the Holderian regularity. The new denoising method is explained in section 4, and the evolutionary algorithm, with its ad-hoc genetic operators, is detailed in section 5. Numerical experiments are presented in section 6.
2 Holder Regularity
To simplify notations, we deal with 1D signals, and we assume that signals are nowhere differentiable. Generalisation to differentiable signals simply requires the introduction of polynomials in the definitions [3]. Below, the definitions of the pointwise and local Holder exponents are given.
Let α ∈ (0, 1), and x_0 ∈ K ⊂ R. A function f : K → R is in C^α_{x_0} if for all x in a neighbourhood of x_0,

|f(x) − f(x_0)| ≤ c |x − x_0|^α (2)

where c is a constant. The pointwise Holder exponent of f at x_0, denoted α_p(f, x_0), is the supremum of the α for which (2) holds.
Let us now introduce the local Holder exponent: Let α ∈ (0, 1), Ω ⊂ R. One says that f ∈ C^α_l(Ω) if

∃ C : ∀ x, y ∈ Ω : |f(x) − f(y)| / |x − y|^α ≤ C. (3)

Let α_l(f, x_0, ρ) = sup{α : f ∈ C^α_l(B(x_0, ρ))}. The local Holder exponent of f at x_0 is α_l(f, x_0) = lim_{ρ→0} α_l(f, x_0, ρ).
Since α_p and α_l are defined at each point, we may associate to f two functions x ↦ α_p(f, x) and x ↦ α_l(f, x), which are two different ways of measuring the evolution of its regularity.
The quality of a denoising technique based on these exponents strongly relies on the quality of an estimator of these quantities. In [1], the estimation was performed by a wavelet technique. We will see in the sequel that a better estimation of the Holder exponent can be obtained by measuring the oscillations of the function.
3 Estimation by Oscillations
The estimation based on oscillation measurements is a direct application of the local Holder exponent definition (see [4]). Condition (3) can be written as: a function f(t) is Holderian with exponent α ∈ [0, 1] at t if there exists a constant c such that, for all τ, osc_τ(t) ≤ c τ^α, with

osc_τ(t) = sup_{|t−t'|≤τ} f(t') − inf_{|t−t'|≤τ} f(t') = sup_{t',t'' ∈ [t−τ, t+τ]} |f(t') − f(t'')|.

At each point we estimate the pointwise Holder exponent as the slope of the regression of the logarithm of the oscillation versus the logarithm of the window size τ.
Fig. 1. 10 multifractional Brownian Motions have been built with a regularity H
evolving like a sine. The 2 methods of estimation of the Holderian regularity have
been applied: a wavelet-based (W1) and the method by oscillations (OSC). After an
optimisation of the parameters of the 2 methods in terms of risk, the means of the
estimated Holder functions are displayed.
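The following Python sketch estimates the exponent at one point by regressing the logarithm of the oscillation against the logarithm of the window radius; the set of window radii is an arbitrary choice for the sketch.

import numpy as np

def holder_by_oscillations(signal, t, taus=(1, 2, 4, 8, 16)):
    # Oscillation over a window of radius tau: max - min of the signal on
    # [t - tau, t + tau]; the exponent is the slope of log(osc) vs log(tau).
    logs_tau, logs_osc = [], []
    values = np.asarray(signal, dtype=float)
    for tau in taus:
        lo, hi = max(0, t - tau), min(len(values), t + tau + 1)
        window = values[lo:hi]
        osc = window.max() - window.min()
        if osc > 0:
            logs_tau.append(np.log(tau))
            logs_osc.append(np.log(osc))
    if len(logs_tau) < 2:
        return np.nan
    slope, _ = np.polyfit(logs_tau, logs_osc, 1)
    return slope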
4 Method
5 Evolutionary Algorithm
resulting segment is then used to select the best parts of the two individuals as
the corresponding segment of the child.
Mutation: In a similar way, each segment (or image window) is mutated using a probability law inversely proportional to the local fitness. For each individual we consider the worst local fitness wlf, i.e. the fitness of the worst segment. Let lf(j, i) be the local fitness of the ith segment of the jth individual. The probability of mutation for this segment is Pm(j, i) = wlf / lf(j, i).
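A minimal Python sketch of this locally driven mutation follows; it assumes that the local fitness is a quality measure (higher is better), so that the worst segment is mutated with probability one, and the local_fitness and mutate_segment helpers are placeholders.

import random

def mutate_segments(individual, local_fitness, mutate_segment):
    # Mutate each segment with probability wlf / lf(j, i), i.e. inversely
    # proportional to its local fitness; `individual` is a list of segments.
    fits = [local_fitness(seg) for seg in individual]
    wlf = min(fits)  # worst (lowest) local fitness
    return [mutate_segment(seg) if random.random() < wlf / max(f, 1e-12) else seg
            for seg, f in zip(individual, fits)]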
6 Numerical Results
For the first example (see figure 2), the original signal is a Generalized Weierstrass function with a regularity α(X, t) = t, with t ∈ [0, 1]. This signal is corrupted by white Gaussian noise (standard deviation equal to 0.3). We use a synthetic image to perform an experiment in 2 dimensions. Figure 3 shows
Fig. 2. First row: original Generalized Weierstrass Function, noisy version, denoising
with Soft Thresholding, denoising by our method after 10 generations and 50 indi-
viduals, denoising by our method after 500 generations and 200 individuals. Second
row: corresponding Holder functions. Our method allows the Holder function of the original signal to be recovered almost perfectly.
Fig. 3. Original image, the noisy one, a denoising by Soft Thresholding and by our
method (100 ind, 500 gen). The second row displays the corresponding Holder functions.
the original image, the noisy one, a denoising by Soft Thresholding and by our
method. The second row displays the corresponding Holder functions. As in the
previous examples, our method allows to obtain a denoised version of the signal
with the prescribed regularity.
7 Conclusion
In this paper we have experimented with a new scheme for a multifractal denoising technique. It is based on a more precise and more complex computation of the Holder exponent of a signal. This work is actually a first attempt to use
an estimation of Holder exponent based on oscillations for signal enhancement.
Preliminary experiments yield satisfactory results, with a more precise control
of the reconstructed regularity, which has to be considered as a major advan-
tage for this type of techniques. Moreover, the evolutionary engine that has been
designed has the following interesting characteristics: it performs a basic hybridisation with two other denoising techniques (Multifractal Bayesian Denoising and Multifractal Pumping for the initialisation step), and uses locally optimised genetic operators. Further work will first consist in a more precise analysis of the
locally optimised genetic operators in comparison with classical blind ones.
Second, the hybridisation scheme has to be investigated as it may be a good
solution to reduce computation costs of the method. Additionally, the availabil-
ity of a pointwise and local definition of the fitness opens the way to Parisian
evolution implementations for the genetic engine. This may be another solution
to reduce computational expenses of the method, see for example [7, 8].
References
1. J. Levy Vehel and E. Lutton, Evolutionary signal enhancement based on Holder
regularity analysis, EVOIASP2001, LNCS 2038, 2001.
2. Pierre Grenier, Jacques Levy Vehel, and Evelyne Lutton, An interactive EA for multifractal Bayesian denoising, EvoIASP 2005, 30 March - 1 April, Lausanne,
2005.
3. Y. Meyer, Wavelets, Vibrations and Scalings, American Mathematical Society,
CRM Monograph Series, vol. 9, 1997.
4. C. Tricot, Curves and Fractal Dimension, Springer-Verlag, 1995.
5. P. Legrand, Debruitage et interpolation par analyse de la regularite holderienne.
application a la modelisation du frottement pneumatique-chaussee, phd thesis, Uni-
versite de Nantes et Ecole Centrale de Nantes, 2004.
6. J. Levy Vehel and P. Legrand, Bayesian multifractal denoising, ICASSP 2003,
IEEE internat. conf. on Acoustics, Speech, and Signal Processing, Hong Kong, April
6-10 2003.
7. Frederic Raynal, Pierre Collet, Evelyne Lutton, and Marc Schoenauer, Polar IFS + Parisian genetic programming = efficient IFS inverse problem solving, Genetic
Programming and Evolvable Machines Journal, Volume 1, Issue 4, pp. 339-361,
October, 2000.
8. Enrique Dunn, Gustavo Olague, and Evelyne Lutton, Automated photogrammetric
network design using the parisian approach, EvoIASP 2005, 30 March - 1 April,
Lausanne, 2005.
Probability Evolutionary Algorithm Based
Human Body Tracking
1 Introduction
With the fast developments of computer science and technology, visual analysis of
human motion in image sequences interests more and more researchers from both
laboratory and industry. Human tracking is a particularly important issue in human
motion analysis and it has been a popular topic in the research of computer vision.
Tracking can be divided into region-based, feature-based, active-contour-based and model-based tracking [1]. Model-based tracking can provide abundant information about human motion, but increasing the number of subparts of the human model potentially incurs high dimensionality and makes tracking a difficult task. Differently from approaches using particle filters within the Bayesian framework [2-6], human tracking is considered to be a function optimization problem in this paper, so the aim is to optimize the matching function between the model and the observation. Function optimization is a typical application area of Genetic Algorithms (GAs), but canonical genetic algorithms are hard to use here due to the high dimensionality of the human model and the requirement of computation speed. In this paper, we present a novel evolutionary algorithm called the Probability Evolutionary Algorithm (PEA), which is inspired by quantum computation [8] and the Quantum-inspired Evolutionary Algorithm (QEA) [7],
and then the PEA based human body tracking is proposed in which PEA is used to
optimize the matching function. PEA has a good balance between exploration and
exploitation with very fast computation speed, and it is suitable for human tracking
and other real-time optimization problems.
2 PEA
Differently from tracking humans using particle filters within the Bayesian framework, tracking is considered to be a function optimization problem in this paper. We denote the human model by X, and denote the observation associated with X by Z. The function f(X, Z) represents the matching degree between X and Z. Assume that the model at time instant t−1 is known to be X^{t−1}; then the model X^t at time instant t can be obtained by equation (2):

X^t = X^{t−1} + ΔX (2)

Here, ΔX is the change of the model X^{t−1}. After we get X^t, the matching function f(X^t, Z^t) can be calculated. Since X^t is associated with ΔX, the matching function can be written as:

f(X^t, Z^t) = g(ΔX) (3)

So tracking at time instant t amounts to optimizing g(ΔX) over the search space of ΔX. Generally, g(ΔX) is a multi-modal function with many local optima, and conventional optimization methods have difficulty finding the global optimum, so we use PEA to optimize g(ΔX).
We employ an articulated human body model which consists of 10 parts; each pair of neighbouring parts is connected by a joint point, as shown in Fig. 1. The model has 10 joints, and the root joint is at the middle bottom of the trunk. The root joint has 3 degrees of freedom, and each of the other 9 joints has 1 degree of freedom. The model X can be written as:

X = {x, y, θ_1, θ_2, ..., θ_10} (4)

Here, x and y represent the location of the root joint, and θ_1, θ_2, ..., θ_10 represent the swiveling angles of the 10 joints. ΔX can be written as:

ΔX = {Δx, Δy, Δθ_1, ..., Δθ_10} (5)

Human motion changes gradually, so ΔX can be limited to a reasonably small scope. This scope can be learned or set by hand.
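The following Python sketch shows the per-frame tracking step posed as an optimization over the bounded increment ΔX; a plain random search stands in for PEA, whose internals are not detailed here, and the match and sample_delta helpers are placeholders for the matching function g and for drawing ΔX inside its allowed scope.

import numpy as np

def track_frame(x_prev, observation, match, sample_delta, n_eval=200):
    # Search for the increment delta_X that maximises the matching degree
    # g(delta_X) = f(X_prev + delta_X, Z_t); random search used only as a stand-in.
    best_delta, best_score = None, -np.inf
    for _ in range(n_eval):
        delta = sample_delta()            # draw delta_X inside its bounded scope
        score = match(x_prev + delta, observation)
        if score > best_score:
            best_delta, best_score = delta, score
    return x_prev + best_delta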
4 Experimental Results
Two image sequences are used here to demonstrate the effectiveness of PEA. Sequence 1 is a synthetic image sequence generated by the Poser software [10], which consists of 100 frames. Sequence 2 is a real image sequence which consists of 325 frames. The observation Z is also an important factor in tracking. Here we use two types of visual cues: edges and intensity. We compared the tracking results from PEA with Annealed Particle Filtering (APF). All the algorithms run on a 2.4GHz PC
without code optimization.
4.2 Results
Some tracking results of PEA and APF for sequence 1 and sequence 2 are shown in
Fig. 2 and Fig. 3 respectively. The average computation time for one frame of PEA
Fig. 2. Some tracking results of sequence 1. The top row is the tracking results based on APF.
The bottom row is the tracking results based on PEA.
Fig. 3. Some tracking results of sequence 2. The top row is the tracking results based on APF.
The bottom row is the tracking results based on PEA.
and APF are shown in Table 1. The results show that the PEA based tracking algorithm yields more stable results than APF, and runs much faster than APF.
In the experiments we also found that, when the population size is bigger than 10, the tracking result cannot be improved further, so we suggest setting the population size to 2 to 8 in applications in order to strike a balance between tracking accuracy and computation time.
5 Conclusions
Model-based human tracking is a challenging problem, since the human model has
high dimensionality. Differently from tracking humans using particle filters, we consider
References
1. Hu, W.M., Tan, T.N., Wang, L., Maybank, S.J.: A survey on visual surveillance of object
motion and behaviors. IEEE Trans. on System Man and Cybernetics 34 (2004) 334351
2. Gavrila, D., Davis, L.: 3D model based tracking of humans in action: A multiview
approach. In: IEEE Proceedings of International Conference on Computer Vision and
Pattern Recognition, San Francisco, California (1996) 7380.
3. Isard, M., Blake, A.: CONDENSATION-conditional density propagation for visual
tracking. International Journal of Computer Vision 29 (1998) 528
4. Deutscher, J., Davidson, A., Reid, I.: Articulated partitioning of high dimensional search
spaces associated with articulated body motion capture. In: IEEE Proceedings of
International Conference on Computer Vision and Pattern Recognition, Hawaii (2001)
669676
5. Wu, Y., Hua, G., Yu, T.: Tracking Articulated Body by Dynamic Markov Network. In:
Proceedings of the Ninth IEEE International Conference on Computer Vision (2003)
10961101
6. Zhao, T., Nevatia, R.: Tracking Multiple Humans in Crowded Environment. In: IEEE
Proceedings of International Conference on Computer Vision and Pattern Recognition
(2004) 342349
7. Han, K.H., Kim, J.H.: Quantum-Inspired Evolutionary Algorithm for a Class of
Combinatorial Optimization. IEEE Trans. on Evolutionary Computing 6 (2002) 580593
8. Hey, T.: Quantum computing: An introduction. Computing & Control Engineering Journal
10 (1996) 105121
9. Shen, S.H., Jiang, W.K., Chen, W.R.: Research of Probability Evolutionary Algorithm. In:
8th International Conference for Young Computer Scientists, Beijing (2005) 93-97
10. Poser Software: Available from http://www.curiouslabs.com
On Interactive Evolution Strategies
1 Introduction
The research field of human-algorithm interaction (HAI) puts forward the involvement
of humans in algorithmic solution processes. In contrast to human-computer
interaction, the focus of this technology is on computational processes
that are assisted by users. In contrast to interactive software like text processing
systems or drawing software, the main structure of the solution process for the
higher-level task is still governed by the algorithm. The user has to assist the
algorithm at some stages that call for decisions based on subjective preferences,
or that require the insights of experts in a problem whose formalization
is often very difficult.
On a very global level we propose to distinguish between reactive and proactive
interaction, i.e. user feedback requested by the algorithm, or optional interventions
by the users into an autonomously running algorithm. An example for
2 Evolution Strategies
Next we will describe the (μ, κ, λ)-ES, a modern instantiation of the ES. The
main loop of the (μ, κ, λ)-ES reads as follows: The algorithm starts with the
initialization of a population (multi-set) P0 of μ individuals (decision variable
vectors plus mutation parameters). The initialization can be done uniformly distributed
within the parameter space. P0 forms the starting population, and within
subsequent iterations a series of populations (Pt)t=1,2,... is generated by means
of a stochastic procedure: In a first step of this procedure λ random variations of
individuals in Pt are generated by means of a variation operator, the details of
which we will describe later. The new variants form the population Qt (offspring
population). Then, among all individuals in Pt and Qt the μ best individuals
that have not exceeded a maximal age of κ generations are selected by means of
a selection criterion. In case of κ = 1 the strategy is termed (μ, λ)-ES, while in
case of κ = ∞ we denote it with (μ + λ)-ES.
The variation-selection process is meant to drive the populations into regions
of better solutions as t increases. However, there is no criterion that can be
used to determine whether the best region was found (except in cases with a
pre-defined goal or bound on the objective space). Hence the process is usually
terminated if the user decides to stop it, e.g. because of his/her time constraints
or because of a long-time stagnation of the best found value.
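As a rough illustration of the loop just described, the following Python sketch implements a (μ, κ, λ)-ES skeleton on a toy objective; the variation operator is only a placeholder here, and all names and parameter values are our own illustrative choices rather than code from the paper.

import random


def es_loop(mu=3, lam=21, kappa=1, dim=3, t_max=50, bounds=(0.0, 255.0)):
    """Minimal (mu, kappa, lambda)-ES skeleton on a toy sphere objective."""

    def f(x):  # stand-in objective; in an interactive ES this is the user's judgement
        return sum(v * v for v in x)

    def new_individual():
        # decision variable vector plus a single mutation step size
        return {"x": [random.uniform(*bounds) for _ in range(dim)], "s": 10.0, "age": 0}

    def vary(parent):  # placeholder variation operator (described in detail later)
        s = parent["s"] * random.choice((0.5, 1.0, 2.0))
        x = [v + s * random.gauss(0.0, 1.0) for v in parent["x"]]
        return {"x": x, "s": s, "age": 0}

    population = [new_individual() for _ in range(mu)]  # P0, initialized uniformly
    for _ in range(t_max):  # termination here is simply a fixed budget
        offspring = [vary(random.choice(population)) for _ in range(lam)]  # Qt
        for ind in population:
            ind["age"] += 1
        # keep only individuals that have not exceeded the maximal age kappa, then
        # select the mu best; kappa = 1 gives a comma strategy, kappa = infinity a plus strategy
        candidates = [ind for ind in population + offspring if ind["age"] < kappa]
        population = sorted(candidates, key=lambda ind: f(ind["x"]))[:mu]
    return min(population, key=lambda ind: f(ind["x"]))


best = es_loop()
print(best["x"], best["s"])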
Next, let us describe the variation operators that are used to generate offspring.
Individuals within the ES (if applied for continuous optimization) consist
of a vector of decision variables x = (x_1, . . . , x_nx) ∈ R^nx and a step-size vector
s = (s_1, . . . , s_ns) ∈ R+^ns that is used to scale the mutation distribution of an
individual.
A mutation algorithm with a single step-size is described in Algorithm 1. First
the step-size of the parent individual is multiplied by a constant factor, the value
of which can be α, 1, or 1/α depending on a random number. Then this step-size
of the new individual is used to obtain the decision variables of the new
individual. These are obtained by adding an offset to the corresponding value of
the original individual. The value of this offset is determined by a standard normally
distributed random number scaled by the step-size. The idea behind this mutation operator is that
decision variable vectors that are generated with a favorable step-size are more
likely to be part of the next generation, and thus also the information about the
step size that was used to generate them is transferred to that generation. The
process of mutative step-size adaptation was investigated in detail by Beyer et
al. [5]. According to these findings, simple adaptation rules like the 2-point or 3-point
mutation for the step sizes serve well, whenever only a few iterations of the
algorithm can be afforded. For a higher number of iterations, say tmax ≥ 100, more
sophisticated adaptation mechanisms should be considered for the parameters of
the mutation. State-of-the-art techniques include the individual step-size adaptation
by Schwefel [16] and the covariance matrix adaptation (CMA) by Hansen
and Ostermeier [8]. Note that in order to allow for a mutative step-size adaptation,
a surplus of offspring individuals needs to be generated in each generation.
The recommended ratio of μ/λ ≈ 1/7 leads to a good average performance of
the ES in many cases [16].
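A minimal sketch of the single-step-size mutation with the 3-point rule described above; the factor α and its value of 1.3 are our own illustrative choices, not taken from the paper.

import random


def mutate(x, s, alpha=1.3):
    """3-point mutative step-size adaptation with a single step size.

    The parent's step size s is first multiplied by alpha, 1, or 1/alpha
    (chosen uniformly at random); the new step size then scales standard
    normally distributed offsets added to every decision variable.
    """
    s_new = s * random.choice((alpha, 1.0, 1.0 / alpha))
    x_new = [xi + s_new * random.gauss(0.0, 1.0) for xi in x]
    return x_new, s_new


# example: mutate an RGB-like vector with an initial step size of 20
child_x, child_s = mutate([220.0, 140.0, 50.0], 20.0)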
possibilities. Accordingly, the decision effort for the latter would be the lowest
(assuming λ < Ng). For μ = 1 it even reduces to λ alternatives. Since the user's
time is usually limited, in the following we will focus on a strategy that minimizes
the user's effort, namely alternative 3 and μ = 1. Before discussing some
experimental results with the latter strategy, let us first explore some further
possibilities to conduct theoretical research on the behavior of interactive ES.
Here X is for instance the space of all sequences of numbers between 1 and λ,
in the case of a simple selection scheme and μ = 1. This convergence time of an
ideal user can be compared on test cases with known result to the measured
convergence behavior of an interactive ES in order to find out whether the user
does the selections in an optimal way. If so, it is likely that the implementation of
the algorithm marks the bottleneck of the convergence speed and it is not within
the hands of the user to further improve the convergence speed. Otherwise, if
the convergence speed of the ideal user is indeed much faster than that of the
interactive strategy, providing the user with selection aids or integrating some
smart user monitoring strategy might help to further increase the algorithmic
performance.
However, even for quite simple variants of interactive ES, the computation of
an ideal user behavior might be a challenging task, as in every step there are
at least λ possibilities to choose the best individual, resulting in an exponential
number of at least λ^t possibilities for input streams up to the t-th iteration. In
some cases it might be possible to make the best choice in each iteration by
means of theoretical considerations. It shall also be noted here that it suffices
to find a computer-based selection strategy that performs much better than the
interactive ES to motivate the potential of further improvements of the user
interaction. Later, we will give an example for such an analysis.
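For the color test case discussed later, such a computer-based "ideal user" can simply pick the offspring closest to the target color. A hypothetical sketch; the RMS distance and the target value follow the experiments described below, while the function names are ours.

def rms_distance(color, target):
    """Root-mean-square distance between two RGB vectors."""
    return (sum((c - t) ** 2 for c, t in zip(color, target)) / len(color)) ** 0.5


def ideal_user_select(offspring, target=(220, 140, 50)):
    """Deterministic stand-in for the human: select the offspring whose
    color is closest to the target, i.e. an error-free selection."""
    return min(offspring, key=lambda color: rms_distance(color, target))


best = ideal_user_select([(10, 20, 30), (200, 150, 60), (250, 250, 250)])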
In summary, it seems that only in a few, very simple cases will it be possible to
get meaningful results from a convergence theory of interactive ES, and empirical
results will likely play an important part in the dynamic convergence theory of
these algorithms, even if we assume the ideal user.
Another theoretical question that can be easily addressed is whether the
strategy can converge globally. This would be the case if the user can,
in principle, reach from any given starting point any other state with a finite
probability and a finite number of inputs, independent of the settings of the
strategy parameters. However, such kinds of considerations tell us nothing about
the practically important convergence time, but they can well serve to discard certain
variants of interactive ES. For constant step sizes the result of probabilistic
convergence in the limit for regular target functions and ideal users is simply
inherited from the theory of the non-interactive ES, provided that the best found
solution is archived in case of a comma strategy. Moreover, the result can be extended
to the self-adaptive case if the step-size is bounded below by a minimal
positive step size [5].
of the interactive ES, the subjective nature of the objective function usually
forbids an arbitrarily close approximation of some solution. The reason for this
is that in many cases the user will not be able to measure arbitrarily small
differences in quality. For example, when comparing two colors, a human will
perceive two colors as equal if their distance is below a just noticeable difference (JND).
The concept of JNDs is quite frequently discussed in the field of psycho-physics,
a subbranch of cognitive psychology [2]. It is notable that the JND depends on
the intensity and complexity of the stimulus presented to the user. Moreover, it
has been found that the lower the difference between two stimuli and the more
complex the stimuli are, the more time it takes for the user to decide upon the
similarity of two patterns. We will come back to these results when we discuss
the empirical results of our experiments.
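The effect of a just noticeable difference on selection can be modeled roughly as follows; the threshold value of 10 and the random tie-breaking rule are our own illustrative assumptions, not a model taken from the paper.

import random


def jnd_limited_select(offspring, target, jnd=10.0):
    """User model that cannot distinguish colors whose RMS distance to the
    target differs by less than the just noticeable difference (JND):
    all near-best candidates look equally good, so one is picked at random."""
    def rms(color):
        return (sum((c - t) ** 2 for c, t in zip(color, target)) / len(color)) ** 0.5

    best = min(rms(c) for c in offspring)
    indistinguishable = [c for c in offspring if rms(c) - best < jnd]
    return random.choice(indistinguishable)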
Another difference between the standard ES and the ES with a subjective selection
criterion is that the user's attention level will decrease after a while. For a
theory of attention we refer to Anderson [2] and Reason [13]. Hence, the number
of experiments is usually very limited and very fast step-size adaptation mechanisms
have to be found, and only a few parameters of the mutation distribution
can be adapted.
Moreover, as discussed above, the amount of interaction should be minimized,
e.g. by choosing a simple selection scheme. This might prevent the use of step-size
adaptation strategies that demand numerical values of the fitness function.
A performance measure would be based on the number of selections made
by the user, rather than on the number of function evaluations.
Fig. 1. Subjective selection dialogue with the user: The upper figures show the initial color
patterns (single-color (left) and two-color test case (right)) and the lower figures show
color patterns at a later stage of the evolution. The bigger box on the left hand side
displays the respective target color, and in its center a small box is placed displaying
the selected color. Once the user presses the NEXT button a selection gets confirmed
and a new population will be generated and displayed. If the user is finally satisfied
with the result he/she presses the Done button, in order to stop the process.
in such a way that 50% of the runs were done with the Rechenberg algorithm
and 16.667% of the runs with each of the three fixed step-size algorithms.
The target color was fixed in all the experiments to make them comparable
and was chosen to be [R = 220, G = 140, B = 50], so that every component
of the color was neither close to the variable bounds 0 and 255 nor equal or
close to any other component. Our choice happened to be a brownish version of
orange.
6 Results
The first results, collected with our JAVA applet with the help of many users who
participated in the experiment, are displayed in Figures 2 and 3. Next, we will
discuss these results one by one.
Figure 2 shows the average function values for the different strategies. Note
that not all runs had the same length. Some people put more effort into finding
a better result than others. This is all part of the effect of using humans instead
of a computer; conclusions based on this data should take this into account. It
is also notable that users took on average 5 seconds per selection and hardly
proceeded beyond 40 iterations.
The plot in Figure 2 shows that self-adaptation seems to outperform all the
other algorithms, with human selection as well as with computer selection.
The fixed algorithm with step-size 20 seems to converge a lot faster in the beginning,
though, and stagnates after 10 iterations. This effect is also very plain with
the computer selection: there the fixed strategy is quicker until generation 10, and then
self-adaptation closes the gap and overtakes. The computer selection was also
run on the three different fixed step sizes, but step size 10 was by far the best
in both the one-dimensional and the two-dimensional case. We note here that
it is hard to figure out a priori what is a good step size for an experiment with
[Plots for Fig. 2: average RMS distance to the target over the number of iterations (selections) for the single-color and two-color test cases, comparing the self-adaptive strategy with fixed step-sizes 10, 20, and 40 under human selection, and with computer-based selection (Computer SA, Computer Fixed 10). The plotted data of Fig. 3 (legends: Single Color / Two Color and Left color / Right color) also appeared here.]
Fig. 3. The left plot shows step-size adaptation in the interactive ES for the single-color and
two-color example. The right plot displays the history of a typical run of the two-color
problem. The distance of the two colors to the target color is displayed over the number
of iterations. Note the divergence of the left color after 20 iterations.
subjective evolution, and thus a self-adaptive step size will always be a favorable
solution if no additional problem knowledge is given.
The two-dimensional plot is a bit different though. Here a fixed step size of 20
seems to be the better choice. Self-adaptation performs best in the beginning here,
but after 20 iterations the error seems to go up again. This is an unexpected
effect that only seems to occur with self-adaptation in the two-dimensional
case if human selection is used. In the case of computer selection this effect
is totally absent and self-adaptation outperforms all the fixed step-sizes.
In an effort to explain this effect, Figure 3 (right) shows the error of both colors
separately in a typical run of self-adaptation using human selection. Note how
both errors seem to converge nicely up until generation 20. From that point
onward only the right color seems to converge to an optimum, whereas the left
color is actually moving away from the optimum. This suggests how
a human might try to use a certain strategy to optimize two dimensions. In
this case it seems the user tried to optimize the right color first and worry about
the left color later. Although this seems a feasible strategy, the self-adaptive
algorithm has some problems with it: the total error goes up, while the step-size is
stagnating and even increasing a bit (as Figure 3 (left) shows).
What Figure 3 also shows is that the step-sizes decrease at the same rate in
the one-dimensional problem as they do in the two-dimensional problem. It seems,
though, that in the two-dimensional problem the step-size starts off a bit higher.
The fact that the computer selection outperforms the human selection is not
very surprising, as the human selection can be viewed as a noisy selection method.
However, it is quite surprising how much this noise influences the algorithm.
A conclusion from this is that there might still be a great potential for improvements
of the performance, which might be reached by assisting the user and
diminishing the noise. Moreover, when analyzing the noise, there seems to be
more to it than just adding noise to a fitness function. The results suggest humans
use strategies that are based on some outside knowledge or even feelings
that influence self-adaptation in an unfavorable way. Moreover, cognitive re-
strictions of humans with regard to attention and just noticeable differences will
have to be taken into account when modeling this noise.
7 Conclusion
This paper contributes to the analysis of Interactive Evolution Strategies (IES)
with subjective selection criteria. The approach has been related to the context
of human-algorithm interaction. Differences between interactive evolution and
non-interactive evolution were pointed out, including a discussion of different ways to
analyze these methods. In particular, we compared different forms of interaction,
e.g. by means of the decision effort criterion, and suggested concepts to obtain
bounds on the performance of interactive EA. Here we introduced the concept
of an ideal user that can be used to estimate the potential for improvements in
the interactive part of the algorithm.
In the empirical part of the paper we added new results to the experimental
work started by Herdy [9] on a color re-design test case. The experiment was
extended by a two-color redesign example. A JAVA applet implementing the IES
on the color test cases was developed and used in an internet survey to obtain a
significant number of results from different users.
The results clearly indicate the benefit of step-size adaptation. Strategies
that work with step-size adaptation turned out to be more robust and diminish
the risk of choosing a completely wrong step size. It is notable within this context
that the employed 3-point step-size adaptation proved to be beneficial for a very
small number of less than forty generations.
By comparing to the results obtained with an ideal user's selection scheme,
we could show that the users hardly selected individuals in a way that maximizes
the convergence speed. This adds evidence to the fact that the noisy nature
of user-based selection is harmful to the algorithm's behavior. However, this
result can also be interpreted in a positive way, as it shows that there is still much
room for improvement in the user interaction. For instance, decision aids, noise
reduction strategies, or smart user monitoring strategies might help to further
increase the performance of the IES.
We also note that the kind of selection errors made by the users can hardly
be modeled by standard noise models like constant Gaussian distributed offsets
to the objective function values. The noise function seems to be time dependent
and dependent on the distance to the target values.
For more complex targets another aspect has to be taken into account when
modeling the user: An insight we got from observations for more complex target
definitions (two-color example) was that the user starts to use some strategy, e.g.
to first optimize the first and then the second color. Such kinds of user behavior
have rarely been addressed in the context of interactive evolutionary algorithms
and deserve further attention.
References
1. P.J. Angeline (1996): Evolving fractal movies, 1st Annual Conference on Genetic Programming (Stanford, CA, USA), pp. 503-511
2. Anderson, J.R. (2004): Cognitive Psychology and its Implications, Worth Publishers, UK
3. Back, T. (1996): Evolutionary Algorithms in Theory and Practice. Oxford University Press, New York, 1996.
4. Banzhaf, W.: Interactive evolution. In: Handbook of Evolutionary Computation (T. Back, D. Fogel, and Z. Michalewicz, eds.), ch. C2.10, pp. 1-5, Oxford University Press, 1997
5. Beyer, H.-G. (2001): The Theory of Evolution Strategies. Springer, Berlin, 2001
6. Dix, A., Finlay, J., Abowd, G.D., Beale, R. (2003): Human Computer Interaction (3rd ed.). Pearson Education, 2003.
7. B. Filipic, D. Juricic (2003): An interactive genetic algorithm for controller parameter optimization, Intl. Conf. on Artificial Neural Nets and Genetic Algorithms, Innsbruck, Austria, pp. 458-462
8. Hansen, N. and Ostermeier, A. (2001): Completely Derandomized Self-Adaptation in Evolution Strategies, Evolutionary Computation, 9(2):159-195, 2001.
9. M. Herdy (1996): Evolution strategies with subjective selection, PPSN IV, Berlin, Germany, 1996
10. M. Herdy (1997): Evolutionary optimization based on subjective selection - evolving blends of coffee, In: 5th European Congress on Intelligent Techniques and Soft Computing EUFIT'97, pp. 640-644
11. Horowitz (1994): Generating rhythms with genetic algorithms, in Int. Computer Music Conference (ICMC'94) (Aarhus, Denmark), pp. 142-143, 1994
12. I.C. Parmee, C.R. Bonham: Cluster oriented genetic algorithms to support designer/evolutionary computation, Proc. of CEC'99, Washington D.C., USA, 546-55
13. Reason, J. (1990): Human Error, Cambridge University Press, Cambridge UK, 1990
14. Rechenberg, I. (1994): Evolutionsstrategie '94. Frommann-Holzboog, Stuttgart, 1994.
15. Rudolph, G.: On Interactive Evolutionary Algorithms and Stochastic Mealy Automata. PPSN 1996: 218-226.
16. Schwefel, H.-P. (1995): Evolution and Optimum Seeking, Wiley, NY
17. Takagi, H. (2001): Interactive Evolutionary Computation: Fusion of the Capabilities of EC Optimization and Human Evaluation, Proceedings of the IEEE, vol. 89, no. 9, pp. 1275-1296, 2001.
An Experimental Comparative Study for
Interactive Evolutionary Computation Problems
1 Introduction
Evolutionary Computation (EC) encompasses computational models which follow
a biological evolution metaphor. The success of these techniques is based on
the maintenance of genetic diversity, for which it is necessary to work with large
populations. The population size that guarantees an optimal solution in a short
time has been a topic of intense research [2], [3]. Large populations generally converge
to better solutions, but they have higher computational cost and memory
requirements. Goldberg et al. [4] developed the first population-sizing equation
based on the variance of fitness. They further enhanced the equation to allow
accurate statistical decision making among competing building blocks
(BBs) [2]. Extending the decision model presented in [2], Harik et al. [3] tried to
determine an adequate population size which guarantees a solution with the desired
quality. To show the real importance of the population size in Evolutionary
Algorithms (EAs), He and Yao [5] showed that the introduction of a non-random
population decreases convergence time. However, it is not always possible to
deal with such large populations, for example, when the adequacy values must be
2 Description of Algorithms
IEC algorithms are difficult to test because they require the evaluation of the user
in their experiments. In addition, if the designer wants to compare the new algorithms
with existing ones, even more experimentation with humans is required.
The proposed algorithm (Chromosome Appearance Probability Matrix, CAPM)
is compared in every domain with the classical Genetic Algorithm (Simple Genetic
Algorithm, SGA), with the Fitness Predictive Genetic Algorithm (FPGA),
and with a probabilistic algorithm called Population Based Incremental Learning
(PBIL). The SGA is always a good point of reference for the comparison and
the FPGA is one of the latest proposals for IEC problems [15]. The probabilistic
approach gives another interesting point of view to compare with, because it is
the starting point for the Estimation of Distribution Algorithms (EDA), [16] and
[17]. Besides, the proposed method was partially inspired by the EDAs.
The representation chosen for the chromosomes is the same in all the
contrasted algorithms. Each individual is made up of various chromosomes which
are, in turn, made up of a vector of integers. In the following section the selected
algorithms, except the SGA, will be briefly explained.
Step 4 - Select two parents: as with the SGA, the selection operator implemented
for the experimentation is based on the selection of the two best
individuals according to their adaptation or fitness value. However, the selection
can only be done among the best N individuals of the population. This previously
mentioned limitation is typical of FPGAs and IEC techniques in which the user
evaluates. Therefore, the selection restricts the evaluation to a subset of the population
N. During each iteration, the selections made by the user are stored in
the ps1 and ps2 variables with the intention of doing other calculations which
will obtain the predictive fitness of the following iterations.
Step 5 - Cross pairs with parents: this process is exactly the same as in the
SGA. Once the two individuals with the best fitness values are selected, the crossover
operator is applied in order to obtain the new generation of individuals. The propagation
strategy is elitist and, as a result, the selected parents go directly to the
following generation. The remainder are generated from the equally probable
crossing of the parents.
Step 6 - Mutate the obtained descendants: as with the SGA, this process is
responsible for mutating, with a certain probability (Pmutation), several genes of
the generated individuals. The aim is to guarantee the appearance of new characteristics
in the following populations and to maintain enough genetic diversity.
Step 7 - Repeat until the final condition: like the SGA, the stop condition
of the algorithm is imposed by a maximum limit of iterations.
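A rough sketch of one generation of such a user-driven GA; the two parent indices would come from the user, and all function and parameter names here are our own illustration, not code from the paper.

import random


def next_generation(population, user_choice, p_mutation=0.05, alleles=16):
    """One elitist generation: the two user-selected parents survive unchanged,
    the rest of the population is produced by uniform crossover plus mutation.

    population  -- list of chromosomes (lists of integers)
    user_choice -- indices of the two individuals picked by the user (ps1, ps2)
    """
    ps1, ps2 = user_choice
    parents = [population[ps1], population[ps2]]
    offspring = list(parents)                       # elitism: parents pass through
    while len(offspring) < len(population):
        child = [random.choice(pair) for pair in zip(*parents)]   # equally probable crossing
        child = [random.randrange(alleles) if random.random() < p_mutation else g
                 for g in child]                    # mutation keeps diversity
        offspring.append(child)
    return offspring


pop = [[random.randrange(16) for _ in range(4)] for _ in range(6)]
pop = next_generation(pop, user_choice=(0, 3))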
Step 3 - Update the probability matrix: this step uses the following updating
rule:
Step 5 - Repeat until the final condition: like the SGA and the FPGA, the
stop criterion is imposed by a maximum limit of iterations which depends on the
problem to solve.
The steps of the proposed algorithm are explained in detail in [28] and [1];
however, this method introduces the following new features which differ from a
canonical genetic algorithm.
Probability matrix for guiding mutation: when the user selects an element
of the population, his or her selection is usually based on the collective
combination of features in each element of an individual. For example, if the
user is searching for tables he will appreciate the combination of several characteristics
such as the color of the legs of the table, their number, the shape of the
surface, etc. Therefore the information about the whole chromosome should be
kept. To do this, a multidimensional array, with the same number of dimensions
as genes, has been included. The bounds of the dimensions are determined by
the number of alleles of the different genes.
The probability array M is initialized by M(gene_1, gene_2, gene_3, ..., gene_m) =
1/T, where m is the number of genes, gene_i can take values in [allele_i1,
allele_i2, ..., allele_in_i], and n_i is the number of alleles of gene i. The total number of
possible chromosome combinations T is calculated by multiplying the maximum
sizes of each gene (T = prod_{i=1}^{m} n_i).
This array holds, for each possible combination of alleles, its probability of being
chosen. Each iteration implies a selection of one or two individuals,
and their chromosomes represent a position in the above array. After the selection,
the corresponding position in the array is updated by a factor of α, with the
increment factor of the update rule, ΔM. This ΔM is calculated by the following
equation:

ΔM = [M(gene_1s, ..., gene_ns) (1.0 + α)] - M(gene_1s, ..., gene_ns)    (3)

The example in figure 1 shows how the update rule works for 1 chromosome
with 2 different genes, gene 1 with 4 alleles {pos1,..,pos4}, and gene 2 with 10,
{0..9}. It can be clearly seen how the probability matrix M is updated with
α = 0.005 and how it affects the rest of the cells.
The update operations take care that the sum of all the elements of the array
remains 1. This array is very useful to keep information about the selection frequency
of a determined chromosome, and therefore, to help the mutation process
to evolve towards the preferences of the user.
This approach has the ability to broadcast the preferences of the user towards
all the chromosomes of the individuals.
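A minimal sketch of the probability-matrix bookkeeping described above: initialization to 1/T, the multiplicative update of Eq. 3 for the selected allele combination, and a renormalization step; the renormalization scheme is our own reading of "the sum of all the elements of the array remains 1".

import itertools


def init_matrix(allele_counts):
    """Probability matrix over all allele combinations, initialized to 1/T."""
    total = 1
    for n in allele_counts:
        total *= n
    return {combo: 1.0 / total
            for combo in itertools.product(*[range(n) for n in allele_counts])}


def update_matrix(matrix, selected_combo, alpha=0.005):
    """Increase the probability of the selected combination by the factor alpha
    (Eq. 3) and rescale the whole matrix so it still sums to 1."""
    matrix[selected_combo] += matrix[selected_combo] * alpha   # delta M of Eq. 3
    norm = sum(matrix.values())
    for combo in matrix:
        matrix[combo] /= norm
    return matrix


M = init_matrix([4, 10])            # gene 1: 4 alleles, gene 2: 10 alleles
M = update_matrix(M, (1, 7))        # user selected allele 'pos2' and digit 7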
Replacement of all the population but the parents with the new individuals:
the proposed strategy is elitist; however, as the user is not interested in
evaluating the same individuals between iterations, the algorithm mutates the
parents for the next generation, making them slightly different.
3 Experimental Tests
The problem arises from an idea proposed in [25] and is explained in more
detail in [1]. The challenge of the trademark finder is to help the user in the task
of finding a specific logo which is new, different, or eye-catching, for a product
or a company. For this purpose, as in a brainstorming process, the system offers
different types of words, applying different colors, backgrounds, and styles. This
is only a first version for making the experiments, but in future developments
figures and letter positions should change.
The algorithm starts randomly with six words (population), each one made
of five letters, with random colors and styles. The user must select in each iteration
the two best alternatives for him. Thus, each word is an individual of the
population, and each letter is a chromosome with four genes representing color,
font type, size and background, respectively. The search space is 3.2 x 10^21.
The final condition is reached when the fitness of one element is 100 points
(optimal solution), or when the iterations surpass 50. A solution is considered valid
when the fitness is greater than 90.
At this point, after making 10,000 experiments per algorithm, running them
with different parameters (see Table 1) and with both types of automatic evaluation
function, the following results were achieved (best parameters found for each
algorithm):
For the evaluation function based on the ranking of preferences, only the
CAPM method reaches the optimal solution in all experiments done (iteration
32 on average), and none of the others even reached a valid solution.
When comparing the results, there is a significant difference between the
algorithms: CAPM reaches solutions much faster in terms of iterations and finds
better quality solutions, as can be seen clearly in Tables 1 and 2 and also in Figure 2.
Besides, PBIL clearly shows a low performance when working with micropopulations,
because the genetic diversity during the first 50 iterations is too
low. Experiments with a mutation likelihood of 45% have been run, but this high
mutation rate does not help the algorithm and makes the search completely
random.
The FPGA has a good performance, but does not always beat the SGA results.
The main reason is that the Euclidean distance equation used for the fitness
prediction is not useful when predicting ranking evaluations or fluctuating rule
sets. At this point, future developments to try the FPGA with prediction functions
based on Neural Networks or Support Vector Machines are suggested.
Looking at the results for the fluctuating-decisions evaluation model, the
user must take at least 35 iterations on average to obtain a valid solution, and
never reaches an optimal solution. However, the results are not so bad taking
into account that the evaluation uses a non-linear function which changes its
evaluation rules randomly (with a 30% probability of changing in each iteration).
It applies opposite evaluation preferences for each set of rules, and valid solutions
are finally found after 35 iterations on average. The evolution process is very complex
and it is unpredictable what is really going to happen with the user. It is not usual
to see the user completely changing preferences in each iteration (with a 30%
probability), but it was decided to experiment with the worst case. In fact, in
this evaluation mode all the algorithms tested except CAPM are unable to find
even a valid solution (fitness > 90).
4 Conclusion
After the study and development of different algorithms and their formal comparative
test for solving a specific problem of IEC, the following conclusions
have been drawn:
2. The analysis of the results of the SGA shows that, although it has been considered
sufficient so far, it is not good enough for IEC, because it never reaches
a valid solution in the experiments made. Besides, the proposed method
has been compared successfully with an algorithm specifically developed for
solving IEC problems (FPGA) and with a probabilistic algorithm (PBIL).
3. The proposed method has the capability of learning from the study of the
selections made by the user. This alternative improves the results given by
the SGA, FPGA and PBIL in terms of the quality of the solutions and the
number of iterations needed to find valid/optimal solutions.
Finally, as micropopulations affect all the tested algorithms negatively,
becoming the main reason for their decreased performance, a proposal is made
to avoid this problem by increasing the selection pressure with the probability
matrix. However, other proposals based on predictive fitness must be studied
too.
Acknowledgment. This article has been financed by the Spanish MCyT-funded
research project OPLINK, Ref: TIN2006-08818-C04-02.
References
1. Saez Y., Isasi P., Segovia J., Hernandez J.C.: Reference chromosome to overcome user fatigue in IEC, New Generation Computing, vol. 23, no. 2, Ohmsha - Springer, pp. 129-142 (2005).
2. Goldberg D.E., Deb K., and Clark J.H.: Genetic algorithms, noise, and the sizing of populations, Complex Syst., vol. 6, no. 4, pp. 333-362 (1992).
3. Harik G., Cantu-Paz E., Goldberg D.E., and Miller B.L.: The Gambler's ruin problem, genetic algorithms, and the sizing of populations, Transactions on Evolutionary Computation, vol. 7, pp. 231-253 (1999).
4. Goldberg D.E., and Rudnick M.: Genetic algorithms and the variance of fitness, Complex Systems, vol. 5, no. 3, pp. 265-278 (1991).
5. J. He and X. Yao: From an individual to a population: An analysis of the first hitting time of population-based evolutionary algorithms, IEEE Transactions on Evolutionary Computation, vol. 6, pp. 495-511, Oct. (2002).
6. Sims K.: Artificial Evolution for Computer Graphics, Comp. Graphics, Vol. 25, 4, pp. 319-328 (1991).
7. Moore, J.H.: GAMusic: Genetic algorithm to evolve musical melodies. Windows 3.1 Software available at: http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/genetic/ga/systems/gamusic/0.html (1994).
8. Graf J., Banzhaf W.: Interactive Evolutionary Algorithms in Design. Procs. of Artificial Neural Nets and Genetic Algorithms, Ales, France, pp. 227-230 (1995).
9. F.J. Vico, F.J. Veredas, J.M. Bravo, J. Almaraz: Automatic design synthesis with artificial intelligence techniques. Artificial Intelligence in Engineering 13, pp. 251-256 (1999).
10. Santos A., Dorado J., Romero J., Arcay B., Rodríguez J.: Artistic Evolutionary Computer Systems, Proc. of the GECCO Workshop, Las Vegas (2000).
11. Unemi T.: SBART 2.4: an IEC Tool for Creating 2D images, movies and collage, Proc. of the Genetic and Evolutionary Computation Conference Program, Las Vegas (2000).
Abstract. In this paper, our model supplies a designing environment that uses the
component network to identify the high-score components and weak components,
which decreases the number of components needed to build a meaningful and easily
analyzed simple graph. The secondary analysis uses the bipartite network as the
method for forming the structure or the structural knowledge. In this step,
components from different clusters can link to each other, but a link cannot
connect components in the same cluster. Furthermore, some weak-ties components
or weak links are brought out by the Bipartite Graph based Interactive Genetic
Algorithm (BiGIGA) to assemble creative products for customers. Finally,
we investigated two significantly different cases. In case one, the customer did not
change his preference, and the Wilcoxon test was used to evaluate the difference
between IGA and BiGIGA. The results indicated that our model could correctly
and directly capture what the customer wanted. In case two, the Wilcoxon
test evidenced that the lateral transmitting, using triad closure, extends the
conceptual network, which can increase the weight of weak relations and retrieve a
good product for the customer. The lateral transmitting did not show convergent
power in evolutionary design, but it illustrated that it could quickly discover the
customer's favorite values and recombine them into a creative product.
1 Introduction
Kotler and Trias De Bes (2003), in Lateral Marketing, said: "The creativity
in the customer's designing process is a kind of lateral transmitting, and only the
customer really knows what he wants." Evolutionary computation must have an
objective (fitness function) for evaluation; thus, it does not suit personal design. The
first application of IEC was to help a witness recognize a criminal's
face (Caldwell and Johnston, 1991), and it broke the limitation of EC (evolutionary
computation), which needs a fitness function to implement the evolutionary
computation. Therefore, the interactive genetic algorithm (IGA) (Nishino, Takagi,
Cho and Utsumiya, 2001; Ohsaki and Ingu, 1998) has become a useful method. In
EC, the better chromosome has the best chance to propagate its genes to the offspring; in
contrast, in IEC the interactive process not only supplies the stimulating
information for a designer to drive him to discover what he wants, but also supplies
the choosing and recombining power for a designer to make a creative product.
Consequently, the creative designing problem has become how to help the designer
to emerge a new creativity or recombine the creative concept into a product. In
addition, the fatigue issue is a problem of IEC (Nishino, Takagi, Cho and Utsumiya,
2001; Ohsaki and Ingu, 1998; Takagi, 2001): for example, if the chromosome contains i
genes and each gene has j levels, then there will be j^i assembling patterns, while the
presented population will only be about 6-12. This means that the probability for a
designer to find the best pattern in one population is population/(j^i). Hence, the interactive
process becomes a heavy load for a designer trying to find a good solution within a limited timeline.
In this paper, we introduce a chance-creating system which uses text mining
technology to discover the weak ties on a bipartite network (component network and
product linking network) to reduce the customer's fatigue problem.
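To make the fatigue argument concrete, a quick back-of-the-envelope check (the gene and level counts here are only illustrative, not taken from the paper):

# i = 5 genes, j = 4 levels per gene, and 6 products shown per screen
i, j, population = 5, 4, 6
patterns = j ** i                      # 4^5 = 1024 possible assembling patterns
probability = population / patterns    # 6 / 1024, roughly 0.59% chance the best one is shown
print(patterns, probability)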
2 Related Works
One way of reducing the fatigue problem is the fitness-function predicting method. Lee and
Cho (1999) used sparse fitness evaluation, which included clustering technology
and a fitness allocation method, to reduce the customer's burden. But in their experiment
they used De Jong's five functions as the fitness function; they did not really include
the human being in the genetic algorithm. Nonaka and Takeuchi (1995) proposed the
knowledge-growing theory. From the theory we know that during the designing process
the customer wants to change the components within the chromosome towards what he wants.
However, depending on the fitness alone could not resolve the designing problem. Hayashida
and Takagi (2000) proposed the visualized IEC that combined the different
capabilities of EC and human beings in searching for a global optimum. However, the
EC searched in the n-D space while the human being globally searched on a 2-D map. This
means that a SOM distorted n-D to 2-D; then the human being decided his favorite
position on the map according to the 2-D information, and the EC relied on a new fitness
function to search for the favorite product. In order to assemble the creative product we
must understand the whole subjective space (2-D) and the n-D context space. However, it
seems not easy for a customer to use it. From these papers we know that recognizing
the customer's mind is a very important issue, because it requires not only understanding the
fitness function but also knowing which favorite cluster of components the
customer wants. From the past experiment data we discovered that the component
association rules can help us understand the customer's favorite cluster or component
network. In the next section we discuss how to understand what the customer
wants and how to create a chance for him from the structural information of the history data.
Ohsawa and McBurney (2003) analyzed the contexts to define the high-frequency
terms, mined the relationships between the terms, and then built the association
graph. Finally, according to the important terms and associated terms, they
discovered the important weak keys; these terms were taken as the chance. Watts
(2003) proposed the bipartite network, which is built from multi-identity networks whose
relationships are very complex. For example, Watts described the relationship
between actors and movies as shown in Fig. 1. After collecting the data, according
to the data, Watts built the component affiliation network (analogous to the terms and
connections defined by Ohsawa and McBurney) and let it assemble into a cluster
group network. The network can emerge a new chance or creative chance by changing
the original structure of the cluster group network. This means that the bipartite network can
rely on the overlapping cluster network to expand the component network, and thus
has more flexibility to expand the network for creating the chance too.
The previous discussion indicates that a customer has his favorite component
clusters; hence the system collects the interactive data, and then, through
associative analysis, we can sketch out the favorite component cluster and association
pattern. After the analysis, the triad closure method is used to expand the linking
structure and assemble the new product. In this step, we expect that the component
network acts as the affiliation network; therefore, the product is assembled from
components whose structure is the bipartite network. Finally, according to the
component affiliation network and the bipartite network, the overlap network is built. For
this reason, we have at least two kinds of triad closure processes: one expands the
component network and the other expands the overlap network.
From the previous discussion, we think that these methods depend on the network's
structural information to discover the chance. However, we use graph theory
to recognize the interactive data in the designing process, and analyze the structural
information from the characteristic network, which may result in discovering the creative
chance.
Our system is shown in Fig. 2: the agent collects the interactive data and then
analyzes which components and links are accepted by the customer; then, according to
the accepted data, it discovers the objective network and constructs the customer's value
network. The strong-ties components are used to build the component
network and the strong-ties products are used to build the bipartite network. Finally,
the triad closure method is used to extend the component network and the overlap
network for discovering weak-ties components (to assemble the creative product).
Weak Ties Mining Architecture
As shown in Fig. 3, the system generated six products and requested the customer to
evaluate which product he wanted. After the evaluation the system collected the interactive data
as presented in Table 2.
We defined the columns and rows of Table 2 as sentences; a high-score component,
whose score was higher than the threshold value, would be selected, while a low-score
component, whose score was lower than the threshold value, would be removed. The high-score
component was defined by Eq. 1.
Here Cc is the score given by the customer for component c, Tc is the number of appearances of
component c, and p is the population size.
Then, the set of high-score keys a (strong ties) is used to execute the crossover
process, as shown in Fig. 4.
Fig. 4. a. The sets of high-score keys are exchanged with each other; b. Mapping to the product
According to the interactive data, we can learn the relationship between components
and the important key, key(a) (by Eq. 2). Here we add an environmental constraint:
only links within the same cluster are counted. Using Eq. 1 and Eq. 2 we sketch out the
KeyGraph (the strong components are not only the important keys, but also the high-score
keys) (as shown in Fig. 5). Eq. 2 also identifies the weak components (components that
do not have high scores, but are nevertheless important). The
solid lines represent the strong links (same cluster), and the dotted lines are the weak
links (different clusters) (as defined by Eq. 3). These weak-ties components and weak-ties
link components act as bridges (chances) which can guide the old network to
the new creative network.
Assoc(a_i, a_j) = sum over s of min(|a_i|_s, |a_j|_s)

key(a) = 1 - prod over g in G of (1 - base(a, g) / neighbors(g))        (2)

with |g - a|_s = |g|_s - |a|_s if a is in g, and |g - a|_s = |g|_s otherwise.

Here s is a sentence, g is a cluster, D is the set of sentences (i.e. the article), and G is
the set of all clusters.
After the previous procedures, the triad closure method is used to extend the component
network (from dotted to solid); the operation is shown in Fig. 6.
After analyzing the important and weak components, in this section we want to
understand how they are assembled with each other.
Assoc(a_i, a_j) = min(|a_i|_{p_i s}, |a_j|_{p_j s})   for p_i not equal to p_j

key(a) = sum_{j=1}^{n} Assoc(a_i, a_j)   for i not equal to j        (3)
Here a is the set of components, p_i and p_j are part i and part j, and n is the
number of selected components. The distance (link weight) determines whether a link is a
strong link or a weak link. Of course, the link crossover is what we want to
change in order to assemble the potential products (as shown in Fig. 7).
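A small sketch of how the association and key scores of Eq. 2/3 might be computed from the evaluation records; the sentence representation (each selected product as a bag of component counts) and all names are our own reading of the equations, not code from the paper.

from itertools import combinations


def assoc(sentences, a, b):
    """Association of two components: sum over sentences of the smaller count."""
    return sum(min(s.get(a, 0), s.get(b, 0)) for s in sentences)


def key_scores(sentences, components):
    """Rank components by their summed association to all other components (Eq. 3 style)."""
    scores = {c: 0 for c in components}
    for a, b in combinations(components, 2):
        value = assoc(sentences, a, b)
        scores[a] += value
        scores[b] += value
    return scores


# each "sentence" is one evaluated product, encoded as component -> occurrence count
sentences = [{"faceplate_A": 1, "handset_2": 1, "screen_1": 1},
             {"faceplate_A": 1, "handset_3": 1, "screen_1": 1},
             {"faceplate_B": 1, "handset_2": 1, "screen_4": 1}]
components = sorted({c for s in sentences for c in s})
print(key_scores(sentences, components))   # low-score but well-connected items behave like weak ties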
Fig. 7. a. Triad closure method to extend the product network; b. Mapping to the product
Here a is the set of weak-ties components and weak-ties link components, b is the set of all
high-score components, and b' is a part of b.

Crossover = a_t + b'_t        (4)

Part of b is replaced by a to make a creative product a + b', which is the
chance-creating operation.
This operation includes the strong-ties crossover, in which strong-ties components or
strong links replace each other, and the weak-ties crossover, in which weak-ties components
replace strong-ties components or weak links replace strong links.
These mechanisms are unlike the IGA, which depends on the strong-ties crossover, has only a
little chance of mutation (strong components replaced by weak-ties components),
and lacks the triad closure expanding method for creating the chance.
3.5 Mutation
In the VFT process we can easily get the customer's fundamental objectives, and his meaningful
and context space. Then, according to different requirements, we design the value
network and lock some favorite components and link structures that the customer
wants. Besides this, mutation is a very important process because it can generate
diversity of components or links in the evolutionary process. It becomes a chance for
searching for his favorite component network and value network (as shown in Fig. 8).
4 Experimental Design
The purpose of this experiment was to recombine the weak ties and the bipartite network
with IEC to guide the customer in designing his favorite product. First, we surveyed
the cellular-phone market and held three brainstorming meetings to
design the customer's value network and the context of the cellular phone. The cellular
phone was assembled from five parts: the faceplate, handset, screen, function-key
and number-key. In our experiment, the faceplate had four types and each type had
sixteen levels, the handset had four levels, the screen had four levels, the function-key
had four levels and the number-key had four levels. The parameter settings of IGA
and BiGIGA are shown in Table 3.
Table 3. Parameter settings of IGA and BiGIGA

                        IGA          BiGIGA
  coding                binary       binary
  selection method      elitism      elitism
  population size       6            6
  crossover method      one-point    lateral transmitting (near)
  crossover rate        0.8
  mutation method       one-point    lateral transmitting (far)
  mutation rate         0.01
Twenty-two participants were drawn from the professors and college students of universities in
northern Taiwan. After the experiment, we carried out several tests to show that
BiGIGA has a better creative ability than the IGA.
5 Case Study
In order to compare the variation of preferred components between the IGA and the
BiGIGA, we defined Eq. 5 for calculating the entropy, to investigate the variation
of preferred components.

entropy = appeared_components / total_components        (5)
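In code form, this measure is simply the fraction of distinct components that have appeared so far; a minimal sketch under our own data-layout assumption:

def entropy(appeared_components, total_components):
    """Fraction of all available components that have appeared in the run (Eq. 5)."""
    return len(set(appeared_components)) / total_components


# e.g. 3 distinct components seen out of a catalogue of 35
print(entropy(["faceplate_A", "handset_2", "screen_1", "screen_1"], 35))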
In case 1, the analysis result is shown in Fig. 10. At g = 1 all products were generated
at random. At g = 3, for the IGA, the strong-ties components dominated the evolution
and, as a result, the customer could quickly find his preferred product; for
BiGIGA, its natural mechanism must mix the strong ties and weak ties in the searching
process to help the customer design. Therefore, after the Wilcoxon matched-pairs
signed-ranks test (the IGA was 0.05 and the BiGIGA was 0.10), the result indicated
that if the customer really knows what he wants, the BiGIGA model works as well
as the IGA or slightly worse.
At g = 1, BiGIGA lacked enough data to build the product network. At g = 2, BiGIGA
found some good products and clearly built the product network. After g = 2, BiGIGA
brought in the lateral transmission and influenced the customer's value structure.
The valid data quickly decreased, which broke the product network; the
graph was like a Small World in a critical state. The weak-ties components acted as
short-cuts in the Small World, and at g = 4 the high-density product network was built. After
the Wilcoxon matched-pairs signed-ranks test (IGA is 0.0017, BiGIGA is 0.09), the
result means that BiGIGA's component variation was higher than the IGA's.
From Fig. 11 (BiGIGA), we clearly observed that the customer was influenced by weak
components and shifted his preference to another cluster to create innovation. Then
the new component cluster was supplied for the customer to continue his design,
enlarging the entropy. When the valid products were enough to build the product
network, the strong links connected with strong components to assemble the strong
products. The IGA depends on crossover and mutation to assemble the favorite
product (time consuming as a result of the lack of chances to change); in contrast, BiGIGA
has three kinds of weak-ties mechanisms, which offer a bigger chance to change. This evidenced that our
bipartite graph can respond to the lateral transmission and, via the short-cuts,
quickly discover the favorite product.
6 Conclusions
In this study, we developed a creative designing system that supplies three kinds of
weak ties to extend the complexity of the network for emerging the chance and helping the
customer design his favorite product. In addition, the BiGIGA is not the same as the
IGA: it depends not only on natural selection (strong ties), but also on
the weak ties that can keep the evolution diverse. Such a mechanism may significantly
decrease the customer's fatigue problem. The experimental results also indicate
that the BiGIGA not only calculates the components' weights, but also relies on the
bipartite network to expand the weak ties for increasing diversity. Such a
mechanism can quickly expand the useful network and discover the favorite product
for the customer (average times: IGA is 10, BiGIGA is 6.31). Besides this, we investigated
the graph: as the graph became more complex, the diameter of the clusters
quickly decreased and it became a Small World. In the Small World environment the
customer could easily design diverse products and discover the product that he
wanted. This is a very interesting weak-ties phenomenon.
References
Caldwell, C., and Johnston, V. S., (1991), Tracking a criminal suspect through face-space with a genetic algorithm, Proceedings of the Fourth International Conference on Genetic Algorithms, San Francisco, CA: Morgan Kaufmann Publishers Inc, pp. 416-421.
Hayashida, N., and Takagi, H., (2000), Visualized IEC: Interactive Evolutionary Computation with Multidimensional Data Visualization, IEEE International Conference on Industrial Electronics, Control and Instrumentation (IECON2000), Nagoya, Japan, pp. 2738-2743.
Keeney, R. L., (1992), Value Focused Thinking: A Path to Creative Decision Making, Cambridge, MA: Harvard University Press.
Kotler, P., and Trias De Bes, F., (2003), Lateral Marketing: New Techniques for Finding Breakthrough Ideas, John Wiley & Sons Inc.
Lee, J. Y., and Cho, S. B., (1999), Sparse fitness evaluation for reducing user burden in interactive genetic algorithm, IEEE International Fuzzy Systems Conference Proceedings, pp. 998-1003.
Nishino, H., Takagi, H., Cho, S., and Utsumiya, K., (2001), A 3D modeling system for creative design, The 15th International Conference on Information Networking, Beppu, Japan, IEEE Press, pp. 479-486.
Nonaka, I., and Takeuchi, H., (1995), The Knowledge-Creating Company, Oxford University Press.
Ohsaki, M., Takagi, H., and Ingu, T., (1998), Methods to reduce the human burden of interactive evolutionary computation, Asia Fuzzy System Symposium, Masan, Korea, IEEE Press, pp. 495-500.
Ohsawa, Y., and McBurney, P., (2003), Chance Discovery, New York: Springer Verlag.
Takagi, H., (2001), Interactive evolutionary computation: fusion of the capabilities of EC optimization and human evaluation, Proceedings of the IEEE, IEEE Press, pp. 1275-1296.
Watts, D. J., (2003), Six Degrees: The Science of a Connected Age, Norton, New York.
Interactive Evolutionary Computation
Framework and the On-Chance Operator
for Product Design
1 Introduction
Product design is one of the most important tasks of the product development process
[1]. Traditionally, in the product design phase, a product is represented by a
vector of attributes referring to customer needs and product specifications (for
example, appearance, price, functionality and so forth). Moreover, each attribute
includes several different alternatives named attribute levels. Given a vector
of attributes, conjoint analysis [2][3], which is an interactive and structural technique,
is the most commonly used approach to determine the optimal product.
However, as proven in [4], product design is a combinatorial optimization problem
and hence NP-hard. Once the number of attributes or levels is very large, the
interactive process of conjoint analysis becomes infeasible because of evaluation
fatigue. Various approaches have therefore been proposed for solving product
design problems with large numbers of attributes or levels, including adaptive
conjoint analysis [5] and polyhedral adaptive conjoint [6]. In this paper, we propose
an interactive evolutionary computation (IEC) framework to address the
product design problem, since IEC is a powerful tool for identifying the user
preference [7].
Although capable of identifying the user preference, IEC suffers from
evaluation/human fatigue as well. Therefore, the canonical IEC is intuitively
insufficient for addressing the product design problem. In other words, the IEC
approach for product design will work out only if the human fatigue problem of
IEC is appropriately addressed. A prediction module, which learns user evaluation
patterns for predicting fitness values of succeeding individuals, is thus incorporated
into our IEC framework for product design. Moreover, because the evaluation
criterion for product design has an additive functional structure, which
is actually a utility function found in utility theory [8], the learning results can
be further used to design genetic operators for accelerating the search process.
As a result, we propose an IEC with prediction module framework and a
specific genetic operator named the on-chance operator, which is different from
the traditional by-chance genetic operators, for product design in this paper.
The rest of the paper is organized as follows. Section 2 begins by giving
the formal definition of the product design problem. The IEC framework for product
design, which incorporates a prediction module, is then presented. The idea
of the on-chance operator is also introduced in this section. Section 3 first illustrates our
experimental design. Actually, we implemented an IEC without prediction
for comparison. Then experimental results are shown in this section, followed
by a discussion of the results. The final section presents the conclusion and outlines
our future work.
As mentioned earlier, the conjoint analysis model is the most common approach
to address the product design problem. The problem is formulated as a combinatorial
optimization problem under the conjoint analysis model. In other words,
a product is considered to be a vector of k relevant attributes. Further,
each attribute a_i (i = 1, 2, ..., k) has s_i different attribute levels, l_1^i, ..., l_{s_i}^i, for
instance. The product design problem therefore becomes a problem of searching
for the optimal combination of attribute levels to satisfy the user preference. Assume
P is the set of all possible product profiles (a product profile is an arbitrary
combination of attribute levels):
Easier implementation is the main reason for choosing GA. Fig. 1 illustrates the
system architecture of our IEC framework for product design.
Two primary modules, an IEC module and a prediction module,
are sketched in Fig. 1. The IEC module is responsible for interacting with the
user generation by generation and eventually evolving satisficing solutions for
the user. The evaluation data are fed into the prediction module for
learning user evaluation patterns. More importantly, an on-chance operator is
defined according to the feedback from the prediction module. The on-chance
operator is a deterministic operator, which replaces bad genes, or equivalently bad attribute
levels, with better ones. The offspring generated by the on-chance
operator are combined with other offspring produced by canonical genetic
operators to form the next generation. The details concerning the on-chance
operator are explained in Section 2.3.
The prediction module is actually a GA predictor. The evaluation data collected by the IEC module serve as the training data for the GA predictor. The chromosome of the GA predictor is an array of the predicted utility values of all attribute levels. For example, if a product consists of k relevant attributes and each attribute $a_i$ has $s_i$ attribute levels, then the chromosome is the array shown in Fig. 2.
In Fig. 2, all k attributes from $a_1$ to $a_k$ are included in this array, and each element of the array represents the predicted utility value of the corresponding attribute level. As such, the root mean square (RMS) error is an intuitive way to compute the fitness value of a specific chromosome. The fitness function of the GA predictor is thus defined as follows:
F(c) = \sqrt{\dfrac{\sum_{i=1}^{n} (u_i - \hat{u}_i)^2}{n}}    (4)
where c is a chromosome, n is the number of training examples, $u_i$ is the utility value of the i-th training example (i.e. the score given by the user), and $\hat{u}_i$ is the utility value of the i-th training example as predicted by chromosome c.
The GA predictor minimizes this RMS error as far as possible and generates a utility prediction function that is fed back to the IEC module. This function can then be utilized to evaluate succeeding individuals as well as to design the on-chance operator.
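As an illustration only, the following Python sketch shows one way such a predictor chromosome could be represented and scored with the RMS fitness of Eq. (4). The nested-list layout of per-level utilities, the additive utility model, and the helper names (predict_utility, rms_fitness) are assumptions made for this sketch, not the authors' implementation.

import math

# A predictor chromosome: one predicted utility value per attribute level,
# stored as a list of lists indexed by [attribute][level].
chromosome = [
    [0.2, 0.7, 0.1],   # utilities of the 3 levels of attribute a1
    [0.5, 0.4],        # utilities of the 2 levels of attribute a2
    [0.9, 0.3, 0.6],   # utilities of the 3 levels of attribute a3
]

def predict_utility(chromosome, profile):
    """Additive utility model: sum the predicted utilities of the
    attribute levels chosen in the product profile."""
    return sum(chromosome[attr][level] for attr, level in enumerate(profile))

def rms_fitness(chromosome, training_data):
    """Eq. (4): RMS error between user scores and predicted utilities.
    training_data is a list of (profile, user_score) pairs."""
    n = len(training_data)
    sq_err = sum((u - predict_utility(chromosome, p)) ** 2
                 for p, u in training_data)
    return math.sqrt(sq_err / n)

# Example: two evaluated profiles (level indices per attribute) and their scores.
training_data = [((1, 0, 0), 2.0), ((2, 1, 2), 1.2)]
print(rms_fitness(chromosome, training_data))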
As a result, both the on-chance operator and the canonical genetic operators, which are typically by-chance operators, are used to evolve the IEC population.
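A minimal sketch of how the on-chance operator could behave, assuming the learned per-level utilities from the GA predictor are available: any gene whose predicted utility is not the best for its attribute is deterministically replaced with the best-rated level. This "replace with the top-rated level" rule and the function name on_chance are illustrative assumptions; the paper only states that bad attribute levels are replaced with better ones.

def on_chance(individual, utilities):
    """Deterministically replace poorly rated genes (attribute levels)
    with better-rated ones, guided by the predictor's utilities.
    individual: list of level indices, one per attribute.
    utilities:  list of lists, utilities[attr][level] from the GA predictor."""
    offspring = []
    for attr, level in enumerate(individual):
        best_level = max(range(len(utilities[attr])),
                         key=lambda l: utilities[attr][l])
        # Keep the current level only if it is already the best-rated one.
        offspring.append(level if level == best_level else best_level)
    return offspring

# Example with the utilities of the previous sketch:
print(on_chance([0, 1, 1], [[0.2, 0.7, 0.1], [0.5, 0.4], [0.9, 0.3, 0.6]]))
# -> [1, 0, 0]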
Two systems, a canonical IEC (cIEC) and the IEC framework presented in Section 2.2 (IECoc, where oc stands for on-chance), have been implemented for comparison. The main issues are efficiency and the quality of solutions. Moreover, we have tried various combinations of the on-chance operator and the canonical genetic operators to investigate how the on-chance operator impacts the evolution of IEC.
A cellular phone design application is selected for investigation. More precisely, a cellular phone consists of five different attributes in this research. The attributes include appearance, display, numerical keypad, receiver and control button. The appearance attribute has 19 different attribute levels and the others have 4 attribute levels each. Fig. 3 illustrates all 35 attribute levels.
Some important attributes of a real-world product design application (such as price, for instance) are missing from the specification listed above. However, the combinatorial nature is preserved in this test application.
The population size of both cIEC and IECoc was 9. Two-way deterministic tournament selection was used in both IEC systems. cIEC adopted single-point crossover and mutation with probabilities 80% and 0.6% respectively. A ten-point scale was used for users to evaluate each candidate cellular phone. If the user gave any cellular phone 10 points, cIEC or IECoc terminated, and the phone receiving 10 points was taken as the final solution. Fig. 4 shows the interface of IECoc.
A series of Monte Carlo simulations was run to determine the parameter settings of the GA predictor mentioned earlier. The parameter settings are listed in Table 1.
Eventually, a laboratory experiment was used to evaluate the performance. Dozens of subjects were invited to manipulate the IEC systems to identify their most
Fig. 3. The cellular phone design application. Five attributes (appearance, display, numerical keypad, receiver and control button) are used in this application. The appearance attribute has 19 attribute levels and the others have 4 attribute levels each.
Table 1. GA predictor parameter settings: population size = 20; crossover rate = 0.95; mutation rate = 0.14; number of generations = 500, or until 80% of the individuals are dominated by the elitist.
favorite cell phones. The subjects were randomly divided into several groups, and each group was requested to operate one of the systems.
The number of generations and the execution time. The average numbers of generations needed by the five different IEC systems are listed in Table 2. Because each system was manipulated by only 5 subjects, the Mann-Whitney test, a non-parametric test procedure, was used to investigate the significance of the experimental results. The data listed in Table 3 are yielded by this test procedure.
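For small samples such as the five subjects per system used here, a Mann-Whitney test can be run with a standard statistics library. The sketch below uses SciPy; the generation counts are made-up placeholder numbers, not the paper's data.

from scipy.stats import mannwhitneyu

# Hypothetical generation counts for 5 subjects per system (placeholders).
generations_cIEC  = [12, 10, 14, 11, 13]
generations_IECoc = [7, 8, 6, 9, 7]

# One-sided test: does IECoc need fewer generations than cIEC?
stat, p_value = mannwhitneyu(generations_IECoc, generations_cIEC,
                             alternative='less')
print(f"U = {stat}, p = {p_value:.4f}")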
In Table 3, the average numbers of generations of IECoc1 and IECoc4 are both significantly lower than that of cIEC. The other two versions of IECoc also need fewer
generations than cIEC, though the results are not significant. Therefore, IECoc is without doubt faster than cIEC in most cases, which implies that the on-chance operator speeds up the convergence of IEC.
Conjoint analysis becomes impractical for a complicated product whose attributes or attribute levels are plentiful. We can only estimate, from the literature, the number of profiles that need to be evaluated for our cellular phone design application. No more than 80 product profiles should be needed, since a minimal design for a problem with 5 attributes whose attribute levels are 20, 8, 8, 8 and 8, respectively (a problem size larger than ours), actually needs only 80 profiles [9]. In other words, if the population size of IEC is 9, then 9 generations (9 x 9 = 81 >= 80 profiles) are an upper bound for finding final solutions. Fortunately, IECoc needs only about 6 to 8 generations to find solutions, according to the experimental results listed in Table 2.
However, new algorithms for product design presented recently, especially the fast polyhedral conjoint method, are more effective than the orthogonal array technique. For instance, fast polyhedral conjoint [6], which is a branch-and-bound technique, can solve our cellular phone design application by evaluating only 35 product profiles (35 being the total number of attribute levels), which is equivalent to 4 generations.
Nevertheless, the IEC approach to product design still has some advantages over algorithms using the conjoint model. The learning capability is one of these advantages. The conjoint model assumes the attributes are mutually independent. If the attributes are not mutually independent, that is, if the problem is non-linear, then the number of profiles that need to be evaluated increases tremendously. However, learning methods such as neural networks and genetic programming can handle the non-linearity easily.
References
1. Krishnan, V., Ulrich, K.T.: Product development decisions: A review of the literature. Manage. Sci. 47 (2001) 1-21
2. Shocker, A.D., Srinivasan, V.: Multiattribute approaches for product concept evaluation and generation: A critical review. Journal of Marketing Research 16 (1979) 158-180
Practically Applying IGA to Customers' Designs on a Customizable C2C Framework
Fang-Cheng Hsu (1) and Ming-Hsiang Hung (2)
(1) Department of Information Management, Aletheia University
http://fanechi.myweb.hinet.net/
fanechi@email.au.edu.tw
(2) Graduate School of Management Science, Aletheia University
s1788124@email.au.edu.tw
1 Introduction
Fulfilling customers' true needs is essential in marketing. In the current manufacturers-stores-buyers commerce framework, buyers must go to stores or shopping malls to choose the products that satisfy their wants. Customers usually spend significant amounts of time looking for their favorite products in stores, but the items they actually get do not always meet their needs.
In traditional manufacturer-centered design, market researchers are sent out to discover customers' needs. Manufacturers design new products based on the researchers' reports. No matter how many unmet needs are reported, the manufacturers decide which ideas to develop and assign them to project development teams. Transferring the customers' needs and wants in this way might result in biased information and cause the product designers to create unneeded products. Furthermore, this can result in producing low-demand products.
2 Background
facilitating toolkits in order for customers to interact with the system. One practicable user-centered design system refers to customer innovation [2], [3] and is well known in industry. Conjoint analysis-based approaches [4], [5], [6] are also available for implementing user-centered design systems.
In most cases, the customers' designs are handled as a sub-process of product development. The product profiles designed by the customers are sent back to the manufacturers for analysis in order to develop new products that satisfy both the customers' wants and the manufacturers' profit requirements. In other cases, the manufacturers embed cost information in the toolkits and use it as constraints when customers use the toolkits to design their products. This "customers interactively design with manufacturers" strategy is applicable to B2C and B2B e-commerce, but might not be able to survive personalization requirements and a quickly changing environment.
A better method of meeting customers' needs is to allow them to design product profiles completely by themselves and to make the products accordingly. A new C2C e-commerce framework can be created if C2C market providers equip C2C buyers with a set of design toolkits that allow them to transfer their wants into product profiles, while allowing numerous individual sellers on the web to negotiate with the buyers for an acceptable price.
Genetic algorithms (GAs) need a fitness measurement for each individual chromosome in every generation in order to guide the evolution. Therefore, we must carefully design a pertinent fitness function and use it to generate the fitness. We can apply IGA to problems for which designing a fitness function is difficult. In this case, the user serves as the fitness function and assigns fitness to the chromosomes. Designing an IGA fitness assignment method is an interesting future research topic.
The conventional fitness assignment strategy asks users to assign fitness to every individual chromosome of the population, and is known as the rating-all strategy [7]. Other assignment strategies, such as the bias strategy, the picking-some-up strategy [8], and the distance-based bias strategy [9], have also addressed the need to improve the performance of IGA.
Assigning fitness during evolution places a heavy load on IGA users during the interaction activities. This results in the primary fatigue problem in IGA. Although many IGA-based systems have been proposed for real-world cases, the fatigue problem still needs to be solved.
The select operations in GA are divided into three parts:
1. Calculating the fitness of each chromosome according to a pre-designed fitness
function.
2. Deciding which chromosomes should be selected into the mating pool according to
their fitness.
3. Selecting pairs of chromosomes for crossover and/or selecting a chromosome for
mutation.
In GA, a computer executes all three parts. In conventional IGA, a human does the
first part, while a computer executes the other two parts. Are there any strategies,
however, to manipulate the other two parts of the select operations to improve IGA
buyers and announces the product lists on web pages; the negotiating sub-system provides potential sellers with communication channels through which to advertise the designed products and to negotiate prices and delivery dates with buyers (or to bid for target products).
The design toolkits contain an efficient IGA, and it is expected that somewhere in the world there will exist at least one seller who has the production competence to fulfill a certain demand. We can also expect that in the initial stages of the customizable C2C framework most of the demand might be met by hand, and later with computer-aided manufacturing systems or other systems.
To improve IGA performance in customer designs, Hsu and Huang [10] proposed a
customer values-based IGA to solve the fatigue problem. To ensure that the search
space is a complete union set and is suitable for different customers, they followed
Keeney's value-focused thinking [11] to build a sufficiently large product space. The
results of a case study show the model is significantly helpful for improving IGA
performance in customer designs.
After that, Hung and Hsu [12] proposed an over-sampling-based IGA (OIGA), which not only followed Keeney's value-focused thinking approach, but also allowed IGA users to be involved in generating the first population. Since the over-sampling strategy can ensure that a suitable proportion is prepared in the first generation, the case study results show that the strategy performs as expected.
In this study, the OIGA model is used as a benchmark model for comparison with
the proposed model. The OIGA model procedure is shown below.
OIGA( )
{
  Build a search space using Keeney's value-focused thinking approach.
  Generate a first generation of chromosomes with the over-sampling strategy.
  Do
  {
    Phenotypes of the chromosomes are shown to the user.
    The user assigns fitness to each chromosome.
    IGA selects chromosomes into the mating pool according to the fitness.
    IGA generates new chromosomes of the next generation by applying crossover and/or mutation.
  } Continue until the user has found a satisfactory chromosome or has reached other ending conditions.
}
The IGA fitness assignment processes are time-consuming and tedious, yet any inaccurate rating causes IGA to select improper chromosomes into the mating pool, resulting in user fatigue.
The actual purpose of the fitness function in GA is to calculate the fitness values of the chromosomes. Some fit chromosomes are then selected into a mating pool accordingly. The real key is determining which chromosomes should be selected into the mating pool.
The fitness function and the fitness values are both merely means of selecting chromosomes into the mating pool. Allowing IGA users to directly select fit chromosomes into the mating pool, however, is a very natural and effortless method. In other words, the select operations of IGA do not necessarily require fitness functions and fitness values. We should allow IGA users to bypass the fitness assignment processes and let them directly select fit chromosomes into the pool according to their hidden fitness functions. Therefore, we propose a model that allows users to directly select chromosomes from the n-th generation into the mating pool to evolve the next generation of chromosomes. The proposed model is referred to as SIGA, and it involves human capabilities slightly more than the simple IGA does. The SIGA procedure is shown below.
SIGA( )
{
  Build a search space using Keeney's value-focused thinking approach.
  Generate a first generation of chromosomes with the over-sampling strategy.
  Do
  {
    Phenotypes of the chromosomes are shown to the user.
    The user selects k (0 <= k <= n, n = population size) chromosomes into the mating pool.
    IGA randomly generates (n - k) chromosomes into the pool.
    IGA generates new chromosomes of the next generation by applying crossover and/or mutation.
  } Continue until the user has found a satisfactory chromosome or has reached other ending conditions.
}
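A minimal, self-contained Python sketch of one SIGA generation, assuming a bit-string genotype; the representation, the random fill of the mating pool, and the operator parameters below are illustrative assumptions rather than the authors' implementation.

import random

GENOME_LEN = 19   # e.g. a 19-bit genotype such as the mineral water bottle below
POP_SIZE = 8

def random_chromosome():
    return [random.randint(0, 1) for _ in range(GENOME_LEN)]

def next_generation(user_selected, pop_size=POP_SIZE,
                    crossover_rate=0.8, mutation_rate=0.01):
    """One SIGA step: the user-selected chromosomes plus random fill
    form the mating pool; offspring are bred by one-point crossover
    and bit-flip mutation."""
    pool = list(user_selected)
    pool += [random_chromosome() for _ in range(pop_size - len(pool))]
    offspring = []
    while len(offspring) < pop_size:
        p1, p2 = random.sample(pool, 2)
        child = list(p1)
        if random.random() < crossover_rate:          # one-point crossover
            cut = random.randrange(1, GENOME_LEN)
            child = p1[:cut] + p2[cut:]
        for i in range(GENOME_LEN):                   # bit-flip mutation
            if random.random() < mutation_rate:
                child[i] = 1 - child[i]
        offspring.append(child)
    return offspring

# Example: the user picked two chromosomes from the current population.
picked = [random_chromosome(), random_chromosome()]
print(len(next_generation(picked)))   # -> 8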
Mineral water bottle design was used as a case study to verify the performance of the proposed model. A mineral water bottle has five attributes: cap, neck, label, body, and base; the cap has 8 attribute levels, while the other attributes each take on 16 attribute levels. The size of the solution space (the number of possible bottles) is therefore 2^19 (= 2^3 x 2^4 x 2^4 x 2^4 x 2^4 = 524,288). The chromosome structure, genotype, and phenotype of the encoded attribute levels are created by following the value-focused thinking approach. The results are shown in Table 1.
Table 1. The chromosome structure, genotype, and phenotype of the mineral water bottles
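To make the size of the genotype concrete, here is a sketch of one possible bit-string encoding consistent with the attribute levels above (3 bits for the cap, 4 bits for each of the remaining four attributes, 19 bits in total). The field layout and helper names are assumptions for illustration, not the encoding actually reported in Table 1.

# Bits per attribute: cap (8 levels), neck, label, body, base (16 levels each).
FIELDS = [("cap", 3), ("neck", 4), ("label", 4), ("body", 4), ("base", 4)]

def encode(levels):
    """Pack per-attribute level indices into a single 19-bit genotype integer."""
    genotype = 0
    for (_, bits), level in zip(FIELDS, levels):
        genotype = (genotype << bits) | level
    return genotype

def decode(genotype):
    """Unpack the genotype back into per-attribute level indices."""
    levels = []
    for _, bits in reversed(FIELDS):
        levels.append(genotype & ((1 << bits) - 1))
        genotype >>= bits
    return list(reversed(levels))

total = 1
for _, bits in FIELDS:
    total *= 2 ** bits
print(total)                                # -> 524288 candidate bottles
print(decode(encode([5, 12, 0, 7, 15])))    # -> [5, 12, 0, 7, 15]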
We invited anyone who was interested in customers' designs to take part in this water bottle design experiment. Each subject was asked to design his or her preferred bottle
from the 524,288 candidate bottles using the OIGA and SIGA systems separately. The experiment presented the OIGA-based and SIGA-based systems in random order. Fig. 2 shows the SIGA-based system interface. We recorded 65 valid records at the end of the experiment.
During the experiments, we recorded the number of generations every subject actually used in both systems and took those numbers as the efficiency indexes (Ei-OIGA, i = 1..65 and Ei-SIGA, i = 1..65). At the end of the tests, on both OIGA and SIGA, subjects were asked to choose one satisfactory bottle from the chromosomes of the last generation and to rate that bottle on a 100-point scale for a satisfaction score. We used these scores as the effectiveness indexes (Fi-OIGA, i = 1..65 and Fi-SIGA, i = 1..65).
In the experiments, the population size was set to 8, one-point crossover was used, one elitist bottle in each generation was preserved into the next generation, the crossover rate and mutation rate were 0.8 and 0.01 respectively, and the over-sampling rate was 0.7 [10], [12].
5 Experimental Results
The results of the efficiency experiments are shown in Fig. 3. They show that no
matter which system the subjects used, most of them completed their designs within
Fig. 3. Efficiency indexes (Ei, top panels) and satisfaction scores (Ef, bottom panels) of the OIGA and SIGA systems for subjects 1-65.
6 Concluding Remarks
In this paper, we propose a customizable C2C framework to fully utilize interactive genetic algorithms and to discover the potential capabilities of IGAs in customer designs. The traditional IGA interaction methods are unnatural, especially when IGA is applied to customers' designs. In this study, we found that allowing IGA users to directly select chromosomes into the mating pool is not only a natural way to implement the select operations of IGA, but is also more effective than the traditional method.
Freeing users from fatigue is the most important challenge in IGA. The OIGA (over-sampling IGA) model has shown its effectiveness at decreasing fatigue. We conducted a case study and used the OIGA model as a benchmark. The results show that the proposed SIGA model is significantly more effective than the OIGA model: users who design their product profiles with the benchmark system achieve the same satisfaction level as with the proposed system, but use more IGA generations. These extra generations cause user fatigue. In other words, the evidence shows that the proposed model performs better than the benchmark system in terms of preventing fatigue.
To create an effective, applicable, and fatigue-free IGA, the interaction between humans and computer systems in IGA must be extended. One possible extension is allowing humans to directly interact with the genome, the genetic operators, or the evolution processes; another is allowing humans to cooperate with computational intelligence that guides human-assigned fitness accurately or directs IGA operations on the chromosomes effectively.
Acknowledgement
Financial support from the Taiwan National Science Council (NSC 94-2416-H-156-001-) is gratefully acknowledged.
References
1. Takagi, H.: Interactive Evolutionary Computation: Fusion of the Capabilities of EC
Optimization and Human Evaluation, in Proceedings of IEEE, Vol. 89, No. 9 (2001) 1275-
1296.
2. Urban, G. L. and von Hippel, E.: Lead User Analyses for the Development of New
Industrial Products, Management Science Vol. 34, No. 5 (1988) 569-582.
3. Thomke, S. and von Hippel, E.: Customers as Innovators - a New Way to Create Value,
Harvard Business Review, Vol. 80, April (2002) 74-81.
4. Dahan, E. and Hauser, J.R.: The Virtual Customer, Journal of Product Innovation
Management, Vol. 19, No. 5 (2001) 332-354.
5. Olivier, T., Hauser, J. R. and Simester, D. I.: Polyhedral Methods for Adaptive Choice-
Based Conjoint Analysis, Journal of Marketing Research, Vol. 41, No. 1 (2004) 116-131.
6. Olivier, T., Simester, D. I., Hauser, J. R., and Dahan. E.: Fast Polyhedral Adaptive
Conjoint Estimation, Marketing Science, Vol. 22, No. 3 (2003) 273-303
7. Caldwell, C. and Johnston, V. S. Tracking a Criminal Suspect through Face-Space with a
Genetic Algorithm, Proceedings of the Fourth International Conference on Genetic
Algorithms, Morgan Kaufmann, San Mateo, California (1991) 416-421.
8. Nishio, K., Murakami, M., Mizutani, E. and Honda, N.: Fuzzy Fitness Assignment in an
Interactive Genetic Algorithm for a Cartoon Face Search, in Sanchez, E., Shibata, T. and
Zadeh, L. A. (eds.): Genetic Algorithms and Fuzzy Logic Systems - Soft Computing
Perspectives, World Scientific Publishing (1997) 175-191.
9. Hsu, F. C. and Chen, J. S.: A Study on Multi Criteria Decision Making Model: Interactive
Genetic Algorithms Approach, in Proceedings of the 1999 International Conference on
SMC, Tokyo, Japan (1999) 634-639.
10. Hsu, F. C. and Huang, P.: Providing an appropriate search space to solve the fatigue
problem in interactive evolutionary computations, New Generation Computing, Vol. 23,
No.2 (2005) 114-126.
11. Keeney, R. L.: Value-Focused Thinking: A path to Creative Decision-Making, Harvard
University Press, Cambridge, Massachusetts (1992).
12. Hung, M. H. and Hsu, F. C.: Accelerating Interactive Evolutionary Computation
Convergence Pace by Using Over-sampling Strategy, in The Fourth IEEE International
Workshop on Soft Computing as Trans-disciplinary Science and Technology, May 25-27,
Muroran, Japan (2005).
Evaluation of Sequential, Multi-objective, and Parallel
Interactive Genetic Algorithms
for Multi-objective Floor Plan Optimisation
1 Introduction
Interactive Evolutionary Computation (IEC) is an EC that optimizes a target system based on human subjective evaluation; the human plays the role of the fitness function of conventional EC [1]. When it is applied to fields that include a degree of subjectivity, such as engineering design, art creation, music composition, or architecture, interaction with a human evaluator helps the EC to generate solutions that incorporate his/her expertise or intuition without requiring their explicit description in the optimisation platform. Interaction between an IEC user and the EC can proceed in many ways depending on the task domain. For instance, besides the fitness evaluation of normal IEC, the user may participate in choosing elite designs for survival, modify an individual and reinsert it into the population of designs, or freeze parts of the design with the intention of reducing the search space dimensionality. Therefore, Parmee redefined IEC broadly as system optimisation based on human-machine interaction [2].
Fig. 1. Sequential IGA design optimization.   Fig. 2. Multi-objective IGA design optimization.
This method treats the subjective and quantitative features of the design problem as separate objectives to be optimized. For instance, the individuals created from one qualitative run are fed into the following quantitative run as parent designs, which ensures that the starting point of the quantitative run consists of subjectively optimized designs. This is how the connection between subjective and quantitative criteria is ensured. With this algorithm, the authors aimed to represent a typical design cycle in an engineering design firm, where the design is thrown over the wall between the marketing department, concerned with subjective aspects of the design, and the R&D department, concerned with quantitative aspects of the design. Figure 1 shows the flow of the sequential IGA.
Island model, where every processor runs an independent EC using a separate sub-population. The processors cooperate by regularly exchanging migrants (good individuals). This model is suitable for clustering populations.
Diffusion model, where individuals mate only with other individuals within their local neighbourhood. This approach is particularly suitable for massively parallel computers with a fast local intercommunication network.
The usage of parallelization in EMOO can be effective, as the goal of EMOO is to find a set of good solutions rather than a single optimum. Nevertheless, there exists only a limited amount of literature on its usage in this field [9][10]. On the other hand, the usage of parallelization techniques in IEC has not been reported.
For the purposes of our research, we hypothesize that using parallelization and interactivity together could be advantageous and natural for our problem. Since we deal with multiple conflicting objectives, these could be evolved in separate populations with elite migrants exchanged between one another, i.e. following the island model. The subjective fitness of a solution could be obtained by user interaction and used to evolve one population, while the other population is evolved by a regular fitness function. The expected advantages of this method are that (1) the quantitatively evolved population can be evolved much faster, utilizing the time taken by the human-computer interaction, and (2) a compromise decision can be encouraged by the migration of elites between populations.
The features of the parallel IGA include parallelisation technique, immigrant
selection, replacement strategy, and fitness assignment strategy to immigrants:
Parallelisation technique: the parallel IGA uses an island model and optimises n
separate populations with n separate objectives with immigrants exchanged
among them. In our experiments, we use n = 2.
Immigrant selection: the top three elite solutions are selected from each population to
be migrated.
Replacement strategy: the worst three individual solutions from each population are replaced with the immigrants.
Fitness assignment strategy to immigrants:
In the population optimised using the quantitative objective, immigrants are sorted with respect to their IGA rating. If any two ratings are equal, sorting is done using the calculated quantitative objective fitness. After sorting, an arbitrary quantitative fitness is assigned to the immigrants to ensure their survival. The arbitrary fitness assignment proceeds by dividing the range of fitness obtained into five main regions. Then, the mid, top, and bottom fitness levels of the top fitness range (i.e. the minimum fitness interval, as the problem is modelled as a minimisation problem) are assigned to the three immigrants.
In the population optimised using the subjective objective, immigrants are all given the minimum, i.e. the best, IGA rating. The reason is that the subjective fitness rating is a discrete value and different designs may take the same rating value, whereas there is very little probability that any two designs would yield the same quantitative fitness evaluation. Even though two designs may differ from each other, the user might give them the same rating, in contrast to the quantitative fitness evaluation. The parallel IGA fitness assignment strategy therefore represents this phenomenon.
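A sketch, under assumed data structures, of the immigrant fitness assignment described above for the quantitative island: immigrants are sorted by their IGA rating (ties broken by calculated quantitative fitness), the range of quantitative fitness in the host population is divided into five regions, and the bottom, middle, and top of the best (minimum) region are assigned to the three immigrants. The exact tie-breaking and region arithmetic below are illustrative assumptions.

def assign_immigrant_fitness(immigrants, host_fitness_values):
    """Assign arbitrary quantitative fitness to the three immigrants arriving
    at the quantitative island, so that they survive selection.
    immigrants: list of (iga_rating, quantitative_fitness) pairs, lower = better.
    host_fitness_values: quantitative fitness of the host population."""
    # Sort by IGA rating; break ties with the calculated quantitative fitness.
    ranked = sorted(immigrants, key=lambda im: (im[0], im[1]))
    lo, hi = min(host_fitness_values), max(host_fitness_values)
    region = (hi - lo) / 5.0           # the fitness range split into five regions
    # Bottom, middle and top of the best (minimum) region.
    slots = [lo, lo + region / 2.0, lo + region]
    return list(zip(ranked, slots))

# Example with three immigrants and a host population of fitness values.
imms = [(3, 41.0), (1, 40.2), (1, 39.8)]
print(assign_immigrant_fitness(imms, [38.0, 44.5, 50.2, 47.1, 55.9]))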
Two parallel IGA platforms are used for testing: the pseudo parallel IGA, which uses a fitness function as a pseudo IGA user, and the real parallel IGA, which is a normal IGA with a real human user. The user evaluation of the pseudo parallel IGA is simulated by a fitness function and no real user involvement is included. The pseudo parallel IGA is conducted to evaluate the maximum performance under the ideal condition of no human fluctuation in evaluation. Figure 3 shows the flow of the general parallel IGA procedure. The pseudo parallel IGA is coded in the C language, whereas the real parallel IGA is written in C++, using the Microsoft Foundation Class Library (MFC), the Open Graphics Library (OpenGL) and the Coin3D Library (see footnote 1).
Fig. 3. Flow of the general parallel IGA procedure: the quantitative GA (QnGA) and the interactive/qualitative GA (QlGA) islands each initialize a population, generate and evaluate new populations, exchange elites with the other island, and report generation statistics until the user terminates the run or the number of generations exceeds 20.
1 Please refer to http://msdn.microsoft.com, www.opengl.org, and www.coind3d.org for more information on these graphics libraries.
The house floor planning design is used as a benchmark task. The objectives are to find the width and length of each room, as shown in Figure 4(a). These parameters should (1) minimize the cost of the build, which directly relates to the total area of the floor plan, and (2) maximize the subjective user evaluation received through the interactive fitness evaluation element. This problem is an ideal candidate for testing the developed algorithms as it includes both quantitative and subjective features. Minimization of cost clearly constitutes a quantitative feature. On the other hand, the arrangement and sizes of the rooms vary from person to person and can easily be evaluated subjectively. Table 1 shows the parameters of the problem and how the overall area is
Fig. 4. (a) Dimensional variability in the floor plan design, (b) Graphical user interface of the modelled floor plan design problem
Room          Parameter        Label     Area
Living room   width            X0        A1 = X0 * (2.2 - X1)
Kitchen       length, width    X1, X2    A2 = 2 * X1 * X2
Bedroom 1     width, length    X3, X4    A3 = X3 * X4
Bathroom      length           X5        A4 = 2 * [3.6 - (X0 + X6)] * X5
Bedroom 2     width, length    X6, X7    A5 = X6 * X7
Bedroom 3     -                -         A6 = X1 * [3.6 - (X2 + X3)]
Hall          -                -         A7 = [3.6 - (X0 + X6)] * [2.2 - (X1 + X5)] + X6 * [2.2 - (X1 + X7)]
deduced from these parameters. A detailed description of the problem can be found in [3]. Figure 4(b) is an example of the user interface used for our experiments.
The users are asked to evaluate the design by paying special attention to the sizes of the bathroom and kitchen. The greater the sizes of these rooms, the greater the user satisfaction should be. The reasons for this are: (1) ensuring that the subjective and quantitative objectives conflict, such that a Pareto front can be reached and analysed, and (2) ensuring consistency between the pseudo user evaluation obtained from the pseudo parallel IGA and the real user evaluation in the subjective components of the rest of the algorithms developed. Table 2 shows the fitness evaluation method for the subjective and quantitative objectives in the relevant components of the three different algorithms.
Table 2. Fitness evaluation in quantitative and subjective components of the parallel IGA,
multi-objective IGA, and sequential IGA
Real human evaluation: a 10-scale subjective rating is taken from the user. Used in the real parallel IGA subjective island, the multi-objective IGA subjective component, and the sequential IGA subjective component.
Sum of A_i for i = 1..7: this function minimizes the total area of the rooms and thus reduces cost, which is directly proportional to the total area of the floor plan. Used in the pseudo parallel IGA quantitative island, the real parallel IGA quantitative island, the multi-objective IGA quantitative component, and the sequential IGA quantitative component.
1 / (A2 + A4): this function simulates a user whose requirement is bigger bathroom and kitchen areas. Used in the pseudo parallel IGA subjective island.
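The two automatic fitness functions of Table 2 can be written directly from the area expressions of Table 1. The Python sketch below assumes the subtraction reading of those area formulas and is only an illustration of the objective definitions, not the authors' code.

def room_areas(x):
    """x = [X0, ..., X7], the room dimensions of Table 1."""
    a1 = x[0] * (2.2 - x[1])
    a2 = 2 * x[1] * x[2]
    a3 = x[3] * x[4]
    a4 = 2 * (3.6 - (x[0] + x[6])) * x[5]
    a5 = x[6] * x[7]
    a6 = x[1] * (3.6 - (x[2] + x[3]))
    a7 = (3.6 - (x[0] + x[6])) * (2.2 - (x[1] + x[5])) \
         + x[6] * (2.2 - (x[1] + x[7]))
    return [a1, a2, a3, a4, a5, a6, a7]

def quantitative_fitness(x):
    """Total floor area; minimised, since cost is proportional to it."""
    return sum(room_areas(x))

def pseudo_subjective_fitness(x):
    """Simulated user who prefers bigger kitchen (A2) and bathroom (A4);
    minimised, so larger rooms give a smaller (better) value."""
    areas = room_areas(x)
    return 1.0 / (areas[1] + areas[3])

x = [1.0, 0.8, 0.9, 1.2, 1.1, 0.7, 1.0, 1.3]   # an arbitrary candidate design
print(quantitative_fitness(x), pseudo_subjective_fitness(x))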
For the parallel IGA, the number of generations and the population size for each island are given in Table 3.
Table 3. Parallel IGA parameters for quantitative and subjective population islands
4 Results
The sequential IGA, multi-objective IGA, and parallel IGA are compared over 55, 5, and 55 generations, respectively, in terms of overall average subjective fitness, overall average quantitative fitness, average subjective fitness of the last generation, and average quantitative fitness of the last generation. The Wilcoxon signed rank test, a nonparametric test for paired observations, is used to compare the results from these three algorithms.
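Such a paired, nonparametric comparison can be run with SciPy; the fitness arrays below are made-up placeholders to show the call, not the paper's data.

from scipy.stats import wilcoxon

# Hypothetical overall average subjective fitness per run (placeholders).
parallel_iga   = [2.1, 1.8, 2.4, 2.0, 1.9]
sequential_iga = [3.5, 3.9, 3.2, 3.8, 3.6]

# Paired test at the 0.05 risk level (fitness is minimised here).
stat, p_value = wilcoxon(parallel_iga, sequential_iga)
print(f"W = {stat}, p = {p_value:.4f}, significant at 0.05: {p_value < 0.05}")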
Figure 5 shows the average fitness values obtained at each generation of each run. In the sequential IGA, the qualitative average fitness showed a worsening trend over the five runs pursued, while the quantitative results improved smoothly. In the multi-objective IGA, values were obtained in parallel, as opposed to sequentially in the sequential IGA. The qualitative and quantitative fitness averages both showed an improving trend and convergence to the Pareto front was observed. As the number of generations increases, the solutions are minimized with respect to both criteria. For the parallel IGA, the Wilcoxon signed rank test indicates that:
The pseudo parallel IGA is significantly advantageous over both the sequential IGA and the multi-objective IGA in terms of overall average subjective fitness, overall average quantitative fitness, average subjective fitness of the last generation, and average quantitative fitness of the last generation, with a risk of 0.05.
The real parallel IGA is significantly advantageous over the multi-objective IGA in terms of overall average subjective fitness, overall average quantitative fitness, and average quantitative fitness of the last generation, with a risk of 0.05. The significance of the average subjective fitness of the last generation could not be concluded with the number of tests performed.
The real parallel IGA is significantly advantageous over the sequential IGA in terms of overall average subjective fitness, overall average quantitative fitness, and average subjective fitness of the last generation, with a risk of 0.05. The significance of the average quantitative fitness of the last generation could not be concluded with the number of tests performed.
We can conclude that the parallel IGA provides significantly better results than the multi-objective IGA and the sequential IGA under our experimental conditions, although no significance was concluded for some of the last-generation subjective and quantitative fitnesses. Figure 5 shows the change in subjective and quantitative fitness values in the three algorithms. The following section discusses these findings.
Fig. 5. Change of average (a) quantitative and (b) qualitative fitness against the number of generations in the sequential IGA, multi-objective IGA, pseudo parallel IGA, and real parallel IGA
5 Discussions
In multi-objective problems, it is important that a set of good solutions that are
diverse from each other are obtained, so that compromise decision making can be
implemented. This section discusses the results in terms of fitness convergence and
diversity of results, after laying out the main sources of error.
(A) Sources of Error
Pseudo parallel IGA showed significantly better fitness convergence in both
subjective and quantitative objective values than the multi-objective IGA and the
sequential IGA. However, the multi-objective IGA and the sequential IGA involved real human evaluation whereas the pseudo parallel IGA did not. In order to minimise this source of error, during user evaluation in the multi-objective IGA and the sequential IGA, users were asked to evaluate designs by observing the sizes of the kitchen and bathroom, so as to resemble the fitness function in the parallel IGA. However, noise due to human inconsistency is inevitable and should be taken into account in the assessment of results.
Another point to bear in mind while assessing these results is the number of generations pursued with each algorithm. The multi-objective IGA pursues one quantitative and one subjective generation together at a time since the evolution is simultaneous. As the user must be involved in every generation and his/her fatigue must be considered, only 5 quantitative generations could be evolved, as opposed to 55 in the sequential IGA and the parallel IGA. We observe that even with only five generations, the performance of the multi-objective IGA in both objective values was comparable to that of the parallel IGA and the sequential IGA.
(B) Fitness Convergence
The multi-objective IGA and the parallel IGA reached satisfactory designs in a shorter time of 4 to 5 generations, whereas the sequential IGA failed to reach an equally satisfactory design in the qualitative and quantitative objective spaces in a total of 45 generations. Although the final fitness scores of the quantitative objective were better in the sequential IGA and the parallel IGA than in the multi-objective IGA, since the subjective objective was of equal importance to the design overall, the sequential IGA performed significantly worse than the other two IGAs. Rather than reaching a compromise decision of equal qualitative and quantitative factors, the sequential IGA competed the two objectives against each other, resetting and trying to recover from the effects of the opposite objective each time. On the other hand, no significant difference was found between the averages of the parallel IGA and the multi-objective IGA at the final generation, although the overall fitness average of the parallel IGA is better than that of the multi-objective IGA.
(C) Diversity
After the second quantitative run of the sequential IGA, the quantitative objective took over, providing designs with little or no difference for the qualitative evaluation by the user. The users reported difficulty in distinguishing the designs: minor differences still existed but were difficult to visualize. This led the users to give similar ratings to designs, and it became difficult for the algorithm to diversify the designs. On the other hand, the diversity preservation mechanism based on the crowding distance calculation in the multi-objective IGA provided results that were visually distinct from each other. Although no quantification of diversity was performed, the diversity level in decreasing order was: multi-objective IGA, parallel IGA, and sequential IGA.
6 Conclusions
This paper compared three IGA algorithms, namely parallel IGA, multi-objective
IGA and sequential IGA that are developed to optimise conflicting subjective and
quantitative multi-objectives. The algorithms are evaluated with the floor planning
design problem.
The major advantage of the parallel IGA is its flexibility to accommodate more than one population. The population size of the multi-objective IGA has to be constant, as multiple objectives are dealt with simultaneously on each design, while that of the parallel IGA is adjustable according to human limitations for the subjectively evolved designs and according to the potential of the computer for the quantitatively evolved designs. Thus the time spent by the human during design evaluation is utilised, and as more generations can be evolved for the quantitative objective, a better fitness can be obtained in this objective space. As there exist different populations, the parallel IGA does not engage in a fight between the two contradictory objectives, as was the case for the sequential IGA. Thus we can conclude that both the multi-objective IGA and the parallel IGA are significantly better than the sequential IGA and that sequential optimisation did not give satisfactory results in dealing with multiple objectives in conflict. The parallel IGA is observed to be satisfactory for incorporating multiple-criteria principles even though the algorithm is not in itself a multi-objective optimisation algorithm. In dealing with subjective and quantitative conflicting design objectives, both the multi-objective IGA and the parallel IGA seem to be promising approaches.
Although the quantitative objective remains implicit to the multi-objective IGA, the parallel IGA displays to the user the designs that migrated from the quantitative objective island to the subjective objective island. Therefore, in addition to the above advantage, the real parallel IGA can help promote innovative decision making by making the user observe computer-generated results and reconsider the evaluation given.
Future work will focus on investigating the parallel IGA's performance with other multi-objective optimisation algorithms and with different benchmark problems. Additionally, a pair-wise preference experiment is to be undertaken for the final populations achieved from the two algorithms.
Acknowledgements
This work was supported in part by 2005 Summer Program of Japan Society for the
Promotion of Science.
References
1. Takagi, H.: Interactive evolutionary computation: fusion of the capacities of EC
optimization and human evaluation, Proceedings of the IEEE, 89(9) (2001) 1275-1296.
2. Parmee I. C.: Poor-definition, uncertainty and human factors - a case for interactive
evolutionary problem reformulation?, Genetic Evolutionary Computing Conference
(GECCO2003), 3rd IEC Workshop, Chicago, USA (July 2003)
3. Brintrup A., Ramsden J., and Tiwari A.: Integrated qualitativeness in design by multi-
objective optimization and interactive evolutionary computation, IEEE Congress on
Evolutionary Computation (CEC2005), Edinburgh, UK (September 2005), 2154-2160
4. Parmee I. C., Cvetkovic D., Watson A., and Bonham C.: Multi-objective satisfaction
within an interactive evolutionary design environment, Journal of Evolutionary Computation, MIT Press, 8(2) (2000) 197-222.
5. Kamalian R., Takagi H., and Agogino A.: Optimized design of MEMS by evolutionary
multi-objective optimization with interactive evolutionary computation, Genetic and
Evolutionary Computation Conference (GECCO2004), Seattle, USA (2004) 1030-1041.
6. Brintrup A., Tiwari A., and Gao J.: Handling qualitativeness in evolutionary multiple
objective engineering design optimization, Enformatica, 1 (2004) 236-240.
7. Deb K., Agrawal S., Pratap A., and Meyarivan T.: A fast elitist non dominated sorting
genetic algorithm for multi objective optimization: NSGA-2, Proc. Parallel Problem
Solving from Nature (PPSN2000), Paris, France (2000) 858-862.
8. Cantu-Paz E.: Efficient and Accurate Parallel Genetic Algorithms. Kluwer Academic
Publishers, Norwell, USA (2000).
9. Van Veldhuizen D. A., Zydallis J. B., and Lamont G. B.: Considerations in engineering
parallel multiobjective evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 7(2) (2003) 144-173.
10. Hiroyasu T., Miki M., and Watanabe S.: The new model of parallel genetic algorithm in
multiobjective optimization problems -divided range multi-objective genetic algorithms.
IEEE Congress on Evolutionary Computation (CEC2000), La Jolla, CA, USA (July 2000)
333-340.
Supervised Genetic Search for Parameter
Selection in Painterly Rendering
John P. Collomosse
1 Introduction
Techniques for processing photographs into artwork have received considerable attention in recent years and comprise a rapidly developing branch of computer graphics known as image-based non-photorealistic rendering (NPR). Perhaps the most well-studied NPR problem is that of painterly rendering: the automated stylisation of imagery to give a hand-painted appearance. A number of these algorithms now exist, capable of emulating a variety of styles such as watercolour [1] and oil [2, 3, 4, 5, 6, 7].
Although the output of contemporary painterly rendering algorithms is often of high aesthetic quality, the usability of such algorithms is impeded by the plethora of low-level parameters that must be set in order to produce acceptable output. Many of these parameters are data dependent, for example determining the scale of image features to paint [3, 4]. The presence and fine-tuning of such parameters is necessary to retain the generality of the algorithm, but can be difficult for the non-expert user to achieve. Furthermore, some algorithms [3, 6] seek to emulate a broad range of artistic styles using additional user-configurable parameters. For example, [3] varies brush size, colour jitter, and stroke length to interpolate between pseudo-expressionist and pointillist styles. Often these parameters can be time consuming to set, both due to their number and due to their low-level nature, which can make them non-intuitive for inexperienced users to manipulate when aiming for a conceptually higher-level effect (e.g. a
The majority of image-based painterly rendering algorithms adopt the stroke-based rendering paradigm, generating paintings by compositing ordered lists of virtual brush strokes on a 2D virtual canvas. The development of such algorithms arguably began to gain momentum with Haeberli's semi-automatic impressionist painting system [13]. In Haeberli's system, stroke attributes such as colour or orientation were sampled from a source photograph whilst stroke size, shape and compositing order were set interactively by the user. Litwinowicz [2] was the first to propose a fully automated 2D painting algorithm, again focusing upon the impressionist style. Paintings were synthesised by aligning small rectangular strokes to Sobel edge gradients in the image and stochastically perturbing the colour of those strokes. Hertzmann later proposed a coarse-to-fine approach to painting using curved B-spline strokes. Spline control points were
In this section we briefly describe our fast multi-resolution technique for stylising photographs at interactive speeds. The scope of this paper is such that we focus on the parameterisation of the algorithm; interested readers are directed to [12] for a more detailed description. The algorithm accepts a 2D photograph as input and outputs a 2D painterly rendering of that photograph, the visual style of which is a function of eight user-configurable scalar parameters, p1..8 (Figure 1), which are output by the evolutionary search step (Section 3).
We begin by creating a colour band-pass pyramid segmentation of the source image by applying the EDISON [18] algorithm at several spatial scales. This segmentation is computed only once, during system initialisation, to facilitate real-time interaction during the search process. To produce a painting, the pyramid layers are rendered in coarse-to-fine order. Segmented regions within each layer are painted using a combination of interior and boundary strokes, as
Param Description
p1 Colour jitter
p2 Maximum hop angle
p3 Region turbulence
p4 Colour (pleasure)
p5 Colour (arousal)
p6 Stroke jaggedness
p7 Stroke undulation
p8 Region dampening
Fig. 1. Summary of the eight rendering parameters p1..8 used to control visual style in
our painting algorithm
we explain in the next subsection. For each layer, the interior strokes of all regions are first rendered, followed by the boundary strokes of all regions.
colour of a new control point differs significantly from the mean colour of those already present in the stroke. Stroke thickness and colour are set as in the interior stroke placement process.
Region Turbulence. Flat expanses within paintings, for example sky or water, may be depicted in a variety of artistic styles. Our system encompasses a gamut of rendering styles ranging from the calm, serene washes of a watercolour to the energetic swirls of a Van Gogh oil or the chaotic strokes of a Turner seascape. We introduce similar effects by repeatedly performing boundary stroke placement (Section 2.1), subjecting region masks to morphological erosion prior to each iteration. The number of iterations is proportional to rendering parameter p3. This has the effect of allowing boundary strokes to grow into the interiors of regions in an unstructured manner, so breaking up flat expanses in the painting.
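A sketch of the region-turbulence idea under assumed helpers: the number of erode-and-stroke iterations is scaled by p3, the region mask is eroded with a standard morphological operator before each pass, and place_boundary_strokes stands in for the (unspecified) boundary stroke placement of Section 2.1.

import numpy as np
from scipy.ndimage import binary_erosion

def place_boundary_strokes(mask):
    """Placeholder: return stroke seed coordinates on the mask boundary."""
    boundary = mask & ~binary_erosion(mask)
    ys, xs = np.nonzero(boundary)
    return list(zip(xs.tolist(), ys.tolist()))[::10]   # sparse sampling

def paint_region_turbulence(mask, p3, max_iterations=8):
    """Repeatedly place boundary strokes on an eroding region mask.
    mask: boolean array marking the segmented region; p3 in [0, 1]."""
    iterations = max(1, round(p3 * max_iterations))
    strokes = []
    current = mask.copy()
    for _ in range(iterations):
        strokes += place_boundary_strokes(current)
        current = binary_erosion(current, iterations=3)  # shrink the region
        if not current.any():                            # region fully eroded
            break
    return strokes

demo_mask = np.zeros((40, 40), dtype=bool)
demo_mask[5:35, 5:35] = True
print(len(paint_region_turbulence(demo_mask, p3=0.5)))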
Fig. 2. The pleasure-arousal space of the colour model, annotated with emotion labels (e.g. angry, excited, happy, content, calm, depressed, bored) and with the colour transformation functions, including the special cases T1 and T2, applied in each region of the space.
Functions T1 (x) and T2 (x), indicated in Figure 2, are two special cases that
encode hue variation consistent with aroused displeasure (anger) and apathetic
displeasure (depression). Hue is manipulated via an RGB space transformation
prior to saturation and luminance manipulations. In the former case T1 (x), pre-
dominantly red colours are reddened and green (associated with calm) is reduced
(in proportion to x). These eects combine with the saturation and luminance
transformations already present to produce the combination of aroused reds and
dismal darks that appear in psychological literature in association with anger.
In the latter case T2 (x) we increase the blue in proportion to x to generate
a monotonous shift into the blue spectrum, associated with sadness and calm.
Colours are also desaturated and darkened in accordance with transformations
already present in that quadrant of the space.
3.1 Initialisation
Our approach is to sparsely evaluate M(.) over a subset of the population, and to use this data to extrapolate the behaviour of M(.) over the entire population. We have designed a simple user interface that prompts for the fitness of a given individual drawn from the population (so obtaining a sparse domain
Fig. 3. Snapshot of the interactive evaluation screen. The user is presented with thumbnails of the nine highest-ranking paintings and asked to rate one by clicking with the mouse. Depending on the horizontal location of the click within the thumbnail, a fitness score in [-1, 1] is assigned to the chosen rendering. This snapshot shows images from the first generation of paintings, hence the diverse selections available.
sample of M(.)). The user is supplied with a graduated colour bar and asked to rate the aesthetics of the painting rendered from a given individual on a continuous scale spanning red (bad), amber (neutral) and green (excellent); see Figure 3. Manually evaluating the entire population on each iteration would be impractical, so to reduce user load we request evaluation of only one individual per generation. The user is presented with the nine fittest individuals from the previous iteration and asked to rate the individual that they feel most strongly about. Note that in the first iteration individuals are deemed to exhibit equal fitness (see equation 2) and so are chosen from the population at random.
We use a Gaussian splatting technique to encode the results of our sparse user interactions and transform these into a continuous estimate of M(.). Each time a user evaluates an individual we obtain a point q and a user fitness rating U(q) \in [-1, 1]. These data are encoded by adding a Gaussian to a cumulative model built up over successive user evaluations. Each Gaussian distribution is centred at point q and multiplied by the factor U(q). We assume the integral under the Gaussian to be well approximated by unity in the space [0, 1]^8, and so infer the continuous function M(.) as:
M(p) = 0.5 + \begin{cases} 0 & \text{if } N = 0,\\ \dfrac{1}{2N}\sum_{i=1}^{N} U(q_i)\, G(p, q_i, \sigma) & \text{otherwise}\end{cases}    (2)
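A small numeric sketch of equation (2), assuming isotropic Gaussians of standard deviation sigma over the eight-dimensional parameter space; the unnormalised Gaussian form used for G is an assumption made for illustration.

import math

def gaussian(p, q, sigma):
    """Unnormalised isotropic Gaussian splat centred at q, evaluated at p."""
    d2 = sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def estimate_M(p, rated_points, sigma=0.15):
    """Eq. (2): rated_points is a list of (q, U(q)) pairs with U in [-1, 1].
    Returns 0.5 for an empty model, otherwise 0.5 plus the averaged splats."""
    n = len(rated_points)
    if n == 0:
        return 0.5
    s = sum(u * gaussian(p, q, sigma) for q, u in rated_points)
    return 0.5 + s / (2.0 * n)

# Example: two user ratings over the 8-dimensional parameter space.
ratings = [([0.2] * 8, 1.0), ([0.8] * 8, -0.5)]
print(estimate_M([0.25] * 8, ratings))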
Selection and Propagation. Once the current population has been evaluated, pairs of individuals are selected and bred to produce the next generation of painting solutions. Parent individuals are selected with replacement, using a stochastic process biased toward fitter individuals. A single offspring is produced from two parents by way of stochastic cross-over and mutation operators. Each of the eight parameters that comprise the genome of the offspring has an equal chance of being drawn from either parent. Mutation is implemented by adding a random normal variate to each of the eight parameters. These variates have standard deviations of 0.1, i.e. 97% of mutations will produce less than 0.3 variation in a particular rendering parameter.
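A compact sketch of this breeding step; the clipping of parameters to [0, 1] is an assumption added so the genome stays inside the normalised parameter space, and is not stated explicitly in the text.

import random

def breed(parent_a, parent_b, mutation_sigma=0.1):
    """Produce one offspring genome of eight rendering parameters p1..8.
    Each gene is drawn from either parent with equal probability (uniform
    crossover), then perturbed by a normal variate of std. dev. 0.1."""
    child = []
    for a, b in zip(parent_a, parent_b):
        gene = a if random.random() < 0.5 else b
        gene += random.gauss(0.0, mutation_sigma)
        child.append(min(1.0, max(0.0, gene)))   # assumed clamp to [0, 1]
    return child

# Example:
pa = [random.random() for _ in range(8)]
pb = [random.random() for _ in range(8)]
print(breed(pa, pb))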
Fig. 4. Mean population fitness (left) and standard deviation of fitness, i.e. diversity (right), plotted against generation number.
of user time. Furthermore, our segmentation-based painting algorithm did not require technical parameters (e.g. scale or low-pass kernel size [3]) to be specified explicitly by the user. Figures 5e1, e2 give examples of further painterly styles, contrasting two different stroke placement styles encompassed by our system. The first is reminiscent of the impressionist style paintings generated by [2], the latter of the impasto style oil paintings generated by [7]. Figure 5b gives a further example of the broad classes of image handled by our system. Figure 5c demonstrates a single photograph (DRAGON) evolved into three distinct visual styles. A non-expert was asked to use our system to create paintings depicting high-level concepts such as anger (Figure 5c1), cheerfulness (Figure 5c2) and despair (Figure 5c3). Graphs recording the mean population fitness and standard deviation (diversity) during the evolution of these paintings are also supplied in Figure 4. Convergence took between 20 and 25 generations before the termination criterion was triggered. In Figure 4 we have forced evolution to continue beyond 30 generations; however, improvements in mean fitness beyond the automated termination point are negligible.
To evaluate the usability of our system we developed an alternative low-level interface using sliders to independently control p1..8. Users were asked to produce renderings identical to those previously generated using our GA. Users were usually able to reproduce the results, but required five or six minutes of experimentation (and several hundred mouse clicks) before doing so, approximately five times longer than when using our GA goal-based search.
When working with our system, we have found that users will often focus on a particular aesthetic property of the painting and concentrate on the improvement of that property. For example, users might address the issue of edge detail over, say, the style of the in-filled background. Often these properties have no direct mapping onto individual rendering parameters, providing some explanation for the timing improvements of a top-down goal-seeking approach to style selection over a bottom-up configuration of low-level painting parameters. The impact of this behaviour can be observed in the graph of Figure 4 (left). Gradual increases in painting fitness are observed over time, interrupted by short-lived dips. These dips become less pronounced as the generation count increases. We have found the presence of dips to correlate with the issuing of negative ratings by users; typically these are issued when a user has refined one aspect of the painting to their liking and has begun to address a further aspect of the composition that they had so far neglected. By neglecting refinement of the latter aspect, a false representation of the user's fitness function M(.) (see Section 3.2) has been conveyed to the system and encoded in the Gaussian distribution model. This requires user correction, often in the form of rating penalisation.
Throughout our work we have assumed the user has an ideal painting in
mind, and wishes to instantiate that ideal through NPR. An alternative
application of our system might be in style exploration, where the user has no
well-developed goal state in mind. Early indications are that the guided search
provided by our system may be suitable for such activities. However, if the user
Fig. 5. A gallery of painterly renderings produced by our system, original images inset
substantially revises their aesthetic ideals late in the search, the reduced
population diversity can require tens of iterations before the user is able to explore
new regions of the problem space. If our system were to be used in this manner,
we would suggest increasing the standard deviation of the variates used during
mutation to maintain population diversity further into the search.
Acknowledgements
We are grateful for the assistance of users who participated in evaluation of the
GA interface, and also for the contribution of Maria Shugrina in development of
the colour model.
References
1. Curtis, C., Anderson, S., Seims, J., Fleischer, K., Salesin, D.H.: Computer-generated watercolor. In: Proc. ACM SIGGRAPH. (1997) 421-430
2. Litwinowicz, P.: Processing images and video for an impressionist effect. In: Proc. ACM SIGGRAPH, Los Angeles, USA (1997) 407-414
3. Hertzmann, A.: Painterly rendering with curved brush strokes of multiple sizes. In: Proc. ACM SIGGRAPH. (1998) 453-460
4. Shiraishi, M., Yamaguchi, Y.: An algorithm for automatic painterly rendering based on local image approximation. In: Proc. ACM NPAR Sympos. (2000) 53-58
5. Gooch, B., Coombe, G., Shirley, P.: Artistic vision: Painterly rendering using computer vision techniques. In: Proc. ACM NPAR Sympos. (2002) 83-90
6. Hays, J., Essa, I.: Image and video based painterly animation. In: Proc. ACM NPAR Sympos. (2004) 113-120
7. Collomosse, J.P., Hall, P.M.: Genetic paint: A search for salient paintings. In: Proc. EvoMUSART (at EuroGP), Springer LNCS. Volume 3449. (2005) 437-447
8. Sims, K.: Artificial evolution for computer graphics. In: Proc. ACM SIGGRAPH. Volume 25. (1991) 319-328
9. Ebner, M., Reinhardt, M., Albert, J.: Evolution of vertex and pixel shaders. In: LNCS (in Proc. EuroGP05). Volume 3447., Springer-Verlag (2005) 261-270
10. Draves, S.: The electric sheep screen-saver: A case study in aesthetic evolution. In: LNCS (in Proc. EvoMUSART05). Volume 3449., Springer-Verlag (2005) 458-467
11. Russell, J.A.: Reading emotion from and into faces: Resurrecting a dimensional-contextual perspective. In Russell, J.A., Fernandez-Dols, J.M., eds.: The Psychology of Facial Expression. Cambridge University Press (1997) 295-320
12. Shugrina, M., Betke, M., Collomosse, J.P.: Empathic painting: Interactive stylization using observed emotional state. In: Proc. ACM NPAR Sympos. (2006)
13. Haeberli, P.: Paint by numbers: abstract image representations. In: Proc. ACM SIGGRAPH. Volume 4. (1990) 207-214
14. Hertzmann, A.: Paint by relaxation. In: Proc. Computer Graphics Intl. (CGI). (2001) 47-54
15. Treavett, S., Chen, M.: Statistical techniques for the automated synthesis of non-photorealistic images. In: Proc. 15th Eurographics UK Conference. (1997) 201-210
16. DeCarlo, D., Santella, A.: Abstracted painterly renderings using eye-tracking data. In: Proc. ACM SIGGRAPH. (2002) 769-776
17. Santella, A., DeCarlo, D.: Visual interest and NPR: an evaluation and manifesto. In: Proc. ACM NPAR Sympos. (2004) 71-78
18. Christoudias, C., Georgescu, B., Meer, P.: Synergism in low level vision. In: 16th Intl. Conf. on Pattern Recognition. Volume 4. (2002) 150-155
19. Kolliopoulos, A.: Image segmentation for stylized non-photorealistic rendering and animation. Master's thesis, Univ. Toronto (2005)
20. Wright, B., Rainwater, L.: The meaning of colour. Journal of General Psychology 67 (1962)
21. Mahnke, F.: Color, Environment, and Human Response. Van Nostrand Reinhold (1996)
22. de Jong, K.: Learning with genetic algorithms. Machine Learning 3 (1988) 121-138
23. Holland, J.: Adaptation in Natural and Artificial Systems. 1st edn. Univ. Michigan Press (1975) ISBN: 0-472-08460-7
24. Hertzmann, A., Perlin, K.: Painterly rendering for video and interaction. In: Proc. ACM NPAR Sympos. (2000) 7-12
Robot Paintings Evolved Using Simulated
Robots
Gary Greenfield
1 Introduction
Open Problem #3 of McCormack's five open problems in evolutionary music
and art (EMA) [1] requires one, "To create EMA systems that produce art
recognized by humans for its artistic contribution (as opposed to any purely
technical fetish or fascination)." The recent publicity garnered by the robot
paintings of Moura, Ramos, and Pereira that resulted from their ARTSBOT
(ARTistic Swarm roBOTS) Project might at first glance be seen as a solution
to McCormack's third open problem, since the paintings are described on
the web (see http://alfa.ist.utl.pt/ cvrm/sta/vramos/Artsbot.html) as "artificial
art", and in print as "non-human art" [2] or "symbiotic art" [3]. Note
that here the symbiosis is intended to be between human and robot. The site
http://www.lxxl.pt/artsbot/ where the images of the robot paintings with the
best resolution can be found also provides a "Symbiotic Art Manifesto" written
by Moura and Pereira.
It is unfortunate that some of the hyperbole associated with the ARTSBOT
project detracts from what is potentially a promising new development in
evolutionary art. At the center of the ARTSBOT Project lies an implementation
of a collective robotics art making system to create what are known as swarm
paintings. The ARTSBOT team reveals this by saying [4] (to paraphrase and
polish slightly) that the artworks are made by a swarm of autonomous robots,
that live [by] avoiding simply [executing] streams [of commands] coming from
an external computer, [and] instead actually co-evolve within the canvas [environment];
acting [by] laying ink according to simple inner threshold stimulus
response functions, [while] simultaneously reacting to the chromatic stimulus
present in the canvas environment [left by other robots], as well as by the
distributed feedback, [that] affect[s] their future collective behavior. We note
that ARTSBOT was one of the few collective robotics entries in the most recent
international ArtBot art exhibition for robotic art and art-making robots
(see http://artbots.org/2004/participants/). Moura and Pereira claim that they
have created organisms that generate drawings without any intervention on their
part, thereby creating a new kind of art based on a paradigm of non-human
autonomous entities that allow personal expression by human artists to be
abandoned [3]. Perhaps this is somewhat of an exaggeration. Since the controllers for
their robots were not evolved, ARTSBOT is not an evolutionary art system,
but rather an art making system consisting of human programmed autonomous
agents that reside and function in an artificial ecosystem. There is a close
connection between swarm painting and stigmergy [5], the situation where autonomous
agents alter their environment, either accidentally or on purpose, in such a way
that they influence other agents to perform actions that achieve an objective such
as nest building. The principal difference is that stigmergy is usually associated
with a clearly defined task or objective, while swarm painting is usually associated
with the more poorly defined objective of producing aesthetic imagery.
While in our opinion the question of whether or not the ARTSBOT robot
paintings are more than what McCormack referred to as a technical fascination
has not yet been satisfactorily answered, what is most significant to us is the fact
that ARTSBOT does not address McCormack's penultimate challenge, Open
Problem #5, which requires one: "To create artificial ecosystems where agents
create and recognize their own creativity." In this paper, using simulated collective
robotics and taking for motivation the penultimate problem of how autonomous
robots engaged in making swarm paintings might eventually go about learning
to recognize their own creativity, we investigate, as a first step, an evolutionary
framework that is designed to show how simulated robots might be able to
formulate ways to evaluate the aesthetic quality of their paintings. Unlike the
aesthetic evaluation system for agent produced art studied by Saunders and
Gero, where each agent produced its own paintings and the evaluation model
was based on social dynamics [6], we consider an aesthetic evaluation system
where the collective agents are given shared access to a set of image evaluation
parameters which can then be used either individually or collectively to modify
the image making process. To help understand the consequences of our design,
we consider what effect different types of computations made using our set of
evaluation parameters have on our robot paintings.
This paper is organized as follows. In section two we provide some background
on the use of swarms and the non-interactive genetic algorithm for image making.
In section three we give the specifications for our simulated robots. In section four
2 Background
The notion of swarm paintings was first introduced in an image processing paper
by Ramos that was contributed to an ant colony optimization (ACO) conference
[7]. Related work appeared in [8] and [9]. In this problem domain, the use of
the interactive user-guided evolution paradigm that was originally proposed by
Sims [10] (i.e. the interactive genetic algorithm) was first studied by Aupetit
et al [11]. They investigated an ant colony simulation where the virtual ants
deposited and followed color "scent" while exploring a toroidal grid in
order to produce ant paintings. Greenfield [12] non-interactively evolved ant
paintings by evolving the genomes required for governing the behaviors of the
virtual ants using fitness functions. His observation that only elementary
techniques were needed to measure ant exploration and ant cooperation capabilities
offers hope that relatively simple behavioral assessment parameters can be used
to help identify increased image complexity or well organized image compositions
in other evolutionary swarm painting scenarios. The use of the (non-interactive)
genetic algorithm in evolutionary art was first considered by Baluja et al [13].
Using this technique for evolving two-dimensional imagery, interesting results
have been obtained by Greenfield [14] using co-evolution and the image generation
method known as "evolving expressions", by Machado and Cardoso [15]
using neural nets, and by Bentley [16] in order to identify cellular automata
patterns. In general, the question of how to evaluate aesthetics on the basis of
scientific principles and computational methodologies is a difficult one. To sample
several different authors' thoughts on the matter and help gauge the scope
of the debate see [17, 18, 19, 20, 21].
3 S-Robot Specification
The design of our simulated robots, or S-robots, is loosely based on a software
model for Khepera robot simulation by Harlan et al [22]. An S-robot is a virtual
circular entity with four binary valued proximity sensors together with a three-channel
color sensor. Three of the proximity sensors are located at the front of
the S-robot and the fourth is located at the rear. The forward and backward
sensors scan a field 120° wide and the two side sensors scan a field 45° wide in
such a way that there is a 15° overlap with the forward sensor. Thus the forward
facing field of vision is from −90° to +90° up to a distance of twenty units
and the rear facing field of vision is from −60° to +60° also up to a distance of
twenty units. Proximity sensors detect other robots and environmental obstacles
or boundaries but do not distinguish between the two. The color sensor is
mounted directly beneath the center (rx, ry) of the S-robot. The S-robot's forward
direction is determined by the unit vector (dx, dy). For all of the images shown
here, the robot's two pens were operated synchronously so that either both were
up or both were down. The reason for this was so that when the S-robot was
mark making, the pen colors could be chosen so that the mark had an
automatic built-in highlight. An S-robot's painting mark is five units wide. An
S-robot can swivel (i.e. rotate in place) 10° clockwise or counterclockwise per
clock cycle and can move v units per clock cycle, −1 ≤ v ≤ 1, in either the
forward or backward direction in accordance with the sign of v. The S-robot
roams on an n × m unit gridded world.
4 S-Robot Controllers
The onboard computer for an S-robot is an interrupt driven controller whose
job is to place a sequence of commands in an execution queue, sleep until the
queue is empty, and then plan and load the S-robot's next sequence of commands
when it is awoken. An S-robot is autonomous because it can place commands in
the queue to request sensor readings so that when it is awoken it can perform
actions based on these sensor values. The controller loads commands of the form
<mnemonic> <argument> where the mnemonic is chosen from the list:
MOV Move
SWI Swivel
SPD Set Speed
SNP Sense Proximity Vector
SNC Sense Color Vector
PUP Pen Up
PDN Pen Down
Only the MOV, SWI, and SPD commands actually make use of the argument;
in all other cases it is treated as a dummy argument. By having the controller
indicate how far it wants the S-robot to travel, or how many degrees it wants
the S-robot to swivel, the burden of timing shifts to the simulator itself. The
simulator calculates how many clock cycles these actions will take so that it
can manage the discrete event scheduling, synchronize the movements of all the
S-robots, detect collisions, and update the sensors accordingly.
While in the future we would like to evolve the controllers themselves, in
this paper we make use of two controllers that we wrote ourselves in order to
consider how the cooperation between two S-robots was affected by their initial
placement and direction headings. Each of our controllers has four pre-planned
painting sequences it can load into the queue. For ease of managing simulated
evolution and evaluating the results, at run time we made only one of the four
painting sequences available to each controller. The four sequences can produce
an elongated double hooked curve, a wedge, a segment of a spiral, and a zigzag
Fig. 1. Two S-robots using different controllers and different painting motifs. Note that
one of the S-robots did most of its painting by leaving its pens down while executing
a back-up and swivel sequence following boundary collisions.
Fig. 2. Images of two S-robots painting with, and without, exhibiting robot interaction.
On the left, the S-robot painting the closed figure is oblivious to its companion, while on
the right it collides with its companion and gets bumped into a new painting trajectory.
motif. Figure 1 shows an early S-robot test painting made using two S-robots
where one used the double hooked curve to draw closed figures and the other
used the zigzag sequence as it tried to roam more freely over the canvas. The
latter left the pens down during a back-up obstacle avoidance sequence, which
explains the appearance of the long curving trails.
We now describe our two controllers. Controller A always first checks the forward
sensor. If it is clear, it queues the assigned painting command sequence
followed by commands to swivel, move a short distance away, and take a proximity
reading. If the forward sensor is set, but the backward sensor is clear,
it queues a back-up sequence followed by a swivel sequence and again requests a
proximity reading. Otherwise, having concluded it is boxed in, it swivels and
tries to move only a short distance away before taking a new proximity reading.
Controller B, on the other hand, can be set up so that it uses the color channel
sensors to search either for areas of the canvas that have not yet been painted or
for areas that have been painted by one of its companions. Whenever it locates
pixels of the type it is searching for, it queues the assigned painting command
sequence followed by a swivel sequence, otherwise it swivels and moves a short
distance from its present location. In both cases it again queues a color reading.
Figure 2 shows what happens when an S-robot with an A controller that
is drawing a closed figure gets bumped off course when an S-robot with a B
controller that is trying to fill in unpainted canvas gets too close.
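Controller A's planning rule might be sketched as follows, building on the controller class above. The sensor dictionary keys, the swivel angles and the move distances are illustrative assumptions; only the branching structure follows the description in the text.

class ControllerA(SRobotController):
    # Sketch of Controller A; the painting sequence is a pre-planned list of
    # (mnemonic, argument) commands assigned at run time.
    def __init__(self, painting_sequence):
        super().__init__()
        self.painting_sequence = painting_sequence

    def plan(self, sensors):
        if not sensors["front"]:              # forward sensor clear
            for cmd in self.painting_sequence:
                self.load(*cmd)
            self.load("SWI", 90)              # swivel (angle assumed)
            self.load("MOV", 10)              # move a short distance away
        elif not sensors["rear"]:             # blocked ahead, clear behind
            self.load("MOV", -10)             # back up
            self.load("SWI", 90)
        else:                                 # boxed in
            self.load("SWI", 45)
            self.load("MOV", 3)
        self.load("SNP")                      # request a new proximity reading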
5 Evolutionary Framework
The S-robot paintings described below were all painted on 200 × 200 canvases.
The S-robots were permitted to paint for 150,000 clock cycles. The genome for
an individual S-robot is the vector (sx, sy, d) where (sx, sy) is its initial position
and d is its initial true compass heading, −180 ≤ d < 180. For a collection, or
swarm, of N S-robots the genome g is the concatenation of the genomes of the
individual S-robots. Thus g is a vector with 3N components. The point mutation
operator applied to g displaces each component of g by a small amount, while the
crossover operator applied to genomes from two swarms implements the usual
uniform crossover operator for two vectors with the same number of components.
Our evolutionary framework uses a population of size P = 16. Some evolutionary
runs set the number of S-robots at N = 2 while others use N = 4. For
each of G = 30 generations, the painting made by the swarm of S-robots with
genome g is assigned fitness Fg using one of the calculation methods described
below. Then the P/2 least fit genomes are discarded and P/4 breeding pairs are
formed by cloning from the pool of P/2 survivors. Breeding within each pair is
performed using crossover. Finally all P genomes are subjected to point mutation.
Thus an evolutionary run considers G × P = 30 × 16 = 480 S-robot paintings.
The painting associated with the most fit genome is logged after every five
generations. Since point mutation is applied to every genome in the population at
the conclusion of every generation, the implicit genetic algorithm is non-elitist
and therefore the generation in which the most fit genome will appear during
the course of a run cannot be predicted in advance.
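The framework just described can be summarised in a short sketch. The following Python fragment is a plausible reading rather than the authors' code: the mutation magnitude, the pairing rule and the assumption that each breeding pair contributes two offspring (so that the population returns to size P) are ours, while the population size, generation count, survivor selection and operators follow the text.

import random

P, G = 16, 30            # population size and number of generations
MUT_SIGMA = 1.0          # size of the point-mutation displacement (assumed)

def point_mutation(genome):
    # Displace each component of the genome (sx, sy, d per S-robot) slightly.
    return [x + random.gauss(0.0, MUT_SIGMA) for x in genome]

def uniform_crossover(g1, g2):
    # Uniform crossover for two vectors with the same number of components.
    return [a if random.random() < 0.5 else b for a, b in zip(g1, g2)]

def evolve(population, fitness):
    # fitness(g) paints with the swarm encoded by g and scores the result.
    for _ in range(G):
        population.sort(key=fitness, reverse=True)
        survivors = population[: P // 2]               # drop the P/2 least fit
        pairs = [tuple(random.sample(survivors, 2)) for _ in range(P // 4)]
        children = []
        for a, b in pairs:                             # two children per pair (assumed)
            children += [uniform_crossover(a, b), uniform_crossover(a, b)]
        population = [point_mutation(g) for g in survivors + children]  # non-elitist
    return max(population, key=fitness)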
Fig. 3. S-robot paintings where fitness was determined by the two S-robots' ability to
cover the canvas. The image on the left is the most fit image in the initial randomly
generated population, the image on the right is the most fit image after ten generations.
Fig. 4. Two S-robot paintings where image fitness was determined using a linear combination
of the S-robot behavioral assessment terms. The goal is to evolve a composition.
The image on the left is the most fit image from the original population, the image on
the right is the most fit image from the twentieth generation.
Fig. 5. S-robot paintings from the fifteenth and thirtieth generations obtained from a
run that used a fitness function that maximized the assessment terms np and nc, while
minimizing ns.
Fig. 6. S-robot paintings from the tenth and twentieth generations obtained from a
run that used a fitness function that maximized S-robot interaction by using a product
of the behavioral assessment terms nb and nc.
Fig. 7. S-robot paintings from the fifth and twentieth generations obtained from a run
that used a fitness function with interaction terms to maximize both S-robot interaction
and canvas coverage.
The previous section showed how the evolution of our S-robot paintings occurs
by using optimization to select the initial configurations of the individual S-robot
settings. This optimization treats the fitness calculation as a computation
that assigns an aesthetic value to each painting by the swarm of S-robots that
created it. Even though it would be a very time consuming process, we feel
it is important to make the observation that this fitness calculation could be
performed by the S-robots themselves, because we believe that any collection of
robots that is engaged in evaluating or recognizing their own creativity would
need to include some kind of aesthetic evaluation capability such as ours. Of
course, to fully implement the protocol that our S-robots would need to follow
in order to achieve this aesthetic evaluation goal, the functionality of the S-robots
would need to be enhanced so that they could exchange data with one
another, make use of a pseudo random number generator, and have their initial
position and heading correctly calibrated. Assuming this were done, an outline
of the protocol would be:
It should be clear that it would not be too difficult to design more sophisticated
protocols for robot genomes involving controller settings, planning algorithms,
or painting sequences in addition to the initial configuration data.
References
11. Aupetit, S., Bordeau, V., Slimane, M., Venturini, G. and Monmarché, N. (2003): Interactive evolution of ant paintings. In McKay, B. et al (eds), 2003 Congress on Evolutionary Computation Proceedings, IEEE Press, 1376-1383.
12. Greenfield, G. (2005): Evolutionary methods for ant colony paintings. In Rothlauf, F. et al (eds.) Applications of Evolutionary Computing, EvoWorkshops 2005 Proceedings, Springer-Verlag Lecture Notes in Computer Science, LNCS 3449, 478-487.
13. Baluja, S., Pomerleau, D. and Jochem, T. (1994): Towards automated artificial evolution for computer-generated images. In Connection Science 6 325-354.
14. Greenfield, G. (2002): On the co-evolution of evolving expressions. In International Journal of Computational Intelligence and Applications 2:1 17-31.
15. Machado, P. and Cardoso, A. (1998): Computing aesthetics. In Oliveira, F. (ed.) Proceedings XIV-th Brazilian Symposium on Artificial Intelligence SBIA'98, Porto Alegre, Brazil, Springer-Verlag, LNAI Series, New York, NY, 219-229.
16. Bentley, K. (2002): Exploring aesthetic pattern formation. In Soddu, C. (ed.), Proceedings of the Fifth International Conference of Generative Art (GA 2002), Milan, Italy. (See http://www.generativeart.com/papersGA2002/20.pdf.)
17. Boden, M. (1994): Agents and creativity. In Communications of the ACM 37:7 117-121.
18. Boden, M. (1996): The Philosophy of Artificial Life. Oxford University Press, New York, NY.
19. Dorin, A. (2001): Aesthetic fitness and artificial evolution for the selection of imagery from the mythical infinite library. In Keleman, J. and Sosik, P. (eds.) Advances in Artificial Life - ECAL 2001 Proceedings, Springer-Verlag, Berlin, LNAI 2159, 659-668.
20. Ramachandran, V. and Hirstein, W. (1999): The science of art: a neurological theory of aesthetic experience. In Journal of Consciousness Studies 6 15-52.
21. Whitelaw, M. (2004): Metacreation: Art and Artificial Life. MIT Press, Cambridge, MA.
22. Harlan, R., Levine, D., and McClarigan, S. (2000): The Khepera robot and kRobot class: a platform for introducing robotics in the undergraduate curriculum. Technical Report 4, Bonaventure Undergraduate Robotics Laboratory, St. Bonaventure University, New York.
Consensual Paintings
Paulo Urbano
1 Introduction
The emphasis of this paper is on the design of swarms of micro-painters which are able
to create patterns that are interesting in artistic terms. There are already examples of collective
paintings inspired by social insects: L. Moura [1] has used a small group of robot-painters
inspired by ants' behaviour, which move randomly in a limited space. Stimulated
by the local perception of the painting, they may leave a trace with one of their
coloured pens. The painters rely on stigmergic interaction [2] in order to create chaotic
patterns with some spots of the same colour. Colour has the pheromone role: a
spot dominated by a certain colour has the capacity to stimulate the painter-robot to
add some paint of the same colour. Monmarché et al. [3] have also designed groups of
painters inspired by ants' pheromone behaviour. Their approach is based on a competition between
ants: the virtual artists try to superimpose their colours on traces made by others,
creating a dynamic painting that is always changing. Their painters have the capability
to sniff the painted colours on the environment and react appropriately. The societies
are composed of a small number of individuals (fewer than 10). We [4] have made
experiments in swarm painting also using ideas of stigmergy, where painters are attracted
by the state of the tableau spots (presence or absence of ink).
Achieving consensus by a decentralised mechanism can be very useful for the coordination
of a population of autonomous individuals, where the figure of a leader
does not exist. This is the case in the emergence of a common lexicon among a population of
agents. Kaplan [5] has studied the dynamics of consensus in the context of research on
language, specifically the formation of a shared lexicon inside a population of
autonomous agents. In multi-agent AI systems, it can be crucial that agents agree on
certain social rules, in order to decrease conflicts among individuals and promote
cooperative behaviour. Shoham and Tennenholtz [6] have tested, experimentally and
analytically, convergence properties of several behaviours for the on-line emergence
of social conventions inside multi-agent systems.
In general, research on convention emergence assumes that all of the competing
options are equivalent in quality terms, and any of them has the same probability of
being chosen by all individuals as a convention. In this case, what is important for
coordination and for behaviour simplification is that everyone agrees on a rule (driving on
the right is a human example); the nature of the rule is not relevant.
Our goal is to make a population take consecutive collective decisions, producing
a kind of random evolution of collective choices. The nature of the choices and the
duration of each consensus have to be completely random. With this goal in mind, we
want to design behaviours that assure fast convergence, inside a population of parsimonious
individuals, in situations of maximal competition (worst case). But, at the
same time, we also want the possibility of dissidence of one or more agents in situations
of unanimity, and that one of the dissidents efficiently imposes a new choice on the
others. This way we will achieve a random evolution of consensus controlled through
a self-organized mechanism.
We have tried, without success, to adapt the existing behaviours to dissidence and
consensus evolution. Thus, we introduced a behaviour that demonstrated better results
with respect to convergence speed in convention emergence. This behaviour also revealed
itself to be suited to dissidence and to the random evolution of consensual
choices over time.
We thought that one natural application of this dynamic random choice process
would be the artistic world. So, we have created a group of micro-painters (Gaugants),
which are able to move and paint inside a virtual canvas. The Gaugants are
able to adopt consensual decisions, with direct implications for their coordination,
creating complex artistic patterns.
In the next section (2) we describe and analyse the best known behaviours for
convention emergence, showing their limitations for the goal of cycles of breaking
and forming consensus; in section 3 we introduce a new behaviour, based on the notion
of force and conflict interaction, which is able to converge quickly in worst case
situations and can be easily adapted to breaking and forming consensus; in section
4 we describe this successful behaviour in detail; in section 5 we apply the random
consensual sequence mechanisms to the artistic world, incorporating them in a
society of micro-painters (swarm-painters), the Gaugants, which are able to create
consensual sequences around colour; and we close the paper by presenting our conclusions.
All our Gaugant paintings were made in StarLogo [7].
The interaction games are based on a series of dialogues involving a pair of agents. In
each dialogue, two of the society's members are randomly chosen for interaction, the
hearing and speaking elements. These names were used in the context of language
games, and we maintain them in this paper, although hearing and speaking are used
here in a metaphorical sense in the course of a unilateral interaction. During an interaction,
the hearing agent gets access to the speaking agent's state, and can change his
option based on the speaking agent's information. The games differ in the type of the
agents' behaviours.
Initial situation and behaviour evaluation. Speaking about performance analysis,
we are interested especially in the average convergence velocity and its variation with
the number of agents. The convergence velocity is the number of dialogues necessary
for reaching a global consensus, starting with options that are equally distributed
among agents; no option dominates in the initial population.
In the research on convention emergence, initial situations correspond to situations
of maximal competition. In the literature, two initial situations of maximal competition
are considered. In the first one we have only two choices, each adopted by 50% of
the population, and in the second one we have a different choice per individual.
2.2 Behaviours
We are going to describe and discuss the most important algorithms that were designed
in the course of convention emergence research.
Simple imitation. In the imitation game, agents are defined just by the options they
use in order to name a particular object. During a dialogue, the speaking agent
indicates to the hearing agent the option it is currently using, and the latter adopts it
immediately. Starting with 2 or N options equally distributed in the population (N
agents), nothing directs the group towards convergence, as every option can increase
its influence with equal probability. In general, convergence is assured after an
important series of oscillations, in a time quadratic in the number of agents.
Positive reinforcement with score. The strategy that agents use in this game is to
adopt the most diffused option they have seen in the population. In this behaviour,
positive reinforcement with score (PRS), players associate a score with each option.
An agent is defined by his own option and by a preference vector whose dimension is
equal to the total number of options present among the population. Agents register in
this data structure the number of times they have met a member holding each option. During a
dialogue, the speaker chooses as its current option the one with the highest score in its
preference vector (in case of a draw, one of the winners is chosen randomly) and
subsequently the hearing agent increases the corresponding counter by one unit.
Kaplan has studied the dynamics of convergence for both initial situations of maximal
competition. Every counter in the preference vector starts at 0 except for the option
initially adopted, whose counter starts at one unit. In this game,
convergence is quicker than with simple imitation, and the number of dialogues grows as n log n.
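A minimal sketch of a PRS dialogue follows, under the description above. The class layout is an assumption; the scoring rule (the speaker announces its highest-scoring option, the hearer increments the corresponding counter, ties broken at random) follows the text.

import random

class PRSAgent:
    # Positive-reinforcement-with-score player.
    def __init__(self, initial_option, all_options):
        self.scores = {o: 0 for o in all_options}
        self.scores[initial_option] = 1   # the initially adopted option starts at one

    @property
    def option(self):
        # Current option = highest score, ties broken at random.
        best = max(self.scores.values())
        return random.choice([o for o, s in self.scores.items() if s == best])

def dialogue(speaker, hearer):
    # The hearer registers the option the speaker is currently using.
    hearer.scores[speaker.option] += 1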
Positive reinforcement with a forgetting mechanism. Even before the work on
language games, Shoham and Tennenholtz [6] made a series of experiments on
the emergence of a consensus where the collective choice was a social rule. They
compared experimentally a number of different individual behaviours, one of
which was very similar to Kaplan's PRS referred to above. They introduced two
different forgetting mechanisms into PRS, and one of them happened to improve the
efficiency of social rule emergence, reducing the average number of interactions
necessary to form a collective choice over a number of simulations. The latter
forgetting mechanism consists in possessing a short-term memory (length N)
instead of a counter, where only the choices seen during the last N interactions
are registered. The agent only adopts another option if this new one was seen more
times than his current option during the last N encounters. Agents begin the games
with empty memories. In fact, the simple imitation game of Kaplan corresponds to
full forgetting: the short-term memories of the agents have only one cell (they just
register the current encounter). We have reproduced the experiments of Shoham and
Tennenholtz for games up to 1000 players, for different memory lengths, and we
concluded that the most efficient sizes are 5 and 7 units for 2 and N (one per
individual) initial options, respectively.
This way, the already known behaviours for convention emergence are unsuited for
our goals, even in the case of only one dissident. We have not even treated other essential
problems: what happens if we have more than one stubborn agent at the same time,
adopting different options? And what is the appropriate moment to end the temporary
stubborn behaviour, given that we want a succession of several consensual choices?
In this successful behaviour, during encounters the losing agents imitate both the
force and the option of the winning adversaries. The newly recruited agents will have
their force increased, giving them more power to recruit. At the same time, equals reinforce
their force, independently of losing or winning. This way, we add a positive
reinforcement if they have the same options. Let us look at the behaviour in more detail.
If the speaking agent is stronger than (or has identical force to) the hearing agent,
the latter will adopt the choice and the force of the former. If the speaking agent is
weaker than the hearing agent, the latter will conserve both force and option. In case
they have the same option at the beginning of the interaction, force is reinforced by one
unit, independently of losing or winning. In sum, the stronger ones recruit weaker
agents (these become at least as strong as the winners whose choices they imitate, and they
can even surpass them in case their options were already the same), enlarging the influence
of their options.
(Chart: average number of encounters until consensus versus number of players, from 100 to 1000, for the series buffer5(2), buffer7(N), RPS(2), RPS(N), force(2) and force(N).)
Fig. 1. The difference of performance of PRS, PRS with short-term memory and our behaviour:
imitate the stronger
We have conducted a series of experiments [8] in order to test the efficiency of convergence
to consensus in situations of maximum competition (equal initial force
and equally distributed choices). Our experiments showed convergence for populations
of up to 1000 elements, with 2 and N initial options (N = number of players)
equally distributed, where each player starts with a force of 0 units. Convergence
is faster than in the PRS game with the forgetting mechanism. Figure 1
compares both games, in the case of 2 options and N options, plotting the average
number of encounters necessary for consensus over 1000 simulations for a population
varying from 100 to 1000 players.
Now we will answer the question: when does an agent adopt the dissident behaviour?
After consecutively seeing some reasonable number of agents with the same
option. This number is called the dissidence threshold and is a local attribute. After
having seen at least a certain number of consecutive equals, an agent will become a
dissident with some probability, which is called the dissidence probability. The most
important parameter is the dissidence threshold; in fact, consensus duration depends
very much on this parameter. This way, every agent should adopt the same dissidence
threshold, and it should also evolve, in order to allow different durations of consensual
periods to emerge.
Now we are ready to describe the final behaviour in detail. Each agent possesses 4
attributes: choice, force, dissidence threshold (dt) and dissidence probability (dp). During a random
encounter two agents face each other and the speaking agent reveals his choice and
his force. The hearing agent updates his counter of equals. If this counter has reached
dt, then he becomes a dissident with probability dp. To become a dissident is to increase
his force by 200 units and to change the option to another, chosen randomly; he also
randomly chooses a new dissidence threshold. After this stage the agent will fight, in
his head, with his partner. If he is not stronger than his partner, he will imitate both his force
and his choice; otherwise he conserves both. After the fight comes the reinforcement:
if he had the same choice as his partner when they met, then his force is reinforced
(by one more unit), independently of the fight's outcome.
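The complete behaviour can be sketched as follows. This Python fragment is illustrative only: the dissidence probability value and the range from which new dissidence thresholds are drawn are assumptions (the paper leaves them as parameters, and assumes at least two options exist), while the order of the steps, the +200 force boost, the imitate-the-stronger fight and the +1 reinforcement follow the description above.

import random

class Agent:
    def __init__(self, choice, options, dt_range=(5, 50), dp=0.05):
        self.choice, self.force = choice, 0
        self.options = options
        self.dt = random.randint(*dt_range)   # dissidence threshold (assumed range)
        self.dp = dp                          # dissidence probability (assumed value)
        self.equals = 0                       # counter of consecutive equals seen

def encounter(speaker, hearer):
    same_at_start = (speaker.choice == hearer.choice)
    hearer.equals = hearer.equals + 1 if same_at_start else 0
    # Possible dissidence once enough consecutive equals have been seen.
    if hearer.equals >= hearer.dt and random.random() < hearer.dp:
        hearer.force += 200
        hearer.choice = random.choice([o for o in hearer.options
                                       if o != hearer.choice])
        hearer.dt = random.randint(5, 50)
        hearer.equals = 0
    # "Fight in his head": the hearer imitates unless he is strictly stronger.
    if speaker.force >= hearer.force:
        hearer.force, hearer.choice = speaker.force, speaker.choice
    # Reinforcement if the choices matched at the beginning of the encounter.
    if same_at_start:
        hearer.force += 1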
We know that our behaviour is maximally efficient in situations of just one dissident.
A bigger number of dissidents generally implies longer transition periods. The
exception is the case where we have more than one dissident with the same choice; the
efficiency will then obviously increase. In the worst case, every individual has a different
choice, and we know the performance is good even in the unlikely situation where
they all have the same force (better than the standard behaviours, see Fig. 1).
In each consensual period, every individual will have the same option and the same
dt. At each dissidence point, those two attribute values will change randomly, which
means that we will have a random consensus evolution concerning the nature and duration
of each consensus. Through the variation of both parameters, we can obtain a
large diversity of consensual periods and of transition periods between two consensual
choices. We want, in general, short transition periods and variable consensus durations,
which we achieve by constraining these two parameters, dt and dp. Note that there are cases where
consensus is not even achieved; just think of a zero dt or even a very small value.
have a sequence of steps where in each time step every Gaugant executes its own
behaviour, until the painting is finished.
Each Gaugant is very simple and has a very simple social behaviour. Each one has a
position (real Cartesian coordinates) and an orientation (0..360), and can only inhabit one
cell, the one that corresponds to the rounding of its coordinates. They have a certain
speed, which is a global parameter; speed corresponds to the number of cells they
move forward in each step. On the other hand, the painters are created with a particular
colour. They can interact unilaterally with any other painter, independently of their
position, in order to exchange information, unless they are constrained by some radius
of perception. Our micro-painters will have incorporated the imitation and dissidence
behaviours we described above, based on force.
The micro-painter agents will establish a consensus around colour. Here is the general
behaviour:
1. If the equals counter is no less than the dissidence threshold, change into a dissident
with probability dp (if successful go to 2, else to 3).
2. Turn into a dissident: mutate and increase force by 200 units; choose a random
dissidence threshold and reset the counter of equals.
3. Choose a random partner. If the partner is stronger then imitate him, otherwise
do not change. Gaugants always imitate the force and equals-threshold of
stronger agents during interactions; this is essential for our consensual sequence
formation. They also imitate colour because this is the focus of consensus. But
they can imitate other attributes as well that we find important for pattern creation.
4. If he had the same colour as his partner at the beginning of their encounter,
then increase force by 1 unit (reinforcement). Update the equals counter depending
on the colour of his partner (1 unit more in case they were the same
and 0 in case they were different).
5. Try to paint his patch (only if it is not yet painted).
6. Execute its own non-interactive behaviour (normally related to movement).
Mutation always involves colour: the dissident changes to a different colour (normally
randomly chosen) and he also changes the equals-threshold. However, the dissident
may also change other parameters.
5.3 Experiment 1
Besides position, orientation, colour, force and equals-threshold, each Gaugant has a
parameter called Rot. In the beginning we divide the global population into a number of
groups and give each group the same orientation and position. The agents choose
their colours in a random fashion. The same happens with the Rot parameter: there is
a global parameter MaxRot and each agent sets his Rot to a random value inside the
interval [0, MaxRot]. So, in the initial situation, inside a group, the painters have the
same position and orientation, but different colours and Rots. The Gaugants only
imitate the colours of the others (Rot is fixed from the beginning). In this experiment
we consider that each painter may communicate with any other. The non-interactive
individual behaviour consists only in rotating Rot degrees and going forward a number
of steps (speed). Speed is a global parameter and also not subject to imitation. The
dissident only changes his colour (mutation).
Fig. 2 shows three snapshots of a painting's evolution over time (a population of
2000 painters divided into 20 groups of 100 elements).
We can see that initially every group element is in the same patch, but because they
perform different rotations, the one-colour spots get larger and larger, and after a while
every agent is dispersed across the tableau, creating a confused background that highlights
the initial spots. The fact that any micro-painter can choose any other as a partner is
responsible for similar forms and colours appearing in different parts of the tableau.
We can also see different consensual areas, implying different consensus durations;
this is due to the change of the equals-threshold during dissidence.
5.4 Experiment 2
In the second experiment Gaugants will imitate both colour and orientation. The dissident
will change colour and orientation. Moving is just going forward a number of
speed units (a global parameter) and rotating to the right a random number of units
(between 0 and Rot). MaxRot is 6 and each group imposes on its elements the same
position and orientation. Each element begins with a random colour, and Rot is randomly
chosen between 0 and MaxRot.
In figure 4 we show the evolution of a painting made by a population of 2000 elements
divided into 30 groups, where everybody can choose any other as a partner to
interact with. We can clearly see the sequence of consensus (especially in colour; the
change of orientation was very slight).
Looking at figure 5, showing a painting that we call "The Swans", we can see clearly
that orientation is changed during dissident behaviour.
Fig. 4. Three snapshots of a painting by 2000 agents that imitate both colour and orientation
References
1. Moura, L.: Swarm Paintings. Architopia: Art, Architecture, Science. (ed. Maubant) Institut d'Art Contemporaine (2002)
2. Grassé, P.-P.: Termitologia, Tome II. Fondation des Sociétés. Construction. Paris: Masson (1984)
3. Aupetit, S., Bordeau, V., Monmarché, N., Slimane, M., Venturini, G.: Interactive Evolution of Ant Paintings. In: CEC'03 - Congress on Evolutionary Computation, IEEE Press, Canberra, Australia (2003) 1376-1382
4. Urbano, P.: Playing in the Pheromone Playground: Experiences in Swarm Painting. In: (proc. EvoMUSART05). Applications of Evolutionary Computing, EvoWorkshops 2005. Lecture Notes in Computer Science. Volume 3449., Springer-Verlag (2005) 527-532
5. Kaplan, F.: L'émergence d'un lexique dans une population d'agents autonomes. PhD thesis, LIP6 Université Paris VI (2000)
6. Shoham, Y., Tennenholtz, M.: Emergent conventions in multi-agent systems: initial experimental results and observations. In: Proceedings of the 3rd International Conference on Principles of Knowledge Representation and Reasoning (1992) 225-231
7. Resnick, M.: Turtles, Termites and Traffic Jams: explorations in massively parallel microworlds. MIT Press (1994)
8. Urbano, P.: Jogos Descentralizados de Consenso ou de Consenso em Consenso. PhD Thesis, Universidade de Lisboa (2004)
Using Physiological Signals to Evolve Art
Tristan Basa, Christian Anthony Go, Kil-Sang Yoo, and Won-Hyung Lee
1 Introduction
Even today, automation has only been effectively applied to systems which are
mechanical in nature. Even within a specialized field like artificial intelligence,
activities which involve human creativity continue to present a major challenge as
far as automation is concerned. Examples of these are systems requiring human
preference. In producing designs amiable to humans, a system must be able
to model what humans find interesting. A system which is able to recognize the
interestingness of something will also be able to mimic curiosity. One major issue
in building such systems is that human preference tends to be subjective. While
one person's choice might not be the same as that of another, several theories
have been proposed which suggest a common psychological pattern involved in
the process of making those choices.
One approach that has been explored is the use of novelty to calculate
interest [1]. Novelty can be calculated using machine learning techniques such
as neural networks, but in this context these techniques can be viewed as mere
approximations to the way the human brain recognizes novelty, and based on our
experiments, there are cases when interest is not necessarily a function of novelty.
On the other hand, studies have been done which suggest that basic emotions have
physiological signatures [2]. These patterns can be used as hypotheses with
which to detect interest. This approach, intuitively, is a more direct measure of
interest.
Although physiological signals have been used directly in producing
art [3], our application involves determining interesting two-dimensional digital
Fig. 1. (a) Two parent trees that can be used to generate artworks. (b) Three possible
children of both parents.
thetic criteria. Using genetic algorithms, we can form new formulas from the previous
formulas via the crossover and mutation operations. The crossover
operator exchanges a randomly chosen branch of one parent tree with a randomly
chosen branch of the other parent to generate children, as illustrated in Fig. 1.
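A minimal sketch of such a subtree crossover is given below, assuming expression trees represented as nested Python lists (an operator followed by its operands). The representation and helper functions are ours, made up for illustration; only the exchange of randomly chosen branches follows the text.

import copy
import random

# Example expression tree: ["add", ["sin", "x"], ["mul", "y", 0.5]];
# leaves are variable names or constants.

def random_branch_path(tree):
    # Collect the paths of all branch (list) nodes, then pick one at random.
    paths = []
    def walk(node, p):
        if isinstance(node, list):
            paths.append(p)
            for i, child in enumerate(node[1:], start=1):
                walk(child, p + (i,))
    walk(tree, ())
    return random.choice(paths) if paths else ()

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, branch):
    # Return a copy of tree with the subtree at path replaced by branch.
    tree = copy.deepcopy(tree)
    if not path:
        return branch
    node = tree
    for i in path[:-1]:
        node = node[i]
    node[path[-1]] = branch
    return tree

def crossover(parent_a, parent_b):
    # Exchange a randomly chosen branch of one parent with a randomly
    # chosen branch of the other, producing one child.
    pa, pb = random_branch_path(parent_a), random_branch_path(parent_b)
    return replace(parent_a, pa, copy.deepcopy(get(parent_b, pb)))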
Fig. 2. An Optimal Separating Hyperplane divides the data into two classes
4 Methods
Human Electroencephalography (EEG) measures both the frequency and ampli-
tude of electrical activity generated from the brain. The growing use of EEG has
enabled researchers to study regional brain activity and brain function, in particular
emotional activity. In this paper, we basically want to distinguish positive
emotions from negative ones. We use the basic emotions based on the Facial
Action Coding System (FACS) [7] as our positive and negative examples of emotions,
because of their universality and assumed innate neural substrates. The FACS
basic emotions are classified as: Anger, Sadness, Happiness, Surprise, Disgust
and Fear.
Levenson [8] demonstrated that voluntary facial activity produced significant
levels of subjective experience of a certain emotion. Autonomic distinctions were
identified between positive and negative emotions. This also extended to include
distinctions between some negative emotions. Recorded heart rate was also found
to differentiate emotions.
Hubert and De Jong-Meyer [9] showed different electrodermal responses and heart
rates when subjects were exposed to film stimuli to elicit a positive and negative
response. A temporary decrease in heart rate was observed during negative
stimulation, while it remained unchanged during positive stimulation. Skin conductance
increased significantly during negative stimulation and recorded only
an initial increase during positive stimulation. Experiments by Collet demonstrate
that each basic emotion has a specific Autonomic Nervous System (ANS)
response pattern associated with it. Fifteen out of fifteen emotion pairs were
distinguished using combined electrodermal (skin resistance, skin conductance and
skin potential), thermo-circulatory (skin blood flow and skin temperature) and
respiratory parameters (instantaneous respiratory frequency). Emotions were
redundantly separated, thus supporting the hypothesis of ANS specificity [2].
It has thus been proven that there exist specific Autonomic Nervous System
parameters which can be associated with each basic emotion. These parameters
will serve as the basis of the genetic algorithm for selecting which artwork to
evolve.
Due to the stochastic nature of EEG signals, patterns specific to an emotion
are not identical. As such, machine learning techniques can be employed to infer
models of human emotions from these signals.
Support Vector Machines have been shown to be excellent in inferring boolean
functions from a training set of positive and negative examples. In our exper-
imental setup, the positive examples refer to emotions such as happiness, ex-
citement, and surprise, and the negative examples to emotions such as sadness,
disappointment, and anger.
By means of an image exposure technique, different emotional states
were elicited from the test subjects. The subjects consisted of 8 volunteers (4
male and 4 female; between 21 and 32 years of age). All of the subjects in this
experiment were graduate students whose visual art backgrounds ranged from
none at all to 3D graphic designer. They gave informed written consent to
the study and were not paid for their participation. All were in healthy condition
and did not take any prescribed medication.
An image pool of 72 images, consisting of 36 positive (happy) and 36 negative
(sad/depressing) images, was compiled. These images were selected at the
discretion of the facilitator. Each participant was asked to select 2 images from the
pool that stimulated the strongest positive and negative response. Following an
initial resting state of two minutes to calibrate the EEG machine per participant,
subjects were shown their respective selected images. Each subject was shown
their respective image to elicit the corresponding emotion, followed by a 30 second
resting period in between emotions. The emotional mood states were highly
intensive and maintained for at least one minute. This is a period sufficiently
long for valid estimates of EEG activity to be measured.
Subjects were required to refrain from smoking and consuming caffeine and
stimulants for 2 hours immediately preceding the experiment to prevent irregularities
in ANS parameters. Absolute silence was observed during all experiments
to prevent signal artifacts.
The ANS parameters recorded were respiration, ECG and 8 channels per frequency
band: Alpha, Theta and Beta. Alpha (Berger's wave) is the frequency range
from 8.5 Hz to 12 Hz. It is characteristic of a relaxed, alert state of consciousness
and is present by the age of two years. Beta is the frequency range above 12 Hz.
Low amplitude beta with multiple and varying frequencies is often associated
with active, busy or anxious thinking and active concentration. Theta is the frequency
range from 4.5 Hz to 8 Hz and is associated with drowsiness, childhood,
adolescence and young adulthood. This EEG frequency can sometimes be produced
by hyperventilation. Theta waves can be seen during hypnagogic states
such as trances, hypnosis, deep daydreams, lucid dreaming and light sleep, and
in the preconscious state just upon waking and just before falling asleep. It was
observed that the women in the group recorded a more pronounced difference
between emotional states compared to the men. The following experimental results
will demonstrate the effects of better training data with regard to emotion
detection.
The collected EEG test data were then classified into training models used for
the learning algorithms of Support Vector Machines. SVMs were trained to recognize
positive and negative emotions using these EEG models. 16 sets of training
data were utilized, comprising 2 sets of emotions (positive and negative) for
each of the 8 participants. Upon completion of SVM training, generalized templates
of positive and negative emotions were created. These emotional templates
were the basis upon which the SVM would compare and classify EEG test data
as either a positive emotion or a negative emotion (Fig. 3).
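The classification step might look like the following sketch. The paper does not name a toolkit; the use of scikit-learn's SVC, the RBF kernel, the feature layout and the file names are all assumptions made here for illustration.

import numpy as np
from sklearn.svm import SVC   # library choice is ours, not the authors'

# X: one row of EEG-derived features per recording (e.g. band power per
# channel for alpha, beta and theta); y: +1 = positive emotion, -1 = negative.
X_train = np.load("eeg_training_features.npy")   # hypothetical file names
y_train = np.load("eeg_training_labels.npy")

clf = SVC(kernel="rbf")       # kernel choice is an assumption
clf.fit(X_train, y_train)

def most_positive_artwork(feature_rows):
    # Select the artwork whose EEG response lies furthest on the positive
    # side of the separating hyperplane (largest decision value).
    scores = clf.decision_function(np.asarray(feature_rows))
    return int(np.argmax(scores))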
During the testing phase, each participant was asked to sit in a comfortable
armchair and was connected to the EEG machine. The same restrictions and controlled
conditions applied to the test subjects: no stimulants, no smoking and
absolute silence during the experiment. Digital art images were presented on a
monitor in front of each subject for one minute while EEG signals were being
recorded. The sequence of presentation of the images was randomly selected. The
group of initial images all belonged to a single family; they were all evolved
from the same parents. Using the SVM classifier, the artwork that resulted in the
most positive classification was selected as the image to be evolved. A negative
response to the artwork signals disinterest/boredom, prompting the image to be
5 Results
In the testing phase, six subjects participated out of the eight who volunteered
in the training phase. The SVM was able to classify eleven out of eighteen actual data
items correctly from the six subjects. This translates into 61% accuracy. Consistent
with the training data, it was also observed that the females in the group recorded
a higher percentage of correctly determined emotions (55%) as compared to the
males (45%). This could be attributed to more consistent signals extracted from
female subjects. Sample images used in the experiment are shown in Figure 4.
Figure 5 shows the accuracy obtained by SVM classification. Figure 6 illustrates
the distribution of accuracy between the genders.
6 Conclusion
Previous approaches to finding interesting designs have focused on novelty to evolve
art. Although novelty has been shown to coincide with human preference to some
degree of accuracy, it can be considered an indirect measure of what the human brain itself
finds interesting. This paper has presented a more direct approach of measuring
interest by using physiological signals, which can be used as a fitness function to
evolve new designs.
Fig. 4. Genetic art images on the left are the original images. Images on the right
correspond to eight children produced from the parent. (a) An example where the
positive classification by SVM matched human preference, enclosed by dotted lines.
(b) An example wherein the positive classification by SVM, enclosed by dotted lines,
did not match the preference of a subject, enclosed by a solid line.
Experimental results have shown that using machine learning techniques such
as Support Vector Machines, human preference in artworks can generally be inferred.
For our purposes, we have used this approach to evolve art; however, there
are many other facets wherein this technique can also be applied. Other potential
areas of application can include the field of medicine. One advantage born of
this approach is the possibility of being able to communicate the preferences and
emotions of quadriplegic people. However, present methods of extracting physiological
signals are still considered impractical inasmuch as they require cumbersome machines.
Future directions of research can extend this approach by incorporating a preprocessing
step such as the Blind Source Separation (BSS) algorithm to minimize
noise in EEG signals, which should improve the classification accuracy.
Acknowledgment
This research was supported by the Korea Culture and Content Agency, Seoul,
Korea, under the 2005 CT project, and by the research fund of Chung-Ang University
in Seoul.
References
1. Saunders, R.: Curious Design Agents and Artificial Creativity. Proceedings of the 4th Conference on Creativity & Cognition. Loughborough, UK. (2002) 80-87
2. Collet, C., Vernet-Maury, E., Delhomme, G., Dittmar, A.: Autonomic Nervous System Response Patterns Specificity to Basic Emotions. Journal of the Autonomic Nervous System 62 (1997) 45-57
3. Mori, M.: Wave UFO: http://www.publicartfund.org/pafweb/projects/03/mori release s03.html
4. Mitchell, T.: Machine Learning. McGraw-Hill Companies Inc. Singapore. 1997
5. Sims, K.: Artificial Evolution for Computer Graphics: Computer Graphics (Siggraph '91 proceedings), Vol. 25, No. 4, July 1991, pp. 319-328
6. Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press. UK. 2000
7. Ekman, P. and Friesen, W.V.: The Facial Action Coding System. Consulting Psychologists Press, Palo Alto, 1978
8. Levenson, R.W., Ekman, P. and Friesen, W.V.: Voluntary facial action generates emotion-specific autonomic nervous system activity. Psychophysiol., 21, 1990
9. Hubert, W. and De Jong-Meyer, R.: Psychophysiological response patterns to positive and negative film stimuli. Biol. Psychol., 1990, 73-93
Using Physiological Signals to Evolve Art 641
10. Hinrich, H., Machleidt, W.: Basic Emotions Reected in EEG-coherences. Inter-
national Journal of Phsychophysiology 13 (1992) 225-232.
11. Fridlung, A.J., Schwartz, G.E. and Fowler, S.C.: Pattern Recognition of Self-
Reported Emotional state from multiple-site facial EMG activity during aective
imagery. Psyhophysiol., 21, 1984
Science of Networks and Music:
A New Approach on Musical Analysis and
Creation
1 Introduction
In 1998, an important paper [1] demonstrated that the connections between people all over the world may be studied as a graph which is neither completely random nor completely regular, but an ordered lattice with a small quantity of disorder. For the construction of this model Watts and Strogatz used a procedure called rewiring, which consists of assigning to each edge of a regular lattice a probability p that the edge is moved to another vertex. This probability p, comprised between 0 and 1, determines the randomness of the structure: a graph with p = 0 is completely regular, a graph with p = 1 is entirely random, and for 0 < p < 1 we obtain a network called small-world. This model is based upon two parameters, path length (L) and clustering coefficient (C). The first parameter measures the average separation between two vertices in a graph composed of N vertices and E edges; the second is the ratio between the edges present in the neighborhood of a vertex i (which is composed of all the vertices directly connected to vertex i) and the maximum possible number of edges in the neighborhood, which is given by the formula:
Ci = 2Ei / (Ni (Ni - 1))    (2)
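As a sketch of these two measures, the following fragment builds a Watts-Strogatz graph for a given rewiring probability p and computes its clustering coefficient and average path length. The use of the networkx library and the chosen sizes are assumptions of this example, not tools used by the authors.

# Sketch: clustering coefficient C and average path length L of a
# Watts-Strogatz small-world graph, for different rewiring probabilities p.
import networkx as nx

def c_and_l(n=100, k=6, p=0.1):
    g = nx.connected_watts_strogatz_graph(n, k, p)  # regular for p=0, random for p=1
    c = nx.average_clustering(g)                    # mean of Ci = 2Ei / (Ni(Ni-1))
    l = nx.average_shortest_path_length(g)          # average separation between vertices
    return c, l

for p in (0.0, 0.01, 0.1, 1.0):
    c, l = c_and_l(p=p)
    print(f"p={p:<5} C={c:.3f} L={l:.3f}")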
According to Latora et al. [10, 11, 12], every complex system can be seen as a network where the single elements are represented by nodes and the links show the interaction between parts. Music may also be thought of as a complex system [13], with strong interaction and emerging properties. Recently, networks have been adopted as a metaphor for algorithmic composition and for musical analysis. In some cases [14] interconnected musical networks have been used as instruments to improve interaction in musical performances and as an extension of Cage's work on indeterminacy [15]. Composers such as Weinberg use networks as a tool to connect musicians and performers, producing brand new interactions and very sophisticated compositional practices (that in some cases involve a great number of people at great distances). In other cases [16, 17, 18] the network becomes an instrument for graphical and analytical modeling of music and a new paradigm for generative and evolutionary music [19, 20].
Every musical composition may be modeled as a network where each node corresponds to a note and each link represents a connection between notes. For instance, if a song starts with F and the second note is A#, we will have two nodes connected by a link, and so on until the end of the song. The network becomes a sort of map of the composition, a catalogue of the paths performed by the musician, represented as if they were the streets of a city or the traffic of an airport. All the connections in the map represent melodic intervals of the studied song. This kind of representation is advantageous because it shows, in a single image, the complex plot of interplay in a composition. The connections between nodes clearly show how the notes are linked, so that it is easy to understand how a melody is organized only by watching the graphical structure. We may also think of the network as an instrument for didactic use. As shown in Fig. 1, a simple picture can help a non-expert listener or musician comprehend musical concepts.
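The following sketch illustrates the representation just described: a melody, given as a sequence of note names, is turned into a graph whose nodes are notes and whose edges are the melodic transitions actually used. The toy note sequence and the dictionary-based adjacency structure are illustrative assumptions.

# Sketch: turning a melody into a network of note transitions.
from collections import defaultdict

melody = ["F", "A#", "G", "F", "A#", "C", "G", "F"]   # toy example

edges = defaultdict(set)          # adjacency: note -> notes reachable in one melodic step
for a, b in zip(melody, melody[1:]):
    edges[a].add(b)

for note, successors in edges.items():
    print(note, "->", sorted(successors))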
The interface of the software is divided into three areas:
1. Control Panel,
2. Net View,
3. Logger Window.
The Control Panel, situated at the left of the interface, is the area where all the controls, buttons and information are placed. Through this area the user may execute all the operations available in this software. The Net View, at the center of the interface, is the part of the interface where the networks are represented, while the Logger Window (placed under the Net View window) shows all the operations made by SWAP. We may divide the functionalities of SWAP into two parts. The first is the representation and modeling function. Given a MIDI file as input, SWAP creates a representation of the composition as a network. All numerical information on networks can be inspected in the Net Info window placed in the Control Panel. This window shows all data extracted from the composition, such as the number of nodes, edges, average number of edges per node, clustering coefficient, path length, and so on. SWAP also contains a window called Node Info, which gives all the information about every single node of the network. The user may select a node directly from the Net View with a mouse click. The selected node will be automatically highlighted on the graph and all the edges connected to this node will change color from white to fluorescent green, so that all the notes directly connected to the selected one are immediately apparent. The Node Info window then shows all the information on the node: value (A, A#, B etc.), number of links and so on. Another important option is the Mouse Explore function, which allows the user to rotate the network in the Net View area and to visualize it from the best viewpoint.
The Control Panel also holds all the controls for the construction of a musical network, which can be created by the user by choosing the number of nodes, the number of links per node and the percentage of randomness in the link distribution. A network (constructed with SWAP or loaded from a MIDI file) may be saved as a text file which represents the network as a matrix. Each entry of this matrix has two indices related to the nodes of the network, and its numerical value represents the connections in the structure: if we have 0 between two nodes there is no link between them; on the contrary, if the value is 1, there is an edge that links the two nodes. This representation allows the user to define networks and to place each link in the structure, finding the most appropriate configuration. The second functionality of SWAP is called Synthesis, and allows the user to create scales and melodies from the networks created or from those imported from MIDI files. This function will be better described in another section of this paper.
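A plausible sketch of the matrix format described above follows: the network is written to, and read back from, a plain-text file of 0/1 entries, one row per node. The use of numpy and the exact file layout are assumptions of this example, not the actual SWAP file format.

# Sketch: saving and loading a network as a 0/1 adjacency matrix text file.
import numpy as np

notes = ["F", "G", "A#", "C"]
index = {n: i for i, n in enumerate(notes)}

adj = np.zeros((len(notes), len(notes)), dtype=int)
for a, b in [("F", "A#"), ("A#", "G"), ("G", "F"), ("A#", "C")]:
    adj[index[a], index[b]] = 1   # 1 = edge between the two nodes, 0 = no link

np.savetxt("network.txt", adj, fmt="%d")
loaded = np.loadtxt("network.txt", dtype=int)
assert (loaded == adj).all()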
4 Graph-Based Analysis
The first step of our work consisted of a thorough analysis of the topology of networks derived from musical compositions. The parameters analysed are the ones studied by Watts and Strogatz in their paper, the clustering coefficient and the path length. We extracted these data from the networks and then compared all these
Fig. 2. Comparison of the C and L values of Bach's compositions with random and regular networks with the same number of nodes and edges
Fig. 3. L-C space. Networks extracted from Bach's (squares) and Mozart's (circles) compositions. Each symbol in the L-C space is characterized by its values of clustering coefficient and average path length.
values with the values of completely ordered and random networks with the same number of nodes and edges. The results are very interesting, since the data of our networks are found to be intermediate between order and randomness. Fig. 2 shows some data extracted from Bach's compositions and some graphical representations of these comparisons.
The second step was to compare values from different authors, trying to find interesting differences that can be underlined by means of numerical and graphical data. We introduced a two-dimensional space with path length on the X axis and clustering coefficient on the Y axis (Fig. 3), and found a certain tendency of different authors' compositions to occupy different areas of this space. There are of course some overlaps in these data. Although these data are only preliminary, we find them interesting.
Finally, we carried out an ANOVA between groups of compositions of different authors. We compared C and L values from about 100 musical compositions classified by musical genre (Classical or Pop) and author (J. S. Bach, F. Battiato, The Beatles, G. F. Handel, W. A. Mozart and F. Zappa). The first analysis did not produce significant results, as there was no significant difference between data from different genres, but the second one showed significant differences in the clustering coefficient of different composers:
F(5, 95) = 4.461, p < .001
This result may be a good starting point for more analytical studies and for the development of graph-based algorithms for author attribution.
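For illustration, a one-way ANOVA of this kind could be computed as in the sketch below; the per-composer clustering-coefficient values are invented placeholders, not the data used in the study.

# Sketch: one-way ANOVA on clustering coefficients grouped by composer.
from scipy import stats

# Placeholder values; the study used about 100 compositions in total.
bach    = [0.12, 0.15, 0.11, 0.14]
mozart  = [0.09, 0.10, 0.08, 0.11]
beatles = [0.18, 0.20, 0.17, 0.19]

f_value, p_value = stats.f_oneway(bach, mozart, beatles)
print(f"F = {f_value:.3f}, p = {p_value:.4f}")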
5 Creation of Melodies
There are two basic algorithms for the production of melodies through SWAP. The first finds the minimum path between given start and end nodes, generating a simple melody. This function, based on the Floyd-Warshall algorithm, is basically a way of exploring a network's characteristics through sound. From a musical viewpoint, the scales derived from this technique are not very interesting, but we found this algorithm very useful for better understanding the dynamics of the studied network.
The second is a simple random algorithm called Walking which, starting from a given node, randomly chooses the path to cover by randomly selecting subsequent links. The user may define the start node, the sequence's length, the speed (expressed in beats per minute) and the resolution of the melody (1/4, 1/8, 1/16, etc.). The melodies obtained by this algorithm are much more pleasant than the previous ones. In this case, the melody does not follow a minimum path but is free to walk in the structure. This method may show some similarities with Markov chains, but Walking is a simpler algorithm. A Markov chain is a stochastic model in which the transition probability which determines the passage to another state of the system depends only on the previous state. Walking is a model in which the selection of the paths is based on the distribution of links in the structure. Thus, if the first step is a node connected to three other nodes, the three respective links have the same probability of being chosen. The only constraint is the connection between vertices, and paths are possible only where a link is placed (there is no possibility of going, for instance, from a generic node A to another node B if there is no link between them).
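A minimal sketch of the Walking algorithm as described: starting from a given node, the next note is chosen uniformly among the links leaving the current node. The adjacency structure and the parameters are illustrative.

# Sketch: the "Walking" random-walk melody generator.
import random

def walking(edges, start, length):
    """edges: dict mapping each note to the list of notes it is linked to."""
    melody = [start]
    current = start
    for _ in range(length - 1):
        successors = edges.get(current)
        if not successors:                     # dead end: no outgoing link
            break
        current = random.choice(successors)    # every outgoing link is equally likely
        melody.append(current)
    return melody

edges = {"F": ["A#", "G"], "A#": ["G", "C"], "G": ["F"], "C": ["F"]}
print(walking(edges, start="F", length=16))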
Of course, the resulting melody is influenced by the distribution of the edges in the network, so that if we have interesting intervals in the network the scales will be more music-like. For instance, imported networks can also be re-used in order to create melodies. This is an interesting technique because it allows the use of a structure whose melodic intervals (i.e., the connections between nodes) are of course non-dissonant, so the resulting scale should be melodic. The interesting thing is that the melodies produced by means of this technique are completely different from the original one. As we discussed above, the network is a kind of representation which can be thought of as a map, so a virtually infinite number of melodies can be created from the same network, just as there are infinite ways of walking the streets of a city. Since our algorithm is completely random, the melodies generated are self-organized, so we do not even influence the musical creation. This result suggests that a good network is able to produce nice melodies just by means of a random algorithm. In the next section we will introduce the use of genetic algorithms for the generation of networks with determined characteristics in order to produce interesting melodies.
6 Genetic Algorithms
where t is the threshold parameter. Fig. 4 shows the basic features of our fitness function. After defining an interesting region, for example by observing the distribution of compositions of a certain author in the L-C plane, it is possible to grow populations of networks situated near the center of the circle. In order to create a population of networks we define the number of individuals of the population and the number of nodes and links of the networks. As we said, we created a fitness function for minimizing the distance from a target point called T in the L-C plane; of course the target is characterized by values of average path length (X axis) and clustering coefficient (Y axis). Thus we define the following parameters: Target Point (i.e., the C and L values of the networks), Random Rate (the percentage of new fellows), Injection Rate (the number of epochs between random injections of individuals), Mutation Rate (the probability of a mutation of a gene), Elite (the percentage of individuals that will be taken for the next population), Crossover Rate (the number of individuals that will be generated by crossover), Radius (the radius of the target area) and the Number of Epochs. Fig. 5 shows the results of some experiments on a population of 300 fellows for 1000 epochs. We defined a random rate of 5%, an elite of 2%, a crossover rate of 70% and a mutation rate of 0.1%. The initial population has an average fitness just above 0.3 and the best individuals are very close to the average; after 1000 epochs the best networks have a fitness of more than 0.8 and the mean is over 0.75. As shown in the figure, the introduction of new individuals causes a decrease in the mean; in fact, the mean values oscillate continuously. In this case we fixed the injection rate at 10, so every 10 epochs 5% of new fellows are inserted in the population.
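Since the fitness formula itself falls outside this excerpt, the sketch below shows one plausible form consistent with the description: fitness increases as a network's (L, C) point approaches the target T, and the threshold/radius bounds the interesting region. The constants and the exact functional form are assumptions, not the authors' formula.

# Sketch: a distance-based fitness over the L-C plane (form assumed).
import math

def fitness(l, c, target=(3.0, 0.4), t=0.5):
    """Higher fitness the closer (l, c) lies to the target point T;
    t is the threshold (radius) of the interesting region."""
    d = math.hypot(l - target[0], c - target[1])
    if d >= t:
        return 0.0
    return 1.0 - d / t      # 1.0 at the target, 0.0 on the circle of radius t

print(fitness(3.1, 0.38))
print(fitness(5.0, 0.10))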
This work has shown a new approach to musical representation and analysis based on small-world networks. Musical pieces have been represented as graphs and studied by using the tools of the science of networks. The results obtained with these experiments are interesting, since they underline similarities between the inner structure of some compositions and the graphs studied by Watts and Strogatz and by Barabási. Of course, this is the first step of a more complex study ori-
Fig. 5. Results of the experiments with genetic algorithms. The black line represents the fitness values of the best fellows and the grey line represents the mean.
References
1 Introduction
At the junction between computer music and artificial intelligence lies the goal of developing generative or interactive software agents which exhibit musicality. The appearance of musicality is judged either by a listening, watching audience or by an interacting, performing musician. Attempts in this domain have taken on a wide variety of forms due, on the one hand, to the wide variety of methods in artificial intelligence, broadened further by the myriad possible interpretations of these techniques in a musical domain, and in addition by the myriad musical styles and substrates in which such an interpretation takes place [1, 2].
The approach presented here is inspired first and foremost by a search for general-purpose behavioural entities that could be adopted by computer musicians in a flexible manner. Our approach is influenced by, and indeed made possible by, the availability and popularity of modular extensible computer music platforms such as Max/MSP [3], PD [4] and SuperCollider [5]. Practising musicians who work with these tools often build up personalised repertoires of software patches and commonly adapt publicly available third-party objects to their own performance needs. This feeds a powerful form of social creative search in which something designed with a given purpose in mind may be re-appropriated indefinitely. Thus rather than thinking in terms of stand-alone
1.1 Background
Non-symbolic artificial intelligence (AI) emphasises low-level behaviours at the heart of all species' strategies for survival, as understood in terms of Darwin's theory of evolution [6]. Early work in cybernetics by Grey-Walter [7] established an experimental context in which wheeled robots, containing sensors and motors connected by simple analogue circuits, could be designed to produce observably lifelike behaviour. Since the development of Genetic Algorithms (GAs) and increasingly smaller and faster computer processors, it has become possible to evolve compact algorithms that allow a physical wheeled robot to satisfactorily perform more precisely defined cognitive tasks.
The notion of minimal cognition [8, 9] has helped home in on the meaning of the term lifelike. In recent years a great effort has been made to understand how extremely simple biologically-inspired algorithms could learn tasks such as object recognition, selective attention and simple memory, using CTRNNs embodied in simulated agents and situated in simple physical environments.
where tau_i is the time constant, g_i is the gain and b_i is the bias for neuron i, I_i is any external input for neuron i, and W_ij is the weight of the connection between neuron i and neuron j. sigma is a non-linear transfer function, which in our case is tanh.
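The update equation itself is not reproduced in this excerpt, so the sketch below assumes the standard continuous-time recurrent neural network formulation (as in Beer's work), integrated with a simple Euler step. The network size, parameter ranges and time step are illustrative choices, not values from this paper.

# Sketch: Euler-integrated CTRNN update, standard formulation assumed:
#   tau_i * dy_i/dt = -y_i + sum_j W_ij * tanh(g_j * (y_j + b_j)) + I_i
import numpy as np

class CTRNN:
    def __init__(self, n, dt=0.01, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        self.dt = dt
        self.y = np.zeros(n)                     # internal state of each neuron
        self.tau = rng.uniform(0.05, 1.0, n)     # time constants
        self.gain = rng.uniform(0.5, 2.0, n)
        self.bias = rng.uniform(-1.0, 1.0, n)
        self.w = rng.uniform(-2.0, 2.0, (n, n))  # full recurrent weight matrix

    def step(self, external_input):
        activation = np.tanh(self.gain * (self.y + self.bias))
        dy = (-self.y + self.w @ activation + external_input) / self.tau
        self.y = self.y + self.dt * dy
        return activation                        # outputs are read from (a subset of) nodes

net = CTRNN(n=8)
for _ in range(5):
    out = net.step(external_input=np.r_[1.0, np.zeros(7)])
print(out)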
CTRNNs allow recurrency, meaning that network connections can exist in any direction, including potential connections from any node to itself. The combination of recurrency and internal state makes for a system which can produce complex internal patterns of activity and which has a memory-like response to its environment [8]. Each node has three parameters (bias, gain and time constant) associated with its behaviour, and each connection between nodes has
one parameter (a weight). Due to the complex relation between network parameters and network behaviour, along with the non-specificity of solutions to the tasks for which they are typically used, a common method for arriving at a CTRNN which performs a certain task is to use a Genetic Algorithm (GA).
CTRNNs are known to be theoretically capable of replicating any dynamical system, and it has been shown that very small CTRNNs are capable of arbitrarily complex dynamics [10].
Our interest in the CTRNN as a musical unit stems from the potentially unbounded range of temporal dynamics of which it is capable. It is a sub-symbolic system, meaning that it is unlikely to have immediate application in the domain of discretised musical events traditional to computer music and fundamentally implicit in human musical behaviour. Although an appropriate use of CTRNNs would therefore be in the signal domain, we focus on simple rhythmic behaviours at the control rate, and we refer to this domain as gestural. The CTRNN's interaction with the world consists of vectors of real-valued inputs and outputs, updating continually and frequently, in our case on the order of every 10 milliseconds. A representative example of the CTRNN in a musical context, which illustrates our intended use of the system, places it with a series of features extracted from an audio stream as input and a set of synthesis parameters as output. In this case the CTRNN is conceived of as a direct interactive participant in a musical performance.
Beer [10] provides an extensive mathematical analysis of the behaviour of small CTRNNs, the presentation of which is beyond the scope of this paper. A key notion in such an analysis is that nodes with suitably strong self-connections can saturate, meaning that their own feedback dominates their input and locks them in a certain state. In this way nodes can act as internal switches which influence the behaviour of the rest of the network. Such nodes may flip states. More broadly, due to its internal state and recurrency, the state of a CTRNN at time t is determined not only by the present input, but by the history of inputs up until time t and the starting state of the network.
In the case of a static input (i.e., one whose values are not changing), CTRNNs can be described in terms of their internal dynamics in the same way as any closed dynamical system. The network will either find a static resting state, move periodically, or move quasiperiodically or chaotically [11].
In the case of changing inputs, we consider three distinct categories of CTRNN behaviour as perceived. These categories are specific to our musical purpose and are not derived from a formal approach: they attempt to capture a musician's perception of CTRNN behaviour rather than the CTRNNs themselves.
In the first category, each input state leads to one output pattern, which may be either static or cyclical. As the input state moves from A to B and back again, the CTRNN moves between the associated patterns, returning to the same pattern for each input.
The second category is identical to the first except that, after changing input state from A to B, a transitory period of indeterminate length precedes the output's arrival at its resting pattern. The length and form of these transitional sections vary depending on the particular trajectory through input states, but the output always ends up at the same pattern for any given input state.
In the third category, there may be more than one output pattern for each input state, and which one is arrived at is determined by the trajectory through input states. Thus moving from input state A to input state B leads to a different output pattern than if one were to move via input state C. In other words, the system has multiple attractors for the same resting input, and the behaviour depends on the history of the input leading up to its resting state.
2 Musical Uses
The use of neural networks in studies of music cognition and in composition already has a long and rich history. Commonly cited benefits of a neural network approach to musical problems are generalisation, the possibility of extrapolating general features from a corpus of examples, and graceful degradation, the robust nature of the network's response to unexpected inputs, as compared to rule-based systems. [12] and [13] contain an exemplary cross-section of such work. Mozer [14], for example, uses trained recurrent neural networks to generate musical melodies with note-by-note prediction. He identifies and addresses problems of securing musical structure over longer time scales than are naturally dealt with by the network, and thus improves the quality of melodies generated in this way. In this and other work in music and AI the final goal is often a machine that makes novel, competent music within a given context without the
is its decay period, which lies beyond the control of the player: the sound is on its own after the hammer strikes the string. The pianist John Tilbury describes this as the contingency of the piano sound [16]. The present form of the CTRNN, even randomly generated, offers an opportunity to implement similar contingent relationships between performer and electroacoustic process, generating micro-structures that control electroacoustic processes which in turn depend on the performer's activity. Given the CTRNN as a Max/MSP object, the user would be free to develop his or her own approach to the problem of how to map inputs and outputs, and take on the responsibility of making the CTRNN sound good in his or her own musical context through an iterated approach of performance and adjustment. The challenge of refining parameter mappings is already familiar to any musician developing interactive software.
Introducing evolution to the project suggests the possibility of developing a system that has certain behavioural facets that would drive an engaging improvisation in very basic ways: for example, responding to an input pattern but not in the same way every time; behaviour potentially changing over time; settling into patterns but producing variations on themes. Put more loosely, such a system should have idiosyncrasies and tendencies that the user could put to effective use to drive an improvisation. However, writing fitness functions that result in these desires being met is not straightforward. In the following section we describe initial attempts to do this.
Informal interactive testing of the evolved and random networks showed a remarkable range of behaviours, and the most immediate implication of this is that a more thorough categorisation of CTRNN behaviour from a musical point of view is in order. Evolved networks tended to do something more interesting, and often the nature of their evolved behaviour was immediately apparent: by repeating the same input patterns one could easily observe the network falling
4 Future Work
When using the CTRNN in Max/MSP, the user is able to observe three dimensions of output states from a CTRNN in a window generated in Jitter, the video editing extension to Max/MSP. By observing output states during practice, we propose developing this interface so that a musician could draw colour-coded regions into the 3-D space that he or she wishes to correlate to specific parameter settings of an instrument. A feedforward neural network could then be trained to implement the desired mapping. Through an iterative process of practice and adjustment, a more carefully crafted combination of behaviour and desired sound could be developed, bringing together a CTRNN behaviour with a specific repertoire of output states. Successfully implementing this addition may reinforce the
notion that it is sufficient to provide the musician with a set of CTRNNs with general-purpose behaviours. The musician is then able to choose from a set of behaviours, and iteratively design a mapping from behaviour to audible musical output. Similar processes could be applicable to the input of the network.
5 Summary
We have introduced the CTRNN as a performative and/or compositional tool for musicians using modular extensible computer music platforms such as Max/MSP. We have described how networks can be randomly generated or evolved to produce particular behavioural properties, and have demonstrated very simple examples in which evolved CTRNNs exhibit behaviours that are of interest to improvising musicians.
We have discussed future work in this area, including gathering training data
to be used for the evolution of more specic CTRNN behaviours and developing
mappings from CTRNNs to performance parameters using a trained feedforward
network. We suggested that the CTRNN should be adapted by musicians ac-
cording to their own performance contexts and their own interpretation of its
behaviour, and that it should inform their own actions during performance as
well as during the development of their performance contexts.
The notion of a coevolution or adaptive codevelopment between CTRNN be-
haviour and user is provoked by the present work. We suggest that this problem
could be made into a fruitful topic of research.
Acknowledgments
Oliver Bown's research is supported by a bursary from the Department of Computing, Goldsmiths College. We would like to thank Geraint Wiggins for extensive feedback, and Mike Riley and the Goldsmiths Department of Visual Arts Digital Media Lab for the use of their computers as Apple XGrid agents.
References
1. P. M. Todd and G. Werner. Frankensteinian approaches to evolutionary music
composition. In Niall Griffith and Peter M. Todd, editors, Musical Networks: Parallel Distributed Perception and Performance, pages 313–339. MIT Press/Bradford
Books, Cambridge, MA, 1999.
2. E. Miranda. Composing Music with Computers. Focal Press, 2001.
3. http://www.cycling74.com.
4. http://www.puredata.info.
5. http://www.audiosynth.com.
6. C. Darwin. On the origin of species by means of natural selection, or The preser-
vation of favoured races in the struggle for life. D. Appleton and company, 1860.
7. W. Grey Walter. An imitation of life. Scientific American, 182(4):42–54, 1950.
8. A.C. Slocum, D.C. Downey, and R.D. Beer. Further experiments in the evolution
of minimally cognitive behavior: From perceiving affordances to selective atten-
tion. In J. Meyer, A. Berthoz, D. Floreano, H. Roitblat, and S. Wilson, editors,
From Animals to Animats 6: Proceedings of the Sixth International Conference on
Simulation of Adaptive Behavior, pages 430–439. MIT Press, 2000.
9. R.D. Beer. The dynamics of active categorical perception in an evolved model
agent. Adaptive Behavior, 11(4):209–243, 2003.
10. R. D. Beer. On the dynamics of small continuous-time recurrent neural networks. Adaptive Behavior, 3(4):469–509, 1995.
11. D. Kaplan and L. Glass. Understanding Nonlinear Dynamics. Springer-Verlag,
1995.
12. P. M. Todd and D. Gareth Loy. Music and Connectionism. MIT Press, 1991.
13. N. Griffith and P. M. Todd, editors. Musical Networks. MIT Press, 1999.
14. M. C. Mozer. Neural network music composition by prediction: Exploring the benefits of psychoacoustic constraints and multi-scale processing. Connection Science, 6(2-3):247–280, 1994.
15. K. Sims. Evolving 3d morphology and behaviour by competition. In Artificial Life
IV Proceedings. MIT Press, 1994.
16. J. Tilbury. Feldman and the piano: the art of touch and celebration of contin-
gency. In Second Biennial International Conference On Twentieth-Century Music,
Goldsmiths College, University of London, 2001.
Synthesising Timbres and Timbre-Changes
from Adjectives/Adverbs
Computing Laboratory,
University of Kent,
Canterbury, Kent, CT2 7NF,
England
ag84@kent.ac.uk, cgj@kent.ac.uk
1 Introduction
The term timbre is used in various ways in music. One way is in describing gross categories of sounds: instrument types, the sound of certain combinations of instruments, different stops on a pipe organ, a discrete choice of sounds on a simple electronic keyboard, and so on.
A second aspect of timbre is the distinctive sound qualities, and the changes in those qualities, that can be produced within one of those gross categories. To a skilled player of an acoustic instrument, such timbral adjustments are part of day-to-day skill. A notated piece of music might contain instructions concerning such timbres, either in absolute terms (harshly, sweetly) or comparative terms (becoming reedier), and musicians use such terms between each other to communicate about sound ("Can you sound a little more upbeat/exciting/relaxed?").
From here onwards we will denote these two concepts respectively by the
terms gross timbre and adjectival timbre.
The player of a typical electronic (synthesis-based) instrument does not have access to many of these timbral subtleties. Occasionally this is because the synthesis algorithms are incapable of producing the kinds of changes required. However, in many cases this lack of capability is not to do with the capacity of the synthesis algorithm (after all, a typical synthesis algorithm is capable of producing a much larger range of sound changes than a physically-constrained acoustic instrument) but to do with the interface between the musician and the instrument/program [1, 2]. In current systems, the know-how required in order to
2 Approaches to Timbre
Compared to other aspects of music such as pitch and rhythm, timbre is not well understood. This is evidenced in a number of ways. For characteristics such as pitch and rhythm, there exist theories of how they work and produce perceptual effects; there are well-understood notations for them; and we understand how to synthesize them from fundamental components to get a particular effect.
By contrast, timbre lacks this repertoire of theory and notational support (as explored by Wishart [3]). Nonetheless there is a large repertoire of language associated with timbre and timbral changes. These timbral adjectives and metaphors provide a powerful means for musicians to communicate between themselves about timbre; but, by contrast to the more formal notations for, say, pitch or rhythm, they do not provide a usable structure for inputting desired timbres or timbral changes into computer music systems [4, 1, 5, 6].
One approach would be to come up with a new language for timbre which is more closely aligned with the way in which timbre is generated in electronic instruments. However, this has many problems. For example, timbre words convey information that has musical meaning, and we would like to create systems so that electronic and acoustic instruments can be played side-by-side and the players are able to communicate using a common vocabulary. For these reasons we focus on how we can use traditional timbre words in a computer music setting.
These two aspects of timbre are very different; most of the literature on timbre in computer music has focused on the gross categorisation, beginning with the early work of Wessel [7].
An example of studies of gross timbre is the work of McAdams et al. [8]. In this work a three-dimensional timbre space was defined, the dimensions being the attack time (the time taken for the volume of a note to reach its maximum), the spectral centroid (the relative presence of high-frequency versus low-frequency energy in the frequency spectrum), and the spectral flux (a measure of how much the spectrum changes over the duration of a tone). A number of instruments were then analysed in terms of these three measures, and a graph of the results showed how each occupied a different part of the timbre space.
Such representations are useful when the aim is purely analytic, i.e. we want to understand existing sounds. However, the aim of our work is oriented towards synthesis, and so we need to consider what representation is appropriate for being used backwards, to go from analysis to synthesis. Whilst a representation such as the three-dimensional model in [8] might yield acceptable results for categorising sounds, it is not adequate for the synthesis of sound. We certainly could not work backwards from a three-dimensional timbre representation and hope to synthesise the starting sound, since the representation is oversimplified and too much information has been lost.
Much of the recent work in the area of gross timbre has focused on the developing MPEG-7 standard. This work defines a framework for describing sounds in terms of spectral and temporal measurements of the sound, extracted through some analysis method. This work is interesting in that it identifies a number of features that have been shown in past research to be important for the recognition and perception of timbre.
A large proportion of other research into timbre in computing has focused on automated recognition or categorisation of instruments. For example, Kostek [9] describes a system that uses machine learning methods to classify which instrument is playing.
This has possible applications in databases of music for automated searching of stored sounds. The automated approach eliminates the need for a human to enter metadata identifying each sound, thus greatly simplifying the process of creating large sound databases. The common approach is to use neural networks to learn the sound of various instruments after being presented with various recordings of each. The key in this sort of work is to find common features between different recordings of a certain type of instrument, where the recordings may have different pitches, loudness, or playing styles. Such features may be explicitly symbolically represented, e.g. if the classification is performed using a decision tree method; or they may be subsymbolically represented, e.g. if a neural network is used.
Analysis of real instruments reveals that the tone of a single instrument can vary greatly when the pitch is changed, or with changes in the volume of the playing. Therefore, the challenge in gross timbre identification is to identify the common features of the sound that identify a certain instrument, despite the large variations in tone that can be produced.
A different body of work focuses on the concept of adjectival timbre. Here, the focus is not on studying the sound of an instrument as a whole, but on looking at individual generic characteristics of sounds such as brightness, harshness, or thickness. Early work on this was carried out by Grey [10], who identified some features of a synthesis algorithm which correlate strongly with timbral characteristics.
There are many studies in the field of psychoacoustics where experiments have been carried out to identify salient perceptual parameters of timbre. These experiments have usually taken the form of listening tests where volunteers have produced verbal descriptions of sounds, and the results are analysed to find correlations in the language used. This is useful as it identifies adjectives that are important for describing sounds, and this could form the basis for the types of perceptual features we might aim to control in the synthesiser program we are developing. However, these psychoacoustic experiments by themselves are not enough in order to synthesise the given perceptual features, since we also need to find a correlation of an adjective with certain spectral and temporal features of a sound, and then more specifically with the parameters within a specific synthesis algorithm that give rise to those timbres or timbral changes.
The SeaWave project [4] made some progress in finding these correlations. Certain spectral and temporal features of starting sounds were modified, and the perceived changes in timbre were recorded. Some correlations were found, and these were used to develop a synthesis system where certain timbral features such as resonance, clarity, or warmth could be controlled. The number of adjectives that were available to the user to control the sound was limited, suggesting that a much more extensive study of synthesis parameters and their perceptual correlates is needed.
It is interesting to note that while machine learning techniques have been used for automated classification of different instruments, it does not appear that a general system has been developed for automatically identifying adjectives that describe a certain sound. The small amount of work that has been carried out in this area has focused on specific domains. For example, a recent paper by Disley and Howard [11] is concerned with the automated classification of timbral characteristics of pipe organ stops. It does not appear that any work has been carried out on automated classification of timbral differences between pairs of sounds.
The most limited range of work to date has been on the automated synthesis of timbres or timbral changes.
Some work has been done on the automated synthesis of gross timbre. Clearly it is not possible to synthesise gross timbre from just words, but machine learning methods can be applied to learn synthesis parameters for a particular instrumental sound. In these cases the learning is guided either by interaction with a
human [12, 13] or by the comparison of spectral features between the synthesized instrument-candidates and recordings of real instruments [14].
Of greater interest to this project is the automated synthesis of adjectival timbre. There are two basic concepts: associating adjectives/adverbs and classifications with timbres (wooden, bright), and words which describe characteristics that sit on a timbral continuum ("can you play less reedily please?", "let's have some more bounce"). A preliminary attempt to create a dictionary of such timbre-words, and to group them into classes, was made by Etherington and Punch [4].
A small amount of work has attempted automated synthesis of sounds from timbral descriptions. The SeaWave system [4] is based on a number of listening experiments which attempt to match specific sets of transformations of sound signals with words that describe those transformations. This works well up to a point; however, the transformations required to achieve many kinds of timbral change are likely to be complex, requiring more than simply the increase or decrease of a couple of synthesis parameters, and they will typically also depend on the starting sound.
Another attempt to generate quasi-timbral aspects of sound is given by Miranda [1]. He made use of machine learning methods to deduce correlations between parameters that could be used in synthesis and their perceived effects. This was then used to build up a database of matches between descriptive terms and characteristics which are used in synthesis; when the user requests a sound, these characteristics are looked up in the database and a synthesis algorithm is called with these characteristics. This provides a powerful methodology for generating sounds ex nihilo; however, it was not applied to transforming existing sounds.
Since these two groundbreaking pieces of work, there appears to have been no further work on linking linguistic descriptions of adjectival timbre to synthesis.
One of the main difficulties with the synthesis of timbres from descriptions is the complex nature of the mapping from the parameter space of a synthesis algorithm to the space of sounds, and then to the space of features that are described when we hear sounds (a more general exploration of such complexities in AI is given by Sloman [15]). Typically, many different closed subsets in parameter space will map onto the same timbre adjectives. Furthermore, features of timbre are influenced by previous exposure. For example, we are familiar with wooden and metallic sounds, and believe these to be contrasting; however, in a synthesis algorithm it is possible to realise sounds that are physically unrealistic, e.g. sounds which are between wooden and metallic, or which have both such characteristics.
This complexity contrasts with, say, loudness, where the mapping from the parameter (amplitude) to the perceptual effect (loudness) is straightforward. This presents a challenging problem for interfaces for timbre [2]; the timbral equivalent of the volume knob or piano keyboard is not obvious, nor is it obvious that such an interface could exist.
4.1 Approaches
There are basically two approaches to this problem. One is an analytic approach, where we work out directly how changes in the production of a sound lead to changes in the perceived timbral properties of the sound. The second, which we have used in our work, is a machine learning approach, where we take many examples of sounds, associate human-written metadata about timbre with them, and then apply machine learning methods [16] to create the relevant mappings.
An initial, somewhat naive, perspective on this is to view it as an inverse problem. That is, we take a set of sounds (recorded from acoustic instruments) that demonstrate the desired timbres (or timbral changes), and analyse what characteristics are common between the sounds that fit into a similar timbre class. Then we apply analysis methods (e.g. spectral analysis) to these sounds to understand what characteristics of the sound cause the different perceived timbral effects, and then apply these same characteristics to our desired sound to transform it in the same way.
However, there are (at least!) two problems with this naive model. Firstly, it is usually difficult to extract the characteristics that characterise a particular timbre or change of timbre. We can analyse sounds in many different ways, and not all of the characteristics that we can extract will be relevant to the production of a particular timbre. Even if we can eliminate some features by removing those characteristics that are common to sounds in various classes, there may be some arbitrary features that are irrelevant.
A second difficulty is found in the final stage of the process. Even if we can isolate such a characteristic, it is not easy to apply this to another sound: sometimes the characteristic can be difficult to express within a particular synthesis paradigm, and even if we can apply it, changing that characteristic in synthesis parameter space will not necessarily have the same effect in perceptual space. An additional problem of this kind is that, even when the changed sound is available within the synthesis algorithm being used, finding an appropriate change of parameters to effect the timbral change can be difficult.
Firstly, some pre-processing is carried out on the input sound wave in order to greatly reduce the complexity of the data, and therefore make it suitable for use as input to a neural network. Spectral analysis is carried out on the audio using an FFT, and the partials making up the sound are extracted from this, including the amplitude envelope and tuning information for each partial. From this data, a set of 20 inputs for the neural network is generated. Inputs 1-15 are the peak amplitudes of each of the first 15 partials of the sound, which should describe the general colour of the timbre. The next input is the average detuning of the partials, which describes how much the tuning of the partials differs from a precise harmonic series. The remaining inputs describe the overall amplitude envelope of the sound, and are the attack time (time taken for the amplitude to reach its peak), the decay time (time from the peak amplitude to the end of the note), and finally the attack and decay slopes (rates of change in amplitude), which describe the general shape of the amplitude envelope.
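The following sketch outlines how such a 20-value input vector could be derived with numpy. The frame size, the assumption that the fundamental frequency is known in advance, and the simple peak-picking around each harmonic are all simplifications made for illustration, not the authors' analysis pipeline.

# Sketch: deriving the 20 network inputs described above from a mono signal
# (fundamental frequency f0 assumed known).
import numpy as np

def timbre_features(signal, sr, f0, n_partials=15):
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / sr)

    partial_amps, detunings = [], []
    for k in range(1, n_partials + 1):
        # search a narrow band around the k-th harmonic for its peak
        band = (freqs > (k - 0.5) * f0) & (freqs < (k + 0.5) * f0)
        idx = np.argmax(np.where(band, spectrum, 0.0))
        partial_amps.append(spectrum[idx])
        detunings.append(abs(freqs[idx] - k * f0) / f0)

    # amplitude envelope from 10 ms frames
    frame = sr // 100
    env = np.array([np.abs(signal[i:i + frame]).max()
                    for i in range(0, len(signal) - frame, frame)])
    peak = int(np.argmax(env))
    attack_time = peak / 100.0                      # seconds to reach the peak
    decay_time = (len(env) - peak) / 100.0          # seconds from peak to end of note
    attack_slope = env[peak] / max(attack_time, 1e-6)
    decay_slope = env[peak] / max(decay_time, 1e-6)

    return np.array(partial_amps + [np.mean(detunings),
                                    attack_time, decay_time,
                                    attack_slope, decay_slope])

sr = 44100
t = np.linspace(0, 1.0, sr, endpoint=False)
tone = np.exp(-3 * t) * np.sin(2 * np.pi * 220 * t)  # toy decaying tone, f0 = 220 Hz
print(timbre_features(tone, sr, f0=220).shape)       # -> (20,)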
The aim of the neural network is to map a set of inputs onto a set of values describing the timbral features present in the sound. In order to define the expected output of the network in our prototype, samples of notes from 30 (synthesised) instruments were collected, and 9 adjectives were chosen to describe the timbre of each instrument (bright, warm, harsh, thick, metallic, woody, hit, plucked, constant amplitude). Listening tests were carried out on each instrument sample, and values ranging from 0 to 1 were assigned indicating how well each adjective described the instrument (a value of 1 meaning the particular feature was strongly present in the sound, while 0 indicated that the adjective did not apply to the sound at all). On this basis, our neural network has 9 outputs onto which the inputs are mapped.
An application (Figure 2) was developed that takes a list of sound files and their associated values for each adjective, carries out the necessary pre-processing to generate a set of inputs, and then trains a neural network to associate the correct adjectives with each sound's input data. A 3-layer back-propagation network was used, with 100 neurons per layer (this value was chosen empirically and gave reasonable training times as well as better generalisation than was achieved with smaller networks). Once the network is trained, the application allows the user to select an instrument sound that was not included in the training data, and run it through the system to classify its timbre.
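A sketch of this training stage is shown below using scikit-learn's multi-layer perceptron. The original application used its own back-propagation implementation; the library, the interpretation of the layer sizes as hidden layers, and the random placeholder data are assumptions of this example.

# Sketch: training a back-propagation network to map 20 timbre features onto
# 9 adjective ratings in [0, 1].
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((30, 20))          # 30 instrument samples, 20 features each
y = rng.random((30, 9))           # listening-test ratings for the 9 adjectives

net = MLPRegressor(hidden_layer_sizes=(100, 100, 100),
                   activation="logistic", max_iter=2000)
net.fit(X, y)

new_sound = rng.random((1, 20))   # a sound not present in the training data
print(net.predict(new_sound))     # predicted rating for each adjective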
5 Results
5.1 Results of the Timbre-Classification Algorithm
The results of the timbre recognition process are presented in Table 1. This shows a comparison between the timbral characteristics of five sounds, classified by a list of adjectives and a value indicating how strongly each characteristic was detected in the sound, by both a human listener and the neural network.
Early results from our timbre classification system are encouraging. In the experiment, a single human listener first assigned values to describe the timbre of the five test sounds, then the neural network was used to obtain a classification. The test sounds, of course, had not been used as part of the training set for the network. The results table shows that the prototype at this stage generally works well. There is evidence that the system is extracting common timbral features across different types of instrument sounds. It is particularly successful in detecting harshness, or sounds that are woody, or sounds that are hit. Unsurprisingly, it has trouble distinguishing between hit and plucked instruments, which is to be expected since the network's input data contains no information about
Table 1. For each instrument, the table first shows the expected value from a user listening test, followed by the neural network's actual answer in bold. A value of 1.0 indicates that a feature is strongly present in the sound, whereas a value of 0.0 indicates that the feature is absent.
Instrument   Bright Warm Harsh Thick Metallic Woody Hit Plucked Constant Amplitude
Vibraphone   0.6    0.5  0.4   0.4   0.5      0.3   1.0 0.0     0.0
             0.6    0.8  0.4   0.3   0.1      0.2   1.0 0.0     0.0
Elec. Guitar 0.7    0.2  0.7   0.2   0.4      0.1   0.0 1.0     0.0
             0.3    0.6  0.7   0.4   0.2      0.3   0.6 0.4     0.0
Piano        0.6    0.5  0.1   0.6   0.2      0.3   1.0 0.0     0.0
             0.7    0.4  0.3   0.6   0.0      0.2   1.0 0.0     0.0
Xylophone    0.8    0.3  0.7   0.1   0.0      0.8   1.0 0.0     0.0
             0.7    0.5  0.6   0.4   0.0      0.9   1.0 0.0     0.0
Elec. Piano  0.5    0.5  0.2   0.4   0.2      0.2   1.0 0.0     0.0
             0.2    0.9  0.1   0.1   0.1      0.2   1.0 0.0     0.0
6 Future Directions
There are many future directions for this work. Some of these are concerned with timbre recognition, for example using ear-like preprocessing (as in [20]) of the sound to generate the inputs to the neural network. We will also carry out more extensive human trials of the timbre-recognition experiments.
There are many future directions beyond this. A major limitation of many attempts at automated synthesis of timbre or timbre change is that the learning has been applied to a single sound. Future work will focus on learning transformations of synthesizer parameter space, with the aim of finding transformations that will apply to many different sounds.
References
1. Eduardo Reck Miranda. An artificial intelligence approach to sound design. Computer Music Journal, 19(2):59–75, 1995.
2. Allan Seago, Simon Holland, and Paul Mulholland. A critical analysis of synthesizer
user interfaces for timbre. In Proceedings of the XVIIIrd British HCI Group Annual
Conference. Springer, 2004.
1 Background
1.1 The Expressive Performance Modelling Problem
error Err(M(Si), Ei) between the prediction M(Si) and the actual expressive transformation Ei for all the pairs (Si, Ei) in B. Typically Si is a set of features describing the melodic context of note Ni, such as its melodic and rhythmic intervals with its neighbours or its metrical position in the bar. On the other hand, Ei contains local timing and overall energy information about the performed note, but can also contain finer-grain information about some intra-note features (e.g. the energy envelope shape, as studied in [2], or the pitch envelope).
In the past, we have studied expressive deviations of note duration, note onset and note energy [3]. We have used this study as the basis of an inductive content-based transformation system for performing expressive transformations of musical phrases. Among the generative models we induced, Regression Trees and Model Trees (an extension of the former) showed the best accuracy.
Regression Trees are widely used in pattern recognition tasks. Each non-leaf node in a Regression Tree performs a test on a particular input value (e.g. an inequality with a constant), while each leaf node is a number representing the predicted value. By using a succession of IF-THEN-ELSE rules, Regression Trees iteratively split the set of training examples into subsets where the prediction can be achieved with increasing accuracy. The resulting structure is a coherent set of mutually exclusive rules which represents the training set in a hierarchical way. Figure 1 shows an example of a simple Regression Tree. This tree performs tests on 2 different features and returns a numerical prediction at its leaves. Whether this tree can produce accurate predictions when processing unseen data depends directly on the generalisation power of the model. To ensure good generalisation capability, tree induction algorithms such as C4.5 [4] and M5 [5] provide pruning operations that rely on statistical analysis of the set of training examples.
Fig. 1. A simple Regression Tree example. Tests are performed on 2 different features, namely Feature0 and Feature1. If a test succeeds, the left-side child is evaluated, while if the test fails, the right-side child is evaluated. When a leaf is reached, a numerical prediction is returned.
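To make this structure concrete, the sketch below writes a tree like the one in Figure 1 as nested IF-THEN-ELSE tests; the feature names and constants are invented for the example.

# Sketch: a regression tree as nested IF-THEN-ELSE rules (values are invented).
def predict(features):
    if features["Feature0"] < 0.5:        # test on Feature0
        if features["Feature1"] < 2.0:    # test on Feature1
            return 0.95                   # leaf: numerical prediction
        return 1.10
    return 1.30

print(predict({"Feature0": 0.3, "Feature1": 2.5}))   # -> 1.1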
of the program, that is, inputs are compared to inputs which appear in the training set. On the other hand, leaves contain predictions which should reflect the distribution of outputs Ei in the training set. Although both the inputs and the predictions of our model can be coded as float values, building a program that performs direct comparisons between an input value and an output value (i.e. a melodic feature in Si and an expressive feature in Ei) would produce ineffective computations, and considerable effort would be spent in order to generate individuals that reasonably fit the training examples. Montana presented an enhanced version of GP called Strongly Typed Genetic Programming (STGP) which enforces data type constraints. Using this approach, it is possible to specify what type of data a given GP primitive can accept as an argument (e.g. the type of an input) or can return (e.g. an output). To the best of our knowledge, this work presents a novel approach to building Regression Trees using STGP.
Going back to Figure 1, we can see that the general structure of the Regression Tree is the following. Each node is a test comparing an input with a value that was generated by analysing the training data. Each leaf is an output value containing the numerical prediction. Thus, the inputs and outputs of the program should have different types. We define 4 different types, namely InputValue, FeatValue, RegValue and Bool. The first three types represent floating-point values, while the last type represents boolean values that will be used when performing tests. Now we can define the primitives that will be used to build our models. They are listed in Table 1.
Table 1. STGP primitives used in this study. Each primitive is presented along with the number of arguments it handles, the type of each argument, and its return type. Note that the primitives EFeatValue and ERegValue, which are used for constant generation, take no arguments as input.
The IF primitive tests whether its first argument, of type Bool, is true or false. If the test succeeds, its second argument is returned, otherwise its third argument is returned (both the second and third arguments have the RegValue type). The LT primitive tests whether its first argument is lower than its second argument (the former has the InputValue type and the latter FeatValue). The primitive returns a Bool whose value is true if the test succeeds, and false otherwise. Given the definitions of our types, we can see that during the building process the output of the LT primitive will always be connected to the first argument of the IF primitive. Instead, we could have chosen to use a single primitive IFLT with 4 arguments (the first two being involved in an InputValue typed comparison, and the last two being used to return a RegValue). However, we think that restricting the tests to inequalities would be a constraint for future developments. EFeatValue and ERegValue are zero-argument primitives. We use them for generating constants. The first one generates constants to be involved in inequality tests (primitive LT). In order to produce meaningful tests, the values produced by EInput are likely to be the values appearing in the vectors Si of the training set. To ensure this, we collected all the features in Si in B; when creating an EFeatValue primitive, we randomly choose one of them. Similarly, the ERegValue primitive generates constants of type RegValue that will form the numerical predictions returned by the model. It is desirable that these predictions respect
Fig. 3. STGP Regression Tree example involving two successive tests. Typed connections
between primitives are shown. Also, constants generated by zero-argument
primitives appear in parentheses. EFV stands for EFeatValue, while ERV stands for
ERegValue. The arrow indicates the output point of the program.
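To make the typed construction concrete, the following sketch evaluates, in plain Python, a small tree assembled from the primitives of Table 1 (IF, LT, EFeatValue, ERegValue). It is only an illustration of the type constraints, not the authors' Open Beagle implementation; the constant pools and feature values are invented.

import random

# Hypothetical pools of constants gathered from the training data:
# FeatValue constants for inequality tests and RegValue constants for predictions.
FEAT_POOL = [0.12, 0.50, 0.83]
REG_POOL = [0.9, 1.0, 1.3]

def efeatvalue():
    # EFeatValue: () -> FeatValue, drawn from values seen in the training vectors Si
    return random.choice(FEAT_POOL)

def eregvalue():
    # ERegValue: () -> RegValue
    return random.choice(REG_POOL)

def lt(input_value, feat_value):
    # LT: (InputValue, FeatValue) -> Bool
    return input_value < feat_value

def if_(cond, then_reg, else_reg):
    # IF: (Bool, RegValue, RegValue) -> RegValue
    return then_reg if cond else else_reg

def predict(x):
    # A fixed example tree: the output of LT always feeds the Bool argument of IF.
    return if_(lt(x[0], FEAT_POOL[1]), REG_POOL[0],
               if_(lt(x[1], FEAT_POOL[2]), REG_POOL[1], REG_POOL[2]))

print(predict([0.2, 0.4]))  # -> 0.9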
The genetic operators we use are basically a refinement of the operators proposed
by [7] and [9] in their Strongly Typed variant, where the operation is constrained
to produce a type-consistent structure. Operators are Tree Crossover, where
two individuals can swap subtrees, and Standard Mutation, where a subtree is
replaced by a newly generated one. Additionally, Shrink mutation replaces a
branch with one of its child nodes, and Swap mutation swaps two subtrees of an
individual. Also, two floating-point mutation operators were added to process
the randomly-generated constants returned by the EFeatValue and ERegValue primitives.
We define the following evolutionary settings for the different runs. The population
size is fixed to 200 individuals. The evolution stops if 500 generations have
been processed or if the fitness reaches a value of 0.95. We use a generational
replacement algorithm with tournament selection. The crossover probability is set
to 0.9, with a branch-crossover probability of 0.2, which means that crossovers
are more likely to be performed on leaves, with the effect of redistributing the
constants and terminals in the tree. The Standard mutation probability is set
to 0.01, while the Swap mutation probability has been set to a higher value (0.1) in order
to let an individual reorganise its feature space partition with higher probability.
Finally, the Shrink mutation probability is set to 0.05. The maximum tree depth has
been set to 10, which could lead to the generation of very complex individuals
(here we do not look for a compact model).
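For reference, the run settings listed above can be collected in a single configuration object. This is a sketch in plain Python; the field names are ours rather than Open Beagle's.

from dataclasses import dataclass

@dataclass
class RunSettings:
    # Values taken from the settings described in the text.
    population_size: int = 200
    max_generations: int = 500
    target_fitness: float = 0.95        # early-stop threshold
    crossover_prob: float = 0.9
    branch_crossover_prob: float = 0.2  # low value biases crossover towards leaves
    standard_mutation_prob: float = 0.01
    swap_mutation_prob: float = 0.1
    shrink_mutation_prob: float = 0.05
    max_tree_depth: int = 10

print(RunSettings())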
We built our system using the Open Beagle framework [11], which is a C++
object-oriented framework for Evolutionary Computation. Open Beagle was primarily
designed for solving Genetic Programming tasks and includes STGP features
such as the operators presented above.
In this section, we test the ability of our model to predict expressive transformations
given different training situations. Note that in this section we will
not focus on the fitness of the individuals as defined in the last section. Rather, we
evaluate a model based on its prediction error. First, we address the problem of
evaluating the precision and generalisation capability of the induced model in the
context of expressive music performance. In Figure 4, we present the RMSE of
the best-so-far model when predicting the note duration ratio of the four excerpts
presented above. On the top, the model is only trained with one of the
four fragments, namely Body and Soul played at 60 beats per minute, i.e., the
fitness function is only based on this error. On the bottom, the model is trained
using the four fragments. First we can see that excerpts that share the same
Fig. 4. Best-so-far individual RMSE for each of the target songs when the fitness takes
into account only the Body And Soul (played at 60 beats per minute) prediction error
(top), or when the fitness takes into account the prediction error of the four fragments
(bottom). Gray solid (respectively dashed) line is the RMSE of Body And Soul played
at 60 (respectively 65) beats per minute. Black solid (respectively dashed) line is the
RMSE of Like Someone In Love played at 120 (respectively 140) beats per minute.
Fig. 5. Note by note Squared Error of the duration ratio during the evolution process
Fig. 6. Note by note duration ratio prediction for Like Someone in Love played at 140
beats per minute. Black solid line refers to the performance training data, grey dotted
line refers to the best-so-far STGP model, and grey dashed line corresponds to a greedy
Regression Tree model as presented in [3].
Tree models behave qualitatively well, and their mean prediction error is very
similar. Thus, our approach to build Regression Tree performance models based
on STGP is promising, although the results we present here are very preliminary.
4 Conclusion
We presented in this work a novel Strongly Typed Genetic Programming based
approach for building Regression Trees in order to model expressive music performance.
The Strongly Typed Genetic Programming framework has been introduced,
along with the primitives, operators and settings that we apply to this
particular task. Preliminary results show this technique to be competitive with
greedy Regression Tree techniques for a one-dimensional regression problem: the
prediction of the performed note duration ratio. We want to scale up our approach
to the generation of expressive musical performances in a broader sense,
and plan to work in the following directions:
Predict more expressive features, such as onset and mean note energy deviation,
which will enable the model to predict expressive melodies. This will be
achieved by defining a new RegValue complex data type, along with an appropriate
constant generator and operators. New fitness measures have to be
defined in order to assess the musical similarity between the model's output
and the performer's transformations. We believe that the use of perceptually
motivated fitness measures (e.g. [12]) instead of statistical errors (e.g.
RMSE) can lead to substantial improvements in the accuracy of our models
and make the difference with classical greedy techniques. Additionally, intra-note
features are of particular interest in monophonic performances and will
be considered. This will include energy, pitch, and timbre features.
References
1. Ramirez, R., Hazan, A.: Understanding expressive music performance using genetic
algorithms. In: Third European Workshop on Evolutionary Music and Art,
Lausanne, Switzerland (2005)
2. Ramirez, R., Hazan, A., Maestre, E.: Intra-note features prediction model for jazz
saxophone performance. In: International Computer Music Conference, Barcelona,
Spain (2005)
3. Ramirez, R., Hazan, A.: A tool for generating and explaining expressive music
performances of monophonic jazz melodies. In: International Journal on Artificial
Intelligence Tools (to appear) (2006)
4. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers
Inc. (1993)
5. Quinlan, J.R.: Learning with Continuous Classes. In: 5th Australian Joint Conference
on Artificial Intelligence (1992) 343–348
6. Murthy, S.: Automatic construction of decision trees from data: A multi-disciplinary
survey. Data Mining and Knowledge Discovery 2 (1998) 345–389
7. Koza, J.: Genetic Programming: On the Programming of Computers by Means of
Natural Selection. MIT Press, Cambridge, MA, USA (1992)
8. Holland, J.: Adaptation in Natural and Artificial Systems. University of Michigan
Press, Ann Arbor, MI (1975)
Evolutionary Musique Concrete

Cristyn Magnus
1 Background
Although electronic music experiments had been going on since the development
of the telephone, a great breakthrough came in 1948, when Pierre Schaeffer
broadcast his early studies in musique concrete on Radio-diffusion-Television
Francaise [1]. Musique Concrete is a genre in which composers manipulate recordings
of actual sounds rather than notes. Composers who use notes deal with abstract
symbols that represent large categories of possible sounds; performances
are unique interpretations of the symbols. A composer of musique concrete produces
a definitive recording that is the piece; at performances, the recording
is simply played. Techniques for composing with actual sounds give composers
access to an extremely wide array of timbres: anything that could be recorded
or brought out of a recording through manipulation. We are no longer restricted
to pitches and rhythms that can be written using traditional western notational
symbols.
Since the incorporation of recorded sounds is pervasive in contemporary electronic
music, it is ironic that little attention has been given to developing techniques
for manipulating recordings with genetic algorithms. Most research applying
genetic algorithms to music has focused on symbolic music (see [2] for a
review). Some research has broached the issue of timbre exploration through
synthesis, but direct manipulation of recorded sounds has not been addressed.
Johnson [3, 4] and Dahlstedt [5, 6] use interactive genetic algorithms to explore
synthesis parameters. Horner, Beauchamp, and Packard [7] derive novel sounds
with an interactive genetic algorithm that applies filtering and time-warping operations
to populations of synthesized sounds. This comes closer to addressing
recorded sounds, since filtering and time-warping need not be applied exclusively
to synthesized sounds. These researchers all work to produce novel sounds that
can be worked into later compositions. For a series of recent compositions, I have
developed a technique that would allow me to use genetic algorithms to produce
a series of pieces constructed from found sounds whose form would be derived
from the evolutionary process.
2.1 Representation
In a typical genetic algorithm, parameters are mapped onto genes and the ordered
collection of genes forms a chromosome. Usually all chromosomes in the
population have the same number of genes. My technique operates directly on
digitized waveforms that can have arbitrary lengths. Each chromosome is a time-domain
waveform. Using instantaneous samples as genes would be a bad idea:
sexual reproduction would introduce clicks; mutation would introduce noise. So
in my algorithm, there is no analysis and there are no discrete genes. Instead, a
hybrid approach to genes is adopted. For the purpose of calculating fitness, samples
are treated as genes. For the purpose of sexual reproduction and mutation,
segments of waveform bounded by zero crossings are treated as genes.
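A minimal sketch of this hybrid notion of genes, assuming the waveform is held in a NumPy array (illustrative code, not the code behind the pieces): the waveform is cut into segments at zero crossings for reproduction and mutation, while the raw samples remain available for fitness evaluation.

import numpy as np

def zero_crossing_segments(waveform):
    """Split a 1-D waveform into segments bounded by zero crossings."""
    signs = np.signbit(waveform)
    # indices where the sign changes between consecutive samples
    crossings = np.where(signs[1:] != signs[:-1])[0] + 1
    bounds = np.concatenate(([0], crossings, [len(waveform)]))
    return [waveform[a:b] for a, b in zip(bounds[:-1], bounds[1:])]

wave = np.sin(np.linspace(0, 4 * np.pi, 1000))
segments = zero_crossing_segments(wave)
print(len(segments))   # roughly one segment per half-cycle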
Typically, a genetic algorithm runs for many generations. The initial population
and any intervening generations are discarded; a representative member of
the final population is chosen as the product of the algorithm. My algorithm produces
a piece of music whose formal structure is a product of the evolutionary
process. Each waveform produced by the algorithm becomes part of the final
piece. A piece begins with the simultaneous playback of the initial waveform
population. Whenever a waveform finishes playing, a new waveform is generated
to take its place. The instant before a waveform's playback begins, its fitness is
measured. Each waveform's playback volume is weighted by its fitness.
Since the output of the algorithm is a piece of music, choices regarding output
representation are primarily aesthetic. If I wanted a piece with a different formal
structure or simply a tool to generate sonic material to use elsewhere, I would
make different choices.
2.2 Fitness
where n is the number of times the waveform has reproduced and b is a parameter
between 0 and 1. For b = 0, a waveform will never reproduce twice. For b = 1,
a waveform's fitness is not reduced by reproduction. The b^n modifier is meant to
encourage biodiversity (see Section 4.2).
Although a stripped-down version of the algorithm can, under appropriate
circumstances, produce sounds that can be recognized as imitations of the target
waveform, this is not the compositional goal. The population is never expected
to converge to some target. The biodiversity modifier lowers fitness each time a
waveform becomes a parent to prevent the offspring of a handful of extremely
fit individuals from dominating the population. In addition, the compositional
framework for the piece (Section 3) has high-level control over the fitness function,
which can change over the course of the piece.
2.3 Reproduction
Sexual reproduction is carried out by splicing genetic material from two individuals
to produce one individual in the next generation. For each offspring,
two parents are selected from the population. The probability that an individual
will be selected as a parent is based on its fitness. Each parent is divided at
some randomly selected crossover point. The location of the crossover point is
adjusted to make sure it falls on a zero crossing. The first part of one parent
is spliced to the last part of the other parent (figure 1). Because the crossover
point is randomly selected and can be different for each parent, offspring can be
arbitrarily short or potentially as long as the combined lengths of both parents.
Fig. 1. a) Two parent waveforms (solid line) with their randomly selected crossover
points (dotted line). b) The crossover point adjusted to fall on zero crossings. c) The
child waveform.
I could have used a fixed crossover point, but I felt this was an opportunity to
introduce rhythmic interest.
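Under the same assumption of NumPy waveforms, the splice described above might look roughly as follows; the helper names and the random-number handling are ours.

import numpy as np

def snap_to_zero_crossing(waveform, index):
    """Move index forward to the next sign change (or to the end of the waveform)."""
    signs = np.signbit(waveform)
    for i in range(max(index, 1), len(waveform)):
        if signs[i] != signs[i - 1]:
            return i
    return len(waveform)

def crossover(parent_a, parent_b, rng=np.random.default_rng()):
    """Splice the first part of one parent to the last part of the other."""
    cut_a = snap_to_zero_crossing(parent_a, rng.integers(len(parent_a)))
    cut_b = snap_to_zero_crossing(parent_b, rng.integers(len(parent_b)))
    return np.concatenate([parent_a[:cut_a], parent_b[cut_b:]])

child = crossover(np.sin(np.linspace(0, 20, 5000)),
                  np.sin(np.linspace(0, 33, 8000)))
print(len(child))  # anywhere from very short up to near the combined length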
2.4 Mutation
Mutation occurs immediately after the offspring is produced, before its playback
begins. Each segment of waveform between zero crossings has a slight probability
of mutating. This mutated segment of waveform can include multiple zero crossings.
Larger mutations are more perceptually relevant; that is, it is possible for a
listener to identify mutated segments and sometimes even the type of mutation.
Smaller mutations tend to denature the original sounds and produce waveforms
that sound more like the target waveform.
A typical mutation function adds a random number to a gene. We can extend
this concept to waveforms by adjusting a waveform's amplitude (figure 2a).
This is done by selecting a random number and multiplying each sample of a
waveform segment by that number. Another way of extending this concept is
to raise each sample of a waveform segment to a power (figure 2b). To prevent
the exponentiation from severely amplifying or attenuating the segment being
mutated, each segment is normalized after exponentiation so that it retains its
original maximum amplitude.
Fig. 2. Mutation operations showing the original waveform (short dashes) and the
resultant waveform (solid line) with the mutation boundaries (long dashes): a) amplify
b) exponentiate c) resample d) reverse e) remove f) repeat g) swap
We can think in terms of time rather than amplitude and resample a segment
of waveform to lengthen it, making it lower in pitch, or to shorten it, raising its
pitch (figure 2c).
Because mutation is applied to segments of waveform, rather than individual
genes, we can draw inspiration from the types of errors that happen in actual
gene transcription. Mutation functions can reverse a waveform segment (figure
2d), remove a waveform segment entirely (figure 2e), repeat a waveform segment
a random number of times (figure 2f), or swap neighboring waveform segments
(figure 2g).
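A few of these segment-level mutations, again sketched for NumPy arrays; the parameter ranges are illustrative, and negative samples are handled with a sign/magnitude trick that the paper does not specify.

import numpy as np

def amplify(segment, rng):
    """Multiply every sample by one random factor (figure 2a)."""
    return segment * rng.uniform(0.25, 2.0)

def exponentiate(segment, rng):
    """Raise sample magnitudes to a power, then renormalise to the original peak (figure 2b)."""
    power = rng.uniform(0.5, 2.0)
    peak = np.max(np.abs(segment))
    shaped = np.sign(segment) * np.abs(segment) ** power
    new_peak = np.max(np.abs(shaped))
    return shaped * (peak / new_peak) if new_peak > 0 else shaped

def reverse(segment, rng):
    return segment[::-1]                         # figure 2d

def repeat(segment, rng):
    return np.tile(segment, rng.integers(2, 5))  # figure 2f

rng = np.random.default_rng(0)
seg = np.sin(np.linspace(0, np.pi, 200))
print(len(repeat(seg, rng)))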
3 Compositional Framework
In a single, unchanging environment, the algorithm described above would eventually
converge to a local minimum where all individuals would have roughly the
same length as the target waveform and would have acquired some of its amplitude
envelope and frequency characteristics. To create formal compositional
structure, I define a world in which the waveforms evolve. A world consists of
multiple distinct environments that change over time.
For a given piece, the world will be characterized by some number of locations.
These locations may be mapped spatially onto speakers. The environment at
each location will initially be defined by some target waveform and some set of
mutation probabilities. Immediately after an individual is created, it has a slight
chance of moving to another location. If it migrates, it will pan from one speaker
to the other over the course of its playback. It will be considered to be in the
second location for its entire duration and will have its fitness determined there.
It will be given the opportunity to reproduce in the second location, but not
the first. In this way, sounds with new characteristics will enter each location,
enhancing biodiversity.
The world will be characterized by probabilities of change. Both target waveform
and mutation probabilities can change whenever a new waveform is created.
There are two sorts of changes that environments can undergo. One is the slow
drift that is seen in ice ages: these take place over an enormous amount of time
from the perspective of individuals but happen many times over the evolution of
a species. This is simulated by slowly cross-fading between two target waveforms.
The other is the drastic change that results from catastrophic events, such as
fire decimating a forest, causing it to be replaced by grassland. This is achieved
by replacing the target waveform with a completely different waveform.
The changing environment prevents the population from strongly resembling
the target waveform. The goal is to present the process, not to draw attention to
the underlying environment. Catastrophic environmental changes lead to musical
surprises that reveal subsets of the population that were previously too unfit to
be heard above the dominant sounds. Migration can have similar effects; it also
increases biodiversity, which means there are always sounds in each location that
can take advantage of the changing environment.
4 Results
4.1 General Description of Output
As evolution occurs, all of the waveforms in the population are written to a
single sound file with each individual waveform weighted by its fitness. This
weighting causes fit individuals to rise to prominence. Each time a waveform
ends, a new individual is generated from the population. The new individual's
playback begins immediately at the end of the waveform it replaces. Because
the initial biodiversity is very high, the beginning of the output file is a wash
of textures reminiscent of the timbres of the initial population. Within a few
generations, a few fit individuals dominate the mix, resulting in a sound in which
particular features of the initial population can be identified.
As evolution progresses, qualities of the initial population are preserved but
are increasingly transformed through reproduction and mutation as the population
takes on properties of the target waveform. The similarity to the target
waveform depends on the type of mutation used, on the probability of mutation,
and on the amount of time over which evolution occurs.
4.2 Biodiversity
In order for a piece to be musically interesting, biodiversity must be maintained.
Since output is weighted by fitness, only fit sounds are heard distinctly. The
truly musical moments occur when previously unfit sounds become fit, either
through a changing environment or migration. Novel sounds bloom out of the
sea of sounds and affect what is heard after they become fit.
5 Conclusion
I have used this algorithm to produce several pieces and an installation that have
been performed and well received.2 Many listeners have expressed surprise that
the pieces were algorithmically generated with no composer intervention beyond
setting initial conditions. This speaks to the algorithm's efficacy in producing
novel and pleasing musical results. Depending on the source sounds and initial
probabilities that I choose, I can generate very different pieces that share the
characteristic sound of the algorithm. Over the course of a typical piece, sounds
from the initial population slowly evolve. Rhythms change gradually; different
sounds from the initial population rise to prominence at different points; and
the piece has clear directionality, punctuated by occasional musical surprises.

1 See http://cmagnus.com/cmagnus/ga_results.shtml for sample output.
2 See http://cmagnus.com/cmagnus/comp/gasketch.shtml for a short piece.
References
1. Griffiths, P.: A Guide to Electronic Music. Thames and Hudson (1979)
2. Burton, A.R., Vladimirova, T.: Generation of musical sequences with genetic techniques.
Computer Music Journal 23(4) (1999) 59–73
3. Johnson, C.G.: Exploring the sound-space of synthesis algorithms using interactive
genetic algorithms. In Wiggins, G.A., ed.: Proceedings of the AISB Workshop on
Artificial Intelligence and Musical Creativity, Edinburgh (1999)
4. Johnson, C.G.: Exploring sound-space with interactive genetic algorithms. Leonardo
36(1) (2003) 51–54
5. Dahlstedt, P.: Creating and exploring huge parameter spaces: Interactive evolution
as a tool for sound composition. Proceedings of the International Computer Music
Conference (2001) 235–242
6. Berry, R., Dahlstedt, P.: Artificial life: Why should musicians bother? Contemporary
Music Review 22(3) (2003) 57–67
7. Horner, A., Beauchamp, J., Packard, N.: Timbre breeding. Proceedings of the International
Computer Music Conference (1993) 396–398
A Connectionist Architecture for the Evolution
of Rhythms
1 Introduction
The early applications of evolutionary computation to music go back to 1991 with
the work of Horner and Goldberg, who applied genetic algorithms to thematic
bridging [1]. Since then there have been many successful attempts to apply these
techniques to music. For a discussion of the history and achievements of genetic
algorithms please refer to Gartland-Jones and Copley [2].
Neural Networks have also been used extensively in the context of music.
There have been connectionist models for pitch perception, rhythm and metre
perception, melody conduction and composition, many of them collected in
Griffith and Todd's book [3].
Memetic theory, the cultural counterpart of biological evolution, was invented
by Dawkins in 1979 [4], and postulates that culture is an evolutionary process
evolving through the exchange, mutation and recombination of units of information
that can be observed at different scales. Although the definition of a meme
is still quite obscure, there have been some computational attempts to model
the evolution of musical style according to this theory [5].
In the specific case of rhythm composition, we can find applications of evolutionary
computation such as the Interactive Genetic Algorithm (IGA) from
Horowitz [6] to breed drum measure loops, the CONGA system from Tokui and
Iba [7] using genetic algorithms and genetic programming to generate rhythms
which are evaluated by the user, and the creation of rhythms with cellular automata
by Brown [8].
All these methods have been developed mainly with three applications in
mind: sound synthesis, composition, and musicology [9]. This paper focuses on
the latter, i.e., a framework for the study of the evolution of music.
agents and how the repertoire evolves in the continuous search to generate music
that the other agent will recognise in its internal categories system.
In this paper we introduce the groundwork that characterises our approach,
i.e., the connectionist nature of the agents' mechanism for representing rhythms.
3 Agents Architecture
We will present the architecture of an agent containing two neural networks in cascade
that receive a stream of rhythmic events as input and contain three output
neurons that map these rhythms into a tridimensional space. For a comprehensive
foundation on neural network theory please refer to Haykin's book [18].
Each agent is provided with a set of two neural networks: a SARDNET and
a one-layer Perceptron (Figs. 2 and 5). The first one receives the stimulus sequentially
from an input, encoded as a MIDI stream of rhythmic events, and
generates an activation pattern corresponding to the agent's perception of the
type of event and its place in the sequence. The dynamics of this network is fully
explained in Sec. 3.1. The pattern of activation from the SARDNET then becomes
the input of the latter network, the Perceptron, which generates three output
values that enable the categorisation of the received sequences. The architecture
and learning rules of the Perceptron are explained in Sec. 3.2.
The events are represented as vectors with three components. The first component
defines the musical instrument (timbre), the second defines the loudness
(velocity), and the third defines the value in milliseconds that the sound lasts
(inter-onset interval). These three dimensions correspond to human perceptual
attributes with different scales in sensitivity and range. Modelling these differences
in the learning algorithm was not part of the scope of this paper.
3.1 Sardnet
The SARDNET [19] is a self-organising neural network for sequence classification
that was applied in phonology, and recently it was also applied to simulations
for evolving melodies [20]. This network is an extension of the original Self-Organising
Map (SOM), which is a neural network used for unsupervised learning
developed by Kohonen [21]. The SOM has proven to be a powerful tool for many
engineering applications and some of its variations have provided explanations
for the organisation and development of the visual cortex [22].
The SOM is also called a competitive network or winner-takes-all net, since
the node with the largest input wins all the activation, which translates into the possibility
of updating that unit in order to become more similar to the input.
The neighbouring units of the winning neuron are also updated according to
a neighbourhood function that organises representations of similar stimuli in a
topographically close manner.
In Fig. 1 we can see a diagram of a SOM with 16 units and one input. The
dimension of the input vector determines the dimension of the weight vector of
each unit. To determine which weight vector is the closest one to the input vector,
the Euclidean distance is calculated:
[Fig. 1: a SOM with 16 units W1, ..., W16 and a single input V = (2, 1); the closest weight vector is W14 = (1.5, 1), with d_2(V, W14) = 0.5.]
[Fig. 2: the dynamics of SARDNET: a stream of events v1, ..., vT is presented sequentially at the input, activating units of the map in sequence.]
    d_2(v, w) = √( Σ_{i=1}^{n} |v_i − w_i|² )    (1)
The SARDNET keeps some essential features of the SOM, but adds two
important features that enable us to deal with sequences of events. The first
distinguishing characteristic is that the winning neuron is removed from subsequent
competitions, and the second difference corresponds to holding the previous
activations with a decay at each time step. The dynamics of SARDNET is shown
in Fig. 2, where we can observe the stream of events passing through the input
and activating three units in sequence (W14, W7, W2). The training algorithm
for the SARDNET is shown in Tab. 1.
INITIALIZATION:
Clear all map nodes to zero
MAIN LOOP:
While not end of sequence
1. Find inactive weight vector that best matches the input.
2. Assign 1.0 activation to that unit.
3. Adjust weight vectors of the nodes in the neighbourhood.
4. Exclude the winning unit from subsequent competitions.
5. Decrement activation values for all other active nodes.
RESULT:
Sequence representation = activated nodes ordered by activation values.
Like the SOM, the SARDNET uses the Euclidean distance d_2(w, v) from Eq.
(1) to evaluate which weight vector best matches the input. In step 3 of the
algorithm the weights of the winning unit and of the neighbourhood units are changed
according to the standard rule of adaptation, in which the adaptation rate depends also on the
distance to the winning unit, that is, on its position in the neighbourhood.
The neighbourhood function decreases as the map becomes more organised.
As in step 5 of the algorithm, all the active units are decayed proportionally
to the decay parameter d.
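A compact sketch of one SARDNET presentation step following Table 1 and Eq. (1); the map size, learning rate, decay value and event encoding are placeholders, and the neighbourhood update of step 3 is reduced to the winning unit for brevity.

import numpy as np

def sardnet_step(weights, activations, v, alpha=0.1, decay=0.9):
    """One presentation: weights is (units, dim), activations is (units,)."""
    distances = np.linalg.norm(weights - v, axis=1)   # Eq. (1)
    distances[activations > 0] = np.inf               # step 4: previous winners excluded
    winner = int(np.argmin(distances))                # step 1
    activations[activations > 0] *= decay             # step 5: decay previous winners
    activations[winner] = 1.0                         # step 2
    weights[winner] += alpha * (v - weights[winner])  # step 3 (neighbourhood omitted)
    return winner

weights = np.random.default_rng(1).uniform(-1, 1, size=(50, 3))
activations = np.zeros(50)
sequence = [np.array([1.0, 0.6, 0.25]), np.array([0.3, 0.8, 0.5])]
order = [sardnet_step(weights, activations, v) for v in sequence]
print(order)  # units activated, in sequence order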
In the following section we present the details of the Perceptron, the network
that receives the activation patterns from the SARDNET, keeping the relevant
information about these activation patterns across several sequences.
3.2 Perceptron
[Figure: a one-layer Perceptron with inputs I1, I2, ..., IN fully connected to three output neurons O1, O2, O3.]

    O_i = g(h_i) = g( Σ_k w_{ik} I_k )    (4)
Eq. (4) is the propagation function of the Perceptron and g(h) in Eq. (5) is
the activation function computed by the units. In this case this function is a
sigmoidal function,

    O_i = g(h_i) = 1 / (1 + exp(−h_i))    (5)
The Perceptron uses the gradient descent method to change the weights
in order to adjust the test input to a given target,

    Δw_{jk} = η (T_k − O_k) I_j    (6)

where η is the learning rate, T is the target value and T_k − O_k is the corresponding
error during the training phase.
The number of inputs of the Perceptron is the number of units of the SARDNET.
The number of output neurons is arbitrarily defined as 3 in order to be able
to visualise the results in a tridimensional grid. This output grid enables the
categorisation of the input sequences.
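The forward pass of Eqs. (4)-(5) and the delta-rule update of Eq. (6) can be sketched as follows; the learning rate, epoch count and random patterns are illustrative rather than the values used in the experiments.

import numpy as np

def forward(weights, inputs):
    """Eqs. (4)-(5): sigmoid of the weighted input sums."""
    return 1.0 / (1.0 + np.exp(-(weights @ inputs)))

def train(patterns, targets, eta=0.5, epochs=500, seed=0):
    rng = np.random.default_rng(seed)
    weights = rng.uniform(-0.5, 0.5, size=(targets.shape[1], patterns.shape[1]))
    for _ in range(epochs):
        for x, t in zip(patterns, targets):
            o = forward(weights, x)
            weights += eta * np.outer(t - o, x)   # Eq. (6)
    return weights

patterns = np.random.default_rng(1).random((3, 50))   # three SARDNET activation maps
targets = np.eye(3)                                    # [1,0,0], [0,1,0], [0,0,1]
w = train(patterns, targets)
print(np.round(forward(w, patterns[0]), 2))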
[Fig. 5: the stream of events v1, ..., vT is presented to the SARDNET, whose activation pattern is fed to the categorisation network (the Perceptron).]
4.1 Sardnet
First we trained the SARDNET solely with prerecorded rhythms. We used a map
with 50 elements, 10 in length and 5 in breadth, and a learning rate of 0.1.
The map was initialised with random weights in the range of -1 to 1. To perform
the first organisation tasks the map was fed with 5 sequences of rhythms of
Latin music, each of them containing one or two instruments, very much as it
would be if these were performed by other agents. After a couple of iterations a
pattern of organisation could already be observed in the network, but the corresponding
sequences extracted sounded extremely chaotic. After 50 iterations the
rhythms start to sound organised as well, and the changes to the timbre of the
instrument have the largest perceptual impact. This was expected to be so, as
there is no discrimination in the organisation algorithm regarding the different
weight components. Nevertheless, the organisation process is fine-tuned enough
to adapt perceptually to the incoming sequence after 80 iterations, and
a learning musician is also expected to make timbre mistakes.
The graphs in Fig. 6 show the evolution of the third component of the
weights (inter-onset intervals). The first graph shows the initial values of the
weights, as explained above, the second shows the organisation process after 20
iterations, and the third shows the weights stabilised after 80 iterations. Fig.
6 d) shows the difference between the sums of the weights in two consecutive
iterations, this being a measure of the stabilisation of the weights.
Previously it was stated that the SOM adapts its weights not only for the
winning elements, but also in their neighbourhood. In Fig. 7 the same
organisation process is shown, but considering the neighbourhood change. The parameter
σ controls the range of the Gaussian that changes the neighbourhood. By
using an initial value of σ = 2.97 we can more rapidly capture the global characteristics
[Fig. 6: evolution of the inter-onset-interval weight component of the map: the initial random weights, the weights after 20 iterations, the weights stabilised after 80 iterations, and d) the consecutive change of the summed weights over 100 iterations.]
of the input. It is necessary to reduce this value gradually in order not to
destroy the representations of the events that occur less frequently. Comparing
Figs. 6 d) and 7 d) we see that this procedure accelerates the convergence process.
One of the most important conclusions is that although it is possible to extract
very similar sequences from both maps, the internal representations can be quite
different, as can be seen from Figs. 6 and 7, both trained with the same
sequences.
4.2 Perceptron
The Perceptron's architecture is explained in Sec. 3.2. The Perceptron used for
these experiments had 50 input units, which receive their values directly from
the activations of the output layer of the SARDNET. These input units are fully
connected to 3 output neurons, enabling the mapping and categorisation of the
input sequences into a tridimensional space of straightforward visualisation. We
chose the first three activation layers of 50 elements, corresponding to three
rhythms fed previously to the SARDNET, and trained the Perceptron to respond to
these patterns with three different targets, namely [1, 0, 0], [0, 1, 0], [0, 0, 1]. This
process took 434 epochs to reach a categorisation error of 10^-3, as can be seen
in Fig. 8 a). Each training pattern is marked with an (o) in the categorisation
space (Fig. 8 b)). Later, we fed the Perceptron with the last two rhythms and
observed its activations, marked with an (x). These were found to be much closer
to the [0, 1, 0] target, which interestingly corresponds to the most similar pattern
regarding the IOIs.
[Fig. 7: the same organisation process as in Fig. 6, but with the neighbourhood adaptation: the inter-onset-interval weight surfaces at successive stages and d) the consecutive change of the summed weights over 100 iterations.]
[Fig. 8: a) training error of the Perceptron over 434 epochs (log scale); b) the tridimensional categorisation space, with training patterns (o), the goal targets, and the two test rhythms (x).]
5 Conclusion
With this paper we presented the architecture of an interactive virtual agent that
is able to learn rhythms. The agent is composed of two neural networks that are
able to learn rhythm representations through self-organising processes. As it
happens with humans, the agents always have different internal representations
for the rhythms they listen to. Furthermore, the output of the networks categorises
the incoming sequences and provides a measurement for the agents to
judge how related the listened rhythms are. The rhythm representation allows
References
1. Horner, A., Goldberg, D.: Genetic algorithms and computer-assisted music composition.
In: Proceedings of the Fourth International Conference on Genetic Algorithms,
San Mateo, CA, Morgan Kaufmann (1991)
2. Gartland-Jones, A., Copley, P.: The suitability of genetic algorithms for musical
composition. Contemporary Music Review 22(3) (2003) 43–55
3. Griffith, N., Todd, P.: Musical Networks. MIT Press, Cambridge, USA (1999)
4. Blackmore, S.: The Meme Machine. Oxford University Press (1999)
5. Gimenes, M., Miranda, E.R., Johnson, C.: A memetic approach to the evolution
of rhythms in a society of software agents. In: Proceedings of the 10th Brazilian
Symposium of Musical Computation (SBCM), Belo Horizonte (Brazil) (2005)
6. Horowitz, D.: Generating rhythms with genetic algorithms. In Anderson, P.,
Warwick, K., eds.: Proceedings of the International Computer Music Conference,
Aarhus (Denmark), International Computer Music Association (1994)
7. Tokui, N., Iba, H.: Music composition with interactive evolutionary computation.
In: Proc. 3rd International Conf. on Generative Art, Milan, Italy (2000)
8. Brown, A.R.: Exploring rhythmic automata. In Rothlauf, F., et al., eds.: Proceedings
of the 3rd European Workshop on Evolutionary Music and Art, Lausanne
(Switzerland), Springer Verlag (2005)
9. Coutinho, E., Gimenes, M., Martins, J., Miranda, E.R.: Computational musicology:
An artificial life approach. In: Proceedings of the 2nd Portuguese Workshop on
Artificial Life and Evolutionary Algorithms Workshop, Covilha (Portugal), Springer
Verlag (2005)
10. Miranda, E., Todd, P.: A-life and musical composition: A brief survey. In: Proceedings
of the IX Brazilian Symposium on Computer Music, Campinas (Brazil)
(2003)
11. Steels, L.: The synthetic modeling of language origins. Evolution of Communication
1(1) (1997) 1–34
12. Werner, G., Todd, P.: Too many love songs: Sexual selection and the evolution of
communication. In Husbands, P., Harvey, I., eds.: ECAL97, Cambridge, MA, MIT
Press (1997) 434–443
13. Miranda, E.R.: Emergent sound repertoires in virtual societies. Computer Music
Journal (MIT Press) 26(2) (2002) 77–90
14. Wittgenstein, L.: Philosophical Investigations. Blackwell Publishers (1979)
15. Steels, L.: A self-organizing spatial vocabulary. Artificial Life 2(3) (1995) 319–332
16. de Boer, B.: Self-Organisation in Vowel Systems. PhD thesis, Vrije Universiteit
Brussel AI-lab (1999)
17. Miranda, E.R.: Mimetic development of intonation. In: Proceedings of the 2nd International
Conference on Music and Artificial Intelligence (ICMAI 2002), Springer
Verlag - Lecture Notes on Artificial Intelligence (2002)
18. Haykin, S.: Neural Networks. Prentice Hall, New Jersey, USA (1999)
19. James, D.L., Miikkulainen, R.: SARDNET: a self-organizing feature map for sequences.
In Tesauro, G., Touretzky, D., Leen, T., eds.: Advances in Neural Information
Processing Systems 7, Cambridge, MA, USA, MIT Press (1995) 577–584
20. Bosma, M.: Musicology in a virtual world: A bottom up approach to the study of
musical evolution. Master's thesis, University of Groningen (2005)
21. Kohonen, T.: Self-Organizing Maps. Springer Series in Information Sciences.
Springer-Verlag Berlin and Heidelberg GmbH & Co. KG (1997)
22. Bednar, J.A., Miikkulainen, R.: Joint maps for orientation, eye, and direction
preference in a self-organizing model of V1. Neurocomputing (in press) (2006)
23. Rosenblatt, F.: Principles of Neurodynamics: Perceptrons and the Theory of Brain
Mechanisms. Spartan Books, Washington (1962)
MovieGene: Evolutionary Video Production
Based on Genetic Algorithms and Cinematic
Properties
1 Objectives
The broader goal of this research is to find new ways of editing and producing
multimedia documents. In our approach, the main objectives are: (1) to use
Evolutionary Computation for the creation of a new paradigm for multimedia
production; (2) to develop annotation mechanisms enabling fitness evaluation
of a set of video clips; (3) to use the system in an interactive way, so that the
evolutionary process may be affected by the user.
The system that we are proposing, MovieGene, is a multimedia production
system that uses genetic algorithms [1] and user selection as a way to evolve
a population of video segments. These segments are previously annotated with
metadata (MPEG-7 [2] descriptors), which is used in the selection process. The
user actions may influence the evolutionary process as a selection operator. Concepts
inspired by video editing techniques are used to assemble the resulting
videos.
3 Fitness Function
The fitness function f is formulated (Equation 1) as the sum of all distances between
individual and goal genes. For each gene (segment) value, the weighted sum of the
descriptor distances is calculated: the similarity matching of the GoFGoPColor (C)
descriptor of a segment against the specified goal colors; the KeywordAnnotation
(K) similarity matching, using a proposed algorithm for distance computation
[3]; the FreeTextAnnotation (F) similarity matching, using a hybrid algorithm developed
in [3] that uses the Levenshtein1 algorithm; and an ad hoc similarity
matching function for the ShotDistance (S) descriptor proposed in [3].

1 http://www.nist.gov/dads/HTML/Levenshtein.html
The equations for the distance measurement between each individual's descriptor,
at some segment, and the goal (the purpose to reach) are of the generic
form ΔZ = d(Z_i, Z_goal). The formula [2] for matching and measuring the distance
between two distinct GoFGoPColor descriptors, G and G′, is ΔC = Σ_n |G_n − G′_n|,
where ΔC = d(G, G′). The number of coefficients of the color histogram is represented
by n. As mentioned, the metrics for free text similarity, ΔF = d(s_1, s_2), and
for the keywords distance, ΔK = d(s_1, s_2), are calculated with the proposed algorithms
[3]. The camera shots can be classified, according to the distance between the
camera and the subjects, as combinations of close, medium and long shots. The
ΔS = d(s_1, s_2) values range between 0.0 (close-up) and 1.0 (long shot).
Let the set I include all the individuals i of the current generation step, f : I → [0, 1].
Let V_i be the set of an individual's video segments with descriptions, w_d
the weight for the specific descriptor d, g the number of genes/segments per
individual, and f_i(V_i) the fitness function applied to all those segments:

    f_i(V_i) = (1/g) Σ_{v ∈ V_i} [ w_C ΔC(v) + w_K ΔK(v) + w_F ΔF(v) + w_S ΔS(v) ],   i ∈ I    (1)
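The evaluation of Equation (1) for one individual can be sketched as below, assuming the per-descriptor distances to the goal have already been computed by the matching functions described above; the data structures are ours, not MovieGene's.

def fitness(segment_distances, weights):
    """Equation (1): average over an individual's g segments of the weighted sum of
    per-descriptor distances (colour, keywords, free text, shot distance) to the goal."""
    g = len(segment_distances)
    return sum(sum(weights[d] * dist[d] for d in weights)
               for dist in segment_distances) / g

# Hypothetical, pre-computed descriptor distances for a 2-segment individual.
distances = [{'C': 0.2, 'K': 0.1, 'F': 0.4, 'S': 0.0},
             {'C': 0.5, 'K': 0.3, 'F': 0.2, 'S': 1.0}]
weights = {'C': 0.1, 'K': 0.6, 'F': 0.2, 'S': 0.1}   # the default weights used in the tests
print(fitness(distances, weights))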
4 MovieGene System
Repository. The container for the original multimedia documents and also for
the newly produced documents, along with the respective media descriptors.
On the initial screen the user can introduce the intended characteristics for the
final document, including the semantic description, the shot type, and the color
histogram, using text boxes, histogram sliders and buttons. The genetic parameters
are the probabilities of selection for crossing over and mutation, the percentage of
elitism, and the number of generations. At each step of the evolutionary loop, the
resulting videos are presented (Figure 3). The user can eliminate a specific video,
hit for the next round, or go towards process completion, when the resulting
(best) video can be played.
Several unattended tests were done with preset genetic parameters: selection
for crossover probability pS = 0.5, mutation probability pM = 0.01, and elitism
of 10%. The selection method used was 2-Tournament and the combination operator was
One-Point crossover. The default weights were: wC = 0.1, wF = 0.2, wK = 0.6
and wS = 0.1. The tests showed that the results closely match the goals defined
by the user in combining video segments and that the genetic algorithm can
be used interactively, taking less than a second to present the results of each
generation to the user.
cinema editing properties. The evolutionary process aims at the best editing of the clips,
according to a fitness function based on distance metrics for color, camera and
textual annotation descriptors.
Currently, MovieGene allows for the accommodation of spatial, graphical, and
rhythmic continuity editing rules, mainly through the support of syntactic properties
like color and shot distance, and of additional editing rules that may rely
on more semantic properties. Although it may perform automatic video editing,
MovieGene intends to empower the user as a film editor, supporting creative
editing by proposing innovative evolutionary combinations the user may subjectively
select from, in the process of arriving at more satisfactory or artistic
solutions. Some of the improvements in this direction include the provision of a
more abstract, rich, and flexible interface that does not require the user to be
aware of low-level video descriptors and genetic algorithms terminology.
References
1. Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning.
Addison-Wesley Longman Publishing Co., Inc. (1989)
2. Manjunath, B., Salembier, P., Sikora, T.: Introduction to MPEG-7 Multimedia
Content Description Interface. John Wiley & Sons, Ltd, West Sussex, EN (2002)
3. Henriques, N.A.C.: MovieGene: A multimedia production system using evolutionary
computation. Master's thesis, Faculty of Sciences and Technology of the New
University of Lisbon (2005)
Audible Convergence for Optimal Base Melody
Extension with Statistical Genre-Specific
Interval Distance Evaluation
Ronald Hochreiter
1 Introduction
With the progress of computers, various compositional methods were converted
to computational algorithms, and composers started to apply more and more
(especially stochastic) methods to achieve an automatic generation of music.
See [1] for an overview of methods used in the past.
Recently, Operations Research methods have been applied to generate optimal
melody lines, e.g. in [2] combinatorial optimization methods were proposed.
Evolutionary algorithms are also a valuable approach for the generation of music,
see e.g. [3], [4], [5], and the references therein. The main problem with optimization
approaches in composition is the definition of the optimum. An intuitive
approach goes as follows: the main objectives are specified by the composer, and
constraints are implied by some given set of compositional rules. These rules depend
on the respective area of composition. The decision process for automatic
composition is two-fold. First, a mapping of compositional rules to a numerical
model, which is suitable for automatic optimization, has to be defined. After the
optimization has been conducted, a re-mapping from the numerical optimization
result to a musical piece has to be applied.
This paper is organized as follows. Section 2 provides a short review of how
the horizontal tension of melodies is handled numerically. Section 3 presents details
of the evolutionary algorithm, which was developed to optimize melodies. Combining
intermediate steps of the optimization process into a musical piece leads
to the effect of audible convergence, which is exemplified in Section 4. Section 5
concludes the paper.
3 Evolutionary Algorithms
We use a standard genetic algorithm adapted from [7]. Every member of our
population has a numeric representation (genotype) and an audible representation
(phenotype). Each chromosome of the population consists of melody data,
i.e. one integer per note. To evaluate the fitness of one chromosome, a set of interval
categories, as shown above, has to be set up. Furthermore, a target mean
and variance structure for each bar is necessary. The target mean and variance
values should be in line with the values assigned to each interval category. The
fitness f of a generated melody vector is evaluated by calculating the weighted sum
of the distances between the pre-specified target means μ_i and target variances
σ_i², i.e.

    f = Σ_{i=1}^{|b|} [ λ_m δ(μ_i, Mean(b_i)) + λ_v δ(σ_i², Variance(b_i)) ]

where |b| is the number of bars (or bar sets), while the weights λ_m and λ_v can be used to
adjust the importance of the mean or the variance. The functions Mean(b_i) and
Variance(b_i) return the mean and variance of the interval categories in bar (or
bar set) b_i (i = 1, ..., |b|). Different distance functions δ can be used.
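As an illustration (not the MatLab implementation), the evaluation of f for one candidate melody could look as follows, using the squared difference as the distance and treating the interval categories of each bar as plain numbers; lam_m and lam_v stand for the mean and variance weights of the formula above.

import statistics

def melody_fitness(bars, target_mean, target_var, lam_m=1.0, lam_v=1.0):
    """Sum over bars of weighted distances between the bar's interval-category
    mean/variance and the pre-specified targets (to be minimised)."""
    f = 0.0
    for cats, mu, var in zip(bars, target_mean, target_var):
        f += lam_m * (statistics.mean(cats) - mu) ** 2
        f += lam_v * (statistics.pvariance(cats) - var) ** 2
    return f

# Three bars of interval categories: consonant outer bars and a tenser middle bar.
bars = [[1, 1, 2, 1], [3, 4, 2, 5], [1, 2, 1, 1]]
print(melody_fitness(bars, target_mean=[1, 3, 1], target_var=[0.5, 2, 0.5]))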
In [4] ten mutation operators for melody lines are presented and applied to
the MusicBlox project. For the algorithm described in this paper, the following
operators are applicable, due to the fixed note durations: swap two adjacent notes,
transpose a note pitch by a random interval, transpose a note pitch by an octave,
and reverse a group of notes within a randomly selected start and end point.
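The four operators could be sketched on a list of MIDI note numbers as follows; the interval ranges and random source are illustrative.

import random

def swap_adjacent(melody, rng=random):
    i = rng.randrange(len(melody) - 1)
    melody[i], melody[i + 1] = melody[i + 1], melody[i]

def transpose_random(melody, rng=random):
    melody[rng.randrange(len(melody))] += rng.randint(-5, 5)

def transpose_octave(melody, rng=random):
    melody[rng.randrange(len(melody))] += rng.choice([-12, 12])

def reverse_group(melody, rng=random):
    i, j = sorted(rng.sample(range(len(melody)), 2))
    melody[i:j + 1] = melody[i:j + 1][::-1]

notes = [52, 51, 52, 51, 52, 47, 50, 48, 45]
reverse_group(notes)
print(notes)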
The algorithm was implemented in MatLab 7. Each generated melody is converted
to the GNU LilyPond format, which enables visualization and audibility,
as both the score and a MIDI file are generated.
4 Audible Convergence
We aim at constructing an optimization process which leads to audible convergence,
by listening to intermediate steps of the evolutionary algorithm. We start
with some random melody, which will be iteratively optimized into a consonant
melody, given a subjective set of consonance rules represented by interval categories.
To avoid a pure random noise generation without any musical substance,
an optimal extension of some base melody has been used. Assume that some base
melody m, which should be extended, consists of n_m notes. Each bar of the final
extended melody consists of one note of the base melody and n_e notes of extended
melody, such that the final melody has n = n_m(n_e + 1) notes. These extension
[Fig. 2: convergence of the fitness value over 20 iterations for the Classical and Jazz interval category schemes; left: λ_m = λ_v = 1, right: λ_m = 3, λ_v = 1.]
notes will be randomly chosen at the beginning and are meant to converge numerically,
and thus also audibly, during the iterations of the evolutionary algorithm.
Consider the following example. The first nine notes of Ludwig van
Beethoven's Für Elise (Bagatelle in A minor for solo piano (1808), WoO 59)
are used as the base melody, i.e. m = (52, 51, 52, 51, 52, 47, 50, 48, 45). Let the
melody extension structure in common time be one quarter of the base melody
followed by six eighths of melody extension. Then the final melody will consist
of nine bars with a total of 9 × 7 notes, i.e. one base note and six extension
notes per bar. A simplified interval category scheme was used. The target mean
and variance for the first and last three bars is 1 (0.5), and for the three bars in
the middle 3 (2) has been chosen. This target value structure implies a rather
consonant beginning and ending, and a more dissonant middle part.
The initial population has been generated by creating melody lines, where
each extension note is modified by adding a uniform-randomly chosen interval
in the range of [−5, 5] half steps relative to its base note. The last base note
has to be replicated at the end of the score, which is necessary to calculate the
fitness of the last bar accordingly.
For evaluating the (negative) fitness, which has to be minimized, the Euclidean
distance was chosen. The size of the initial population was set to 30.
Within each of 20 iterations, the 10 best of the previous population have been
added to the new population. 10 new melodies have been added by mutating
five notes of the best melody of the previous population and 10 by mutating ten
notes of the second best melody.
The convergence of the fitness value is depicted in Figure 2 for an equally
weighted mean and variance (λ_m = λ_v = 1, left), as well as λ_m = 3 and λ_v = 1
(right). The audible convergence cannot be shown in this paper satisfactorily, as
the size of the final scores tends to grow to several pages.
5 Conclusion
In this paper, an evolutionary algorithm to calculate optimal extensions of base
melody lines was presented. The algorithm applies a minimization of the sum of
distances between pre-defined target means and target variances and the intervals
(interval categories) of the melody. Using an evolutionary algorithm reveals
the effect of audible convergence, when iterations of the optimization process
are combined into a musical piece. Using different interval category schemes for
different musical genres results in different audible convergence structures, which
lead to musically interesting results. A motivating example was conducted and
summarized. This basic algorithm, which contains a set of musical simplifications,
can be further refined to obtain even more audibly convergent results.
This paper provides the basis for such efforts.
References
1. McAlpine, K., Miranda, E., Hoggar, S.: Making music with algorithms: A case study
system. Computer Music Journal 23 (1999) 19–30
2. Schell, D.: Optimality in musical melodies and harmonic progressions: The travelling
musician. European Journal of Operations Research 140 (2002) 354–372
3. Gartland-Jones, A.: Can a genetic algorithm think like a composer? 5th International
Conference on Generative Art. Milan, Italy (2002)
4. Gartland-Jones, A.: MusicBlox: A real-time algorithmic composition system
incorporating a distributed interactive genetic algorithm. In Cagnoni, S.,
Romero Cardalda, J., Corne, D., Gottlieb, J., Guillot, A., Hart, E., Johnson, C.,
Marchiori, E., Meyer, J.A., Middendorf, M., Raidl, G., eds.: Applications of Evolutionary
Computing: EvoWorkshops 2003. Volume 2611 of Lecture Notes in Computer
Science, Springer (2003) 490–501
5. Manaris, B., Vaughan, D., Wagner, C., Romero, J., Davis, R.: Evolutionary music
and the Zipf-Mandelbrot law: Developing fitness functions for pleasant music. In
Cagnoni, S., Romero Cardalda, J., Corne, D., Gottlieb, J., Guillot, A., Hart, E.,
Johnson, C., Marchiori, E., Meyer, J.A., Middendorf, M., Raidl, G., eds.: Applications
of Evolutionary Computing: EvoWorkshops 2003. Volume 2611 of Lecture
Notes in Computer Science, Springer (2003) 522–534
6. Haunschild, F.: Die Neue Harmonielehre, Teil I. AMA Verlag (1998)
7. Blum, C., Roli, A.: Metaheuristics in combinatorial optimization: Overview and
conceptual comparison. ACM Computing Surveys 35(3) (2003) 268–308
A Two-Stage Autonomous Evolutionary Music
Composer
1 Introduction
In [1], Gartland-Jones and Copley distinguish between two important objectives of search
algorithms: exploration and optimization. Search algorithms, such as Genetic and Evolutionary
algorithms, in creative applications definitely serve the former objective.
An excellent review of the application of Genetic Algorithms in musical composition
is provided in [2]. However, as stated in [3], most evolutionary composition systems
listed in the literature need a tutor, or a human evaluator in an interactive GA environment.
The development of autonomous unsupervised music composers that possess
automatic fitness assessment is still limited. Furthermore, the concept of composing
music based on a library of motifs is, however, near or perhaps slightly beyond
the frontier of current capabilities of Artificial Intelligence (AI) technology. Thus, this
area of research spearheads a new direction in automated composition. For that, designers
of evolutionary music require new techniques that focus on creating classes of
musical potential, as opposed to existing techniques that describe predicted linear outcomes.
That should lead to an examination of the dynamic interaction between aspects
of a musical system [4]. The work presented in this paper is an attempt in that direction.
note of the phrase. At the end, a combination of ABA#A is produced, which is one of
the common combinations in music composition theory.
3 Stage I
In Stage one, motifs are generated. A table of 16 best motives is constructed to be
used in Stage two. These motifs will be used both in their current and transposed
locations to generate musical phrases in Stage two. Figure 1, shows the chromosome
structure in Stage I. The genetic structure of one gene, of the chromosome used in
Stage I, is shown in Figure 1.
At the end of Stage I, a table of the top 16 motifs is constructed. Each row in the
look-up table represents a motif. The columns represent the different notes in the
motif. Although all motifs generated are one whole note in duration, they could be
composed of either one, two, four, six, or eight notes.
In Stage I, only an Intervals Evaluation Function was used. Within a melody line
there are acceptable and unacceptable jumps between notes. Any jump between two
successive notes can be measured as a positive or negative slope. Certain slopes are
acceptable, others are not. The following types of slopes are adopted: Step: a differ-
ence of 1 or 2 half steps. This is an acceptable transition. Skip: a difference of 3 or 4
half steps. This is an acceptable transition. Acceptable Leap: a difference of 5, 6, or 7
half steps. This transition must be resolved properly with a third note. Unacceptable
Leap: a difference greater than 7 half steps. This is unacceptable.
If a leap is acceptable and resolves properly, no penalty will be assigned. There is
also a possibility of bonus within the interval section. Certain resolutions between
notes are pleasant to hear. We can define these bonus resolutions as the 12-to-13 and
the 6-to-5 resolutions. The first is a stronger resolution, and therefore receives a lar-
ger weight. Thus the bonuses are calculated as in equations (1) and (2).
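The slope rules above can be expressed as a small classifier, sketched below; the penalty values are purely illustrative, since the actual penalty weights and the bonus formulas of equations (1) and (2) are defined elsewhere, and the leap-resolution check is omitted.

def classify_interval(note_a, note_b):
    """Classify the jump between two successive notes by its size in half steps."""
    jump = abs(note_b - note_a)
    if jump <= 2:
        return 'step'              # acceptable
    if jump <= 4:
        return 'skip'              # acceptable
    if jump <= 7:
        return 'leap'              # must be resolved properly with a third note
    return 'unacceptable'

def interval_penalty(motif, leap_penalty=1.0, bad_penalty=5.0):
    """Illustrative penalty: leaps and unacceptable jumps are penalised."""
    penalty = 0.0
    for a, b in zip(motif, motif[1:]):
        kind = classify_interval(a, b)
        if kind == 'unacceptable':
            penalty += bad_penalty
        elif kind == 'leap':
            penalty += leap_penalty   # resolution check omitted for brevity
    return penalty

print(interval_penalty([66, 66, 68, 66, 64]))  # only steps -> 0.0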
4 Stage II
In Stage II, motifs from the look-up table constructed in Stage I are combined to form
two phrases, A and B. Each phrase is eight measures, and each measure is one whole-note
motif (Figure 2).
[Fig. 2: Stage II chromosome: 16 genes, each indexing a motif from the look-up table; genes 0-7 form Phrase A and genes 8-15 form Phrase B.]
In Stage II, two evaluation functions are implemented: Intervals and Ratios. The Intervals
evaluation function described in the previous section is used to evaluate the interval
relationships between connecting notes among motifs. The other evaluation function is
described below.
Ratios Evaluation Function. The basic idea of the ratios section of the fitness function is that a good melody contains a specific ideal ratio of notes, and any deviation from that ideal results in a penalty. There are three categories of notes: the tonal centers, which make up the chords within a key; the color notes, which are the remaining notes within a key; and the chromatic notes, which are all notes outside a key. Each type of note is given a different weight based on how much a deviation in that portion of the ratio would affect sound quality. The ratios sought were arbitrary: tonal centers make up 60% of the melody, color notes 35%, and chromatic notes 5%. Although these ratio choices could be quite controversial, they served as a starting point, and ongoing work is looking at making the ratios user-selected or music-style dependent.
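As a rough sketch of how such a ratio criterion can be scored, the fragment below compares the observed proportions of tonal-center, color and chromatic notes against the 60/35/5 targets; the deviation weights and the labeling of notes are illustrative assumptions, not the paper's exact formulation.

TARGET_RATIOS = {"tonal": 0.60, "color": 0.35, "chromatic": 0.05}
# Illustrative weights: deviations in chromatic content are assumed to hurt most.
WEIGHTS = {"tonal": 1.0, "color": 1.0, "chromatic": 2.0}

def ratio_penalty(note_categories):
    """note_categories: list of 'tonal' / 'color' / 'chromatic' labels,
    one per note of the melody. Returns a weighted deviation from the
    target 60/35/5 ratios (smaller is better)."""
    n = len(note_categories)
    penalty = 0.0
    for category, target in TARGET_RATIOS.items():
        observed = note_categories.count(category) / n
        penalty += WEIGHTS[category] * abs(observed - target)
    return penalty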
5 Results
The four motifs in Figure 3 (a) to (d) were picked from the final 16 motifs chosen by the program. It can be observed that each motif has an identical rhythm consisting of four eighth notes, one quarter note, and two more eighth notes.
Examining motif (a), the first three notes are all F#s, so no penalty is assigned (a step size of 0). The next note is a G#, two half steps away from F#; this transition is classified as a step and no penalty is assigned. The following notes are F#, G#, and E (a difference of 2, 2, and 3 half steps, respectively). These transitions are also acceptable.
Since each of the motifs in Figure 3 (a) to (d) has an identical rhythm, it is no surprise that a piece composed from these motifs contains the same rhythm. What is interesting to note, however, are the measures marked as I, II, III, and IV.
Fig. 3. (a) - (d) Sample motifs generated in Stage I of the Evolutionary Music Composer. (e) A
partial piece composed from motifs in the same generation as those in (a) through (d).
Measures I and III are the only measures throughout the entire excerpt in which the last two eighth notes are not G# and E. Measures II and IV are the only ones in which the first three eighth notes are not all F#s. The last note of measure I and the first note of measure II are the same. This is the result of the intervals evaluation function, since its role in Stage II is to evaluate the transitions between motifs.
References
1. Gartland-Jones, A., Copley, P.: What Aspects of Musical Creativity are Sympathetic to Evolutionary Modeling. Contemporary Music Review, Special Issue: Evolutionary Models of Music, Vol. 22, No. 3 (2003) 43-55
2. Burton, A.R., Vladimirova, T.: Generation of Musical Sequences with Genetic Techniques. Computer Music Journal, Vol. 23, No. 4 (1999) 59-73
3. Miranda, E.R.: At the Crossroads of Evolutionary Computation and Music: Self-Programming Synthesizers, Swarm Orchestra and Origins of Melody. Evolutionary Computation, Vol. 12, No. 2 (2004) 137-158
4. Miranda, E.R.: Composing Music Using Computers. Focal Press, Oxford, UK (2001)
Layered Genetical Algorithms Evolving into
Musical Accompaniment Generation
1 Introduction
that, somehow, depended on an interactive GA (IGA); [9] and [7] eliminated the fitness function phase.
The problems with GAs that lead researchers to use IGAs, or any alternative algorithm, are closely related to the subjectiveness of modeling a fitness function, which is, in turn, one of the consequences of how hard such a function is to model. Since a composer changes his or her set of musical rules even within the same piece, the GA may work well until a certain point, but, as the complexity of the music increases, it becomes hard to trust a single fitness function.
Here we present a theoretical proposal for an alternative implementation of fitness functions, aimed at countering the musician's subjectivity with a layered control of fitness functions, which we call the meta-fitness function. We begin in Sect. 2 with an overview of the system. In Sect. 3 we model the basic structures present in the system. In the next step (Sect. 4) we discuss the fitness adaptation layer and introduce the meta-fitness function. Then we make a discussion (Sect. 5) and, finally, draw the conclusion (Sect. 6).
2 Overview
Figure 1 shows the parts of the system. The external performer inserts a musical data stream. This stream is passed to the fitness adaptation algorithm, which uses it to modify a fitness function. The modified fitness function is delivered to the genetic algorithm, which then generates and returns the musical accompaniment to be played.
We use a GA both for generating the musical data to be played and for modifying the fitness function. We call the former the Evolutionary Accompaniment Generation (EAG) layer and the latter the Fitness Adaptation (FA) layer.
In our GA designs, we leave the choice of reproduction, crossover and mutation methods open, and concentrate only on the design of the fitness functions.
Rhythmic Pattern Table (RPT). The RPT holds information about the rhythm and its relationship with the clusters. In this table, each column represents a rhythmic unit. The first line tells, in an arbitrary time unit, how long each rhythmic unit lasts. The second line tells the beat strength of each rhythmic unit. From the third line on, for each pitch in the cluster there is a corresponding line telling whether the pitch is to be played at that rhythmic unit. This example:
first line:   1    1    1    1
second line:  2   1/2   1   1/2
third line:   1    0    1    0
is a music part in 2/4 metre: four rhythmic units with the same duration. The first rhythmic unit is to be played with a strong beat, the third with a soft beat, and the second and fourth with very soft beats. The first pitch in the cluster is to be played at the first and third rhythmic units.
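A direct way to hold such a table is a small column-per-rhythmic-unit structure; the sketch below encodes the 2/4 example above. The class name and field layout are chosen here for illustration only, as is the reading of the beat strengths as the numbers 2, 1/2, 1, 1/2.

class RhythmicPatternTable:
    """Columns are rhythmic units; row 0 holds durations, row 1 beat
    strengths, and each further row marks (1/0) whether the corresponding
    cluster pitch is played at that unit."""

    def __init__(self, durations, strengths, pitch_rows):
        self.durations = durations      # first line
        self.strengths = strengths      # second line
        self.pitch_rows = pitch_rows    # third line onward, one per pitch

# The 2/4 example: four units of equal duration, strong / very soft /
# soft / very soft beats, first cluster pitch played on units 1 and 3.
rpt = RhythmicPatternTable(
    durations=[1, 1, 1, 1],
    strengths=[2, 0.5, 1, 0.5],
    pitch_rows=[[1, 0, 1, 0]],
)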
Transitional Table (TT). This table is filled with data saying how good it is to go from one pitch to another. Let us examine this example with, for simplicity, only the pitches of the traditional western major diatonic scale:
So, descending from C to the D below is 0.2 good on the arbitrary scale used, and ascending from C to the D above is 0.7 good.
Each entry in the table can also have a pointer to another TT, which permits the fitness function to evaluate sequences of pitches. If the sequence C, D, E is to be evaluated, and the entry tt(C, Ascending D) has a pointer to another TT tt', then the value of tt'(D, Ascending E) is considered instead of tt(D, Ascending E).
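A sketch of how such chained lookups might be evaluated is given below; the dictionary-based layout and function name are assumptions for illustration. The essential point is that an entry can carry both a goodness value and a pointer to a context-specific table for the next interval.

def tt_value(tt, prev_pitch, direction_and_pitch):
    """Return (goodness, next_table) for a transition entry.

    `tt` maps (pitch, 'Ascending X'/'Descending X') to either a float or a
    (float, nested_tt) pair; the nested table, when present, replaces `tt`
    for evaluating the following transition."""
    entry = tt[(prev_pitch, direction_and_pitch)]
    if isinstance(entry, tuple):
        return entry            # (value, pointer to another TT)
    return entry, tt            # plain value: keep using the same table

# Example: evaluating the sequence C, D, E with a context table after C->D.
tt_after_c_d = {("D", "Ascending E"): 0.9}
tt = {("C", "Ascending D"): (0.7, tt_after_c_d),
      ("D", "Ascending E"): 0.4}

value_cd, table = tt_value(tt, "C", "Ascending D")   # 0.7, switch table
value_de, _ = tt_value(table, "D", "Ascending E")    # 0.9 is used instead of 0.4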
Consider Fig. 1 again. The FA layer (the Fitness Adaptation Algorithm in the figure) receives a musical stream, which is being externally performed. Based on this stream, it chooses a new individual representing a fitness function to govern the EAG layer.
The algorithm expects the incoming music to remain similar to the musical stream received up to the present moment, and keeps generating and selecting individuals accordingly. The system must be prepared, however: the environment can change suddenly, and there may not be enough time for the evolution to work on the available data.
For the case of abrupt changes, the system maintains a database of diversified individuals. The algorithm for the FA layer's fitness function, which we call the meta-fitness function, is shown in Alg. 1.
The distances between two individuals are Euclidean distances between the values in their data structures.¹ The meta-fitness function recognizes an abrupt change when there is in the database an individual that fits the musical stream better than the current one; in that case, it returns this better individual. Otherwise it returns the individual most similar to the current one.
¹ Care must be taken not to compare data entries to pointers in TTs.
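As a rough sketch of the selection logic just described (under assumed helper functions, not the paper's Alg. 1): return a database individual if it fits the incoming stream better than the current fitness function; otherwise return the stored individual most similar to the current one.

def meta_fitness_select(current, database, stream, fits, distance):
    """Pick the next fitness-function individual for the FA layer.

    `fits(individual, stream)` and `distance(a, b)` are assumed helpers:
    the first scores how well an individual matches the performed stream,
    the second is the Euclidean distance over the individuals' data
    structures (pointers in TTs must be excluded from the comparison)."""
    better = [ind for ind in database
              if fits(ind, stream) > fits(current, stream)]
    if better:                                   # abrupt change detected
        return max(better, key=lambda ind: fits(ind, stream))
    return min(database, key=lambda ind: distance(ind, current))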
5 Discussions
6 Conclusion
The task of modeling fitness functions for GAs in evolutionary composition is hard, and delegating the judgement of fitness to human evaluation can be dull work. GAs and IGAs have to find ways in which human beings interact with the system through their natural actions, such as singing, dancing, or playing an instrument. GAs and IGAs should continue to be a good framework for musical composition if designers can incorporate elaborate criticism into their systems.
The implementation of this work is in progress.
References
1. Dannenberg, R.: An on-line algorithm for real-time accompaniment. Proceedings of the International Computer Music Conference (1984)
2. Raphael, C.: Orchestra in a box: A system for real-time musical accompaniment. IJCAI (2003)
3. Bryson, J.: The reactive accompanist: Adaptation and behavior decomposition in a music system. The Biology and Tech. of Intelligent Autonomous Agents (1994)
4. Papadopoulos, G., Wiggins, G.: AI methods for algorithmic composition: A survey, a critical view and future prospects. Symposium on AI and Scientific Creativity (1999)
5. Wiggins, G., Papadopoulos, G., Phon-Amnuaisuk, S., Tuson, A.: Evolutionary methods for musical composition. I. Journal of Comp. Anticipatory Systems (1999)
6. De Felice, F., Abbattista, F., F., S.: Genorchestra: An interactive evolutionary agent for musical composition. Generative Art and Design Conference (2002)
7. Biles, J.: Autonomous GenJam: Eliminating the fitness bottleneck by eliminating fitness. Genetic and Evolutionary Computation Conference (2001)
8. Phon-Amnuaisuk, S., Tuson, A., Wiggins, G.: Evolving musical harmonization. ICANNGA (1999)
9. Moroni, A., Manzolli, J., Von Zuben, F.: Evolution and ARTbitration. Proceedings of the International Conference on Computer Graphics and Artificial Intelligence, Limoges, France (2000) 141-146
A Preliminary Study on Handling Uncertainty
in Indicator-Based Multiobjective Optimization
1 Motivation
where R is an arbitrary reference set from M(X) and E(·) stands for the expected value.
Note that there is a fundamental difference from other approaches, cf. [2]: we do not assume that there is a true objective vector per solution which is blurred by noise; instead, we consider the scenario that each solution is inherently associated with a probability distribution over the objective space.
since F(S) and F(R) are independent of each other. Here, $\mathrm{pdf}_{F(\cdot)}$ denotes the probability density function associated with the random variable F(·).
However, in practice the underlying probability density functions are in general unknown, may vary for different solutions, and therefore can only be estimated by drawing samples. Let us assume that $S(x) \in M(Z)$ represents a finite sample, i.e., a multiset of objective vectors, for solution x. Now, the expected indicator value of F(x) with respect to a given set of objective vectors $\{z_1, \dots, z_q\}$ can be estimated as follows:

$\hat{E}\bigl(I(F(x), \{z_1, \dots, z_q\})\bigr) \;=\; \sum_{z \in S(x)} \frac{I(\{z\}, \{z_1, \dots, z_q\})}{|S(x)|}$   (4)

where $\hat{E}$ stands for the estimated expected value and $|\cdot|$ for the cardinality of a set. For a multiset S of solutions with $S = \{x_1, x_2, \dots, x_m\}$, the formula is

$\hat{E}\bigl(I(F(S), \{z_1, \dots, z_q\})\bigr) \;=\; \sum_{z'_1 \in S(x_1)} \cdots \sum_{z'_m \in S(x_m)} \frac{I(\{z'_1, \dots, z'_m\}, \{z_1, \dots, z_q\})}{\prod_{1 \le i \le m} |S(x_i)|}$   (5)

and if one considers a reference set R of solutions with $R = \{x'_1, \dots, x'_r\}$, then the estimate amounts to

$\hat{E}\bigl(I(F(S), F(R))\bigr) \;=\; \sum_{z_1 \in S(x'_1)} \cdots \sum_{z_r \in S(x'_r)} \frac{\hat{E}\bigl(I(F(S), \{z_1, \dots, z_r\})\bigr)}{\prod_{1 \le i \le r} |S(x'_i)|}$   (6)
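A small sketch of the estimate in equation (4): for a single solution, the expected epsilon-indicator value with respect to a fixed set of objective vectors is the average of the indicator values of its sampled objective vectors. The helper below uses the standard additive epsilon-indicator definition and is included only to make the fragment self-contained; names and data are illustrative.

def eps_indicator(A, B):
    """Additive epsilon-indicator I_eps+(A, B) for minimization: the
    smallest eps such that every b in B is weakly dominated by some a in A
    shifted by eps in every objective."""
    return max(min(max(a_i - b_i for a_i, b_i in zip(a, b)) for a in A)
               for b in B)

def expected_indicator(sample, reference):
    """Estimate E(I(F(x), reference)) as in equation (4): average the
    indicator of each sampled objective vector of x."""
    return sum(eps_indicator([z], reference) for z in sample) / len(sample)

# Example: three noisy evaluations of one solution, reference set of two points.
sample = [(1.0, 2.0), (1.2, 1.8), (0.9, 2.1)]
reference = [(1.0, 1.5), (0.8, 2.0)]
print(expected_indicator(sample, reference))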
and a reference set R with one element only - for reference sets of arbitrary size
the computation is still too expensive to be useful in practice. Later in Section 3
it will be discussed how this procedure can be integrated into an evolutionary
algorithm.
For a minimization problem, the ε-indicator $I_{\epsilon+}$ is defined as follows:
It gives the minimum ε-value by which B can be moved in the objective space such that A is at least as good as B; a negative value implies that A is better than B in the Pareto sense. If B consists of a single objective vector $z^*$, then the formula reduces to
Now, to compute $\hat{E}(I_{\epsilon+}(F(S), \{z^*\}))$ it is not necessary to consider all combinations of objective vectors to which the elements $x \in S$ could be mapped. Instead, one can exploit the fact that the minimum $I_{\epsilon+}(\{z\}, \{z^*\})$-value always determines the actual indicator value. By sorting the objective vectors beforehand, it suffices to consider the ε-values in increasing order.
In detail, this works as follows. We consider all pairs $(x_j, z_k)$, where $x_j \in S$ and $z_k \in S(x_j)$, and sort them in increasing order of the indicator value $I_{\epsilon+}(\{z_k\}, \{z^*\})$. Suppose the resulting order is $(x_{j_1}, z_{k_1}), (x_{j_2}, z_{k_2}), \dots, (x_{j_l}, z_{k_l})$. Then, the estimate of the expected indicator value is
The computation can be carried out more efficiently by noting that, as soon as all $I_{\epsilon+}$ values of the different elements of one solution are smaller than ε, the remaining $I_{\epsilon+}$ values greater than ε have a probability of 0 of determining the minimum. This scheme is detailed in Alg. 1.
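The sorting-based computation can be sketched as follows, assuming equally likely samples per solution and distinct indicator values (ties occur with probability 0 for continuous objectives); names are illustrative. Sweeping the pairs in increasing order, the probability that a given value is the minimum is the probability that its solution realizes exactly that sample while every solution still realizes a value at least as large.

def expected_min_indicator(per_solution_values):
    """per_solution_values: one list per solution, containing the
    I_eps+({z_k}, {z*}) values of that solution's samples (equally likely).
    Returns the estimate of E[min over solutions]."""
    pairs = sorted((v, j) for j, vals in enumerate(per_solution_values)
                   for v in vals)
    sizes = [len(vals) for vals in per_solution_values]
    remaining = list(sizes)          # samples of each solution still >= sweep value
    expected = 0.0
    for v, j in pairs:
        # Probability that every solution realizes a value >= v ...
        p_all_geq = 1.0
        for r, s in zip(remaining, sizes):
            p_all_geq *= r / s
        # ... and that solution j realizes exactly this sample.
        expected += v * (p_all_geq / remaining[j])
        remaining[j] -= 1            # this sample has now been swept past
    return expected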
3 Algorithm Design
$I_{\epsilon+}^{BCK}(z, z^*) \;=\; I_{min} + cell \cdot c_{range}$   (9)

The maximum error of this approach is equal to $(I_{max} - I_{min})/c$. The algorithm complexity is in $O(nN^2 s(s + c))$.
The second approach consists in approximating the minimum value computed by EIV with an exponential function (Exp) applied to the different computed indicator values, as realized in [1] without uncertainty:

$Fit(x_1) \;=\; \sum_{z_1 \in S(x_1)} \;\sum_{x_2 \in S \setminus \{x_1\}} \;\sum_{z_2 \in S(x_2)} -e^{-I_{\epsilon+}(z_2, z_1)/\kappa}$   (10)
With one evaluation per solution, when κ is close to 0, the corresponding ranking tends to be exactly a lexicographic sorting comparison between all computed indicator values. With several evaluations per solution, the probability of occurrence of each possible indicator value is not considered here, but the computational complexity of Algorithm 2 is reduced to $O(n(Ns)^2)$.
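A compact sketch of this exponential fitness assignment over sampled objective vectors is given below; the sign convention and the κ value are assumptions chosen to mirror the indicator-based fitness of [1], and the names are illustrative.

import math

def exp_fitness(samples, target_index, kappa=0.05):
    """samples: list over solutions of lists of sampled objective vectors.
    Accumulates -exp(-I_eps+(z2, z1)/kappa) over every sampled vector z1 of
    the target solution and every sampled vector z2 of every other solution."""
    def eps(z2, z1):  # additive epsilon-indicator between two single vectors
        return max(a - b for a, b in zip(z2, z1))

    fit = 0.0
    for z1 in samples[target_index]:
        for j, other in enumerate(samples):
            if j == target_index:
                continue
            for z2 in other:
                fit += -math.exp(-eps(z2, z1) / kappa)
    return fit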
To evaluate the different schemes proposed, we also propose two alternative algorithms. The first approach, the Averaging method (Avg), approximates the EIV fitness assignment function: first, the average value is computed for each objective function; then the exact algorithm can easily be applied with $|S(x)| = 1$ for all $x \in S$. In fact, in this case, we have the relation $\hat{E}(I(F(S), F(\{x'\}))) = \hat{E}(I(F(S), \{z'\}))$, and:

$Fit(x') \;=\; \sum_{x \in S \setminus \{x'\}} \frac{1}{|S(x')|} \sum_{z' \in S(x')} \frac{1}{|S(x)|} \sum_{z \in S(x)} \prod_{i=1}^{n} inf(z_i, z'_i)$   (12)
with $inf(z_i, z'_i)$ equal to 0 (resp. 0.5, 1) if the i-th objective value of z is smaller than (resp. equal to, greater than) the i-th objective value of z'. The complexity of the PDR fitness assignment algorithm is in $O(n(Ns)^2)$.
For all the different schemes proposed in this section, we use Algorithm 2, with the fitness assignment step replaced by the corresponding method. Then, for mating selection, we perform a binary tournament between the solutions of S, without the deleted solution $x_w$. In the tournament step, we compare the solutions according to their fitness value as computed for the selection (which is not exactly the true fitness value, since one solution has been removed from the population).
4 Simulation Results
In the following, we investigate two questions concerning the performance of the five different algorithms. First, we evaluate the algorithms for one selection step and compare the loss of quality incurred by each method. Second, the comparison is done on entire runs on multiobjective test functions.
We present only preliminary results. The uncertainty is defined with known bound, distribution and central tendency. Moreover, performance evaluation is carried out using the true objective vector values of the output solutions. We would like to base the evaluation on expected values as in Equation 3, but this is not feasible in practice.
Fig. 1. Average selection error, with different levels of uncertainty

Fig. 2. Average selection error, with different numbers of evaluations
decision variables. The result is a variable noise, depending on the shape of the objective space around the solution under consideration.
The population size N is set to 50, with s = 5 evaluations for each solution. A uniform distribution is applied for the two types of uncertainty (i.e. on decision variables or on the objective space). The maximum number of generations is set to 5000. We perform 30 runs for each problem. The different methods are tested with the same initial populations. The other parameters used, such as the mutation and recombination operators, are those used in [1].
To evaluate the effectiveness of each method, we generate the true objective vector for each solution. Then, for each approximation A, we compute the $\hat{E}(I_{\epsilon+}(R, A))$ value, where R is the reference set, determined by merging all solutions found during the experiments and keeping only the non-dominated evaluations. The comparison of the whole set of runs is realized using the Mann-Whitney statistical test, applied on the sets of $\hat{E}(I_{\epsilon+}(R, A))$ values computed for each method.

Table 1. Comparison of the different selection methods for the $I_{\epsilon+}$-indicator using the Mann-Whitney statistical test: P value, with noise on objective vectors (Z) and on decision vectors (X), for 2-objective problems. A cell $10^{-a}$ corresponds to a significance level in the interval $[10^{-a}, 10^{-(a-1)}]$.
Tables 1 and 2 present the comparison of the different selection methods for $\hat{E}(I_{\epsilon+})$ with the two different types of uncertainty: on objective vectors and on decision variables. To compare the sets of runs, we use the Mann-Whitney statistical test, as described in [14]. The columns give the adjusted P value of the corresponding pairwise test that accounts for multiple testing; it equals the lowest significance level for which the null hypothesis (that the medians are drawn from the same distribution) would still be rejected (with a significance level of 5%). A value under 5% shows that the method in the corresponding row is significantly better than the method in the corresponding column.
In many cases, the results are not significant in the bi-objective case, since the different approaches are similar, i.e. they use the same ε-indicator-based fitness assignment. But some conclusions can be extracted from Table 1:
- The exponential approximation approach Exp gives the worst results in many cases, except for the KUR and COMET instances.
- BCK and EIV obtain similar results, which shows the efficiency of BCK in approximating the EIV fitness assignment method.
- Uncertainty on objective vectors: in many cases, the ε-indicator-based approaches Avg, BCK and EIV perform significantly better than Hughes' selection mechanism PDR, especially for the COMET and ZDT1 instances.
- Uncertainty on decision variables: the Avg results are significantly worse than EIV, BCK and PDR in many cases (problems DTLZ2, ZDT6 and KUR).
Table 2 presents the results obtained for experiments on the DTLZ2 test function with different numbers of objectives. This table shows a superior performance of the EIV, BCK and Exp fitness assignment methods when the number of objectives increases.

Table 2. Evaluation with several numbers of objectives to optimize: DTLZ2 test function, using the Mann-Whitney statistical test: P value, with noise on objective vectors (Z) and on decision vectors (X). A cell $10^{-a}$ corresponds to a significance level in the interval $[10^{-a}, 10^{-(a-1)}]$.
5 Discussion
In this paper, we propose a method for handling uncertainty in indicator-based evolutionary algorithms. Our approach makes no assumption about the distribution, bounds or general tendency of the uncertainty. We propose the algorithm EIV, which computes the exact expected value of the ε-indicator. In order to apply this algorithm to environmental selection in EAs, we propose several algorithms which approximate the results obtained by EIV and which select the best possible solutions during environmental selection, according to the ε-indicator performance metric. We have carried out several experiments. First, we considered the goal of minimizing the loss in quality during environmental selection: BCK gives a good approximation of EIV selection, which is more time-consuming. Then, we made some experiments on classical test functions. Our proposed method gives interesting results as the number of objective functions increases. These are preliminary results; more experiments are needed to evaluate the different approaches on different problems and uncertainties. The first experiments on the environmental selection process, with different levels of uncertainty, numbers of objective functions and sample sizes, show that the quality of the selected individuals, according to the $I_{\epsilon+}$ indicator, is improved by the use of the EIV or BCK fitness assignment methods. We can expect that results on entire runs will show the same tendency, but further experiments have to be done. We also need to define a new comparison test, which involves a set of expected objective vector values, without knowledge about the true objective vector.
References
1. Zitzler, E., Künzli, S.: Indicator-based selection in multiobjective search. In: Proc. 8th International Conference on Parallel Problem Solving from Nature (PPSN VIII), Birmingham, UK (2004) 832-842
2. Jin, Y., Branke, J.: Evolutionary optimization in uncertain environments - a survey. IEEE Transactions on Evolutionary Computation 9 (2005) 303-317
3. Arnold, D.V.: A comparison of evolution strategies with other direct search methods in the presence of noise. Computational Optimization and Applications 24 (2003) 135-159
4. Horn, J., Nafpliotis, N.: Multiobjective optimization using the niched Pareto genetic algorithm. Technical report, University of Illinois, Urbana-Champaign, Urbana, Illinois, USA (1993)
5. Hughes, E.: Evolutionary multi-objective ranking with uncertainty and noise. In: EMO'01: Proceedings of the First International Conference on Evolutionary Multi-Criterion Optimization, London, UK, Springer-Verlag (2001) 329-343
6. Teich, J.: Pareto-front exploration with uncertain objectives. In: Conference on Evolutionary Multi-Criterion Optimization (EMO'01). Volume 1993 of Lecture Notes in Computer Science (LNCS). (2001) 314-328
1 Introduction
Recently, Sastry and Goldberg [1] presented an unbiased comparison of the computational costs associated with crossover in a selectorecombinative genetic algorithm (GA) with those of mutation. In that study, the mutation algorithm exploits its linkage knowledge to greedily change one building block (BB) at a time. On deterministic problems, the mutation algorithm outperforms the GA. However, in the presence of constant exogenous Gaussian noise, the situation flips and the selectorecombinative GA comes out on top.
One might wonder how these operators would fare in the presence of crosstalk, or what is sometimes referred to as epistasis. Goldberg [2] conjectures that one type of crosstalk, fluctuating crosstalk, can induce effects similar to explicit Gaussian noise, although still deterministic. He explains how a GA can converge to the global optimum if the same approaches are applied as if external noise were present: supply a larger population of individuals and allow more time for population convergence. Furthermore, the needed population and convergence time follow facetwise models derived for fitness functions with additive Gaussian noise when the crosstalk signal falls below a certain critical point. The purpose of this paper is to construct a bounding test function that demonstrates this effect when the fluctuation noise is varied from the nonexistent to the very high,
scaling, and exogenous noise. Hence, a simple but effective way to optimize deterministic functions in the presence of crosstalk is to apply known techniques for handling deception, scaling, and exogenous noise. The mapping to deception and scaling is readily apparent, and these cases are well addressed in the literature. We explain the mapping between fluctuating crosstalk and exogenous noise with an example.
Consider the objective fitness function $f(x) = f_1(x_1 x_2 x_3 x_4) + f_2(x_5 x_6 x_7) + f_3(x_8 x_9) + f_4(x_5 x_6 x_7 x_8 x_9)$. This function has three BBs defined by bit ranges: $x_1$ to $x_4$, $x_5$ to $x_7$, and $x_8$ to $x_9$. Each bit range corresponds to a set of decision variables that are related to one another. We assume that the GA is aware of these substructures and will seek to find the values over these positions that maximize fitness. In a problem without epistasis ($f_4$), each BB is independent of the others and the global solution can be found by optimizing the sub-functions $f_1$, $f_2$, and $f_3$. Finding the correct BBs means finding the bit values that optimize such sub-functions. The concatenation of these BBs gives an optimal solution for f. In general, such substructures are not known ahead of time, but in many cases they are believed to exist and can be found in parallel through an array of GAs [2].
From this equation we can see how crosstalk or epistasis, defined by $f_4$, refers to the nonlinear interaction among BBs. $f_4$ may be positive (reinforcing crosstalk), negative (punishing crosstalk), or a mix of positive and negative values. Goldberg [2] has shown that reinforcing crosstalk maps to scaling, that punishing crosstalk maps to scaling or deception, and that a competent GA that can handle scaling and deception can handle reinforcing and punishing crosstalk.
Now, suppose that instead of always punishing or rewarding the individual once the correct target bits are found, we add or subtract some positive weight w to the fitness function based on the parity over the target bits: +w for even parity and -w for odd parity. A natural question is how drastic shifts in fitness caused by a single bit change can be overcome during the course of BB decision making. Fortunately, two things act in our favor. First, the initial populations are assumed sufficiently randomized that the target bits over BBs participating in the crosstalk are random. Hence, the net effect on fitness due to fluctuating crosstalk is zero, since there are as many even-parity individuals as odd-parity individuals over those target bits. The second factor in our favor is that, ultimately, the GA will converge even with a possible selection stall. Thus, towards the end of the run, fluctuating crosstalk behaves as either reinforcing crosstalk or punishing crosstalk, and a GA that can handle those can effectively handle fluctuating crosstalk.
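To make the construction concrete, the sketch below builds such a bounding test function: a sum of fully deceptive 5-bit trap subfunctions over disjoint building blocks (the 5-bit trap is also the subfunction whose variance appears later), plus a parity term of weight w over the whole string that fluctuates between +w and -w. The particular trap values and weight used here are illustrative, not the paper's experimental settings.

def trap5(bits):
    """Fully deceptive 5-bit trap: optimum at all ones, deceptive
    attractor at all zeros."""
    u = sum(bits)
    return 5 if u == 5 else 4 - u

def fluctuating_crosstalk_fitness(x, w):
    """m concatenated 5-bit traps plus a full-string parity term:
    +w for even parity, -w for odd parity (the fluctuating crosstalk)."""
    assert len(x) % 5 == 0
    directional = sum(trap5(x[i:i + 5]) for i in range(0, len(x), 5))
    parity = +w if sum(x) % 2 == 0 else -w
    return directional + parity

# Example: m = 4 building blocks (l = 20 bits), moderate crosstalk weight.
x = [1] * 20
print(fluctuating_crosstalk_fitness(x, w=1.5))   # 20 + 1.5 (even parity)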
The Walsh basis is significant to GAs because it allows for bounding measures of deceptiveness and for faster schema-average fitness calculation. And in our case, it allows us to represent our fluctuating crosstalk example $f_4$ as a partial, signed sum of the Walsh coefficients. We begin with some notation and definitions.
For our discussion, we take the intuitive approach introduced by Goldberg [12]. Let $x = x_l x_{l-1} \dots x_2 x_1$ be the l-bit string representing the coding of the decision variables for an arbitrary individual. We now introduce the auxiliary string positions $y_i$ that are mapped to from the bitwise string positions $x_i$ for $i = 1, \dots, l$ by:

$y_i = \begin{cases} 1, & \text{if } x_i = 0 \\ -1, & \text{if } x_i = 1. \end{cases}$
For an arbitrary fitness function, we can imagine that some portion of the partial signed sum of the Walsh coefficients has direction, meaning that this smaller sum is acyclic in the limit and represents what the GA seeks to solve. The remaining portion fluctuates with possibly irregular periods and shapes, but does not contribute to the overall direction of the function. This latter portion will likely involve higher-order Walsh coefficients, since a higher order indicates that more bits interact with one another (since more bits of the index are set). Consider the inclusion of the Walsh coefficient $w_{2^l-1}$, for example. This means taking the parity of the entire string. It provides no direction but merely acts as a source of deterministic noise. Other coefficients could be included as part of this fluctuating crosstalk, but for this paper we assume only a full parity.
blocks and infer the superiority of one partition member over another. In this section, we compare exogenous noise effects to those wrought by deterministic noise and empirically validate model adherence. We present facetwise models of population, convergence time, and function evaluations when exogenous noise is present and examine at what point fluctuating crosstalk diverges from these models. However, we first need to introduce the test problem that was common to the three experiments.
GA might reliably choose the best partition member. Practical population-sizing bounds on selectorecombinative GAs using generation-wise modeling were given by Goldberg, Deb, and Clark [15]. Those early bounds provided a guarantee of good solution quality in a selectorecombinative GA with a sufficiently large population and have been shown to approximate solution quality well in the more recent Bayesian Optimization Algorithm (BOA) [16]. They were known to be somewhat conservative for the typical selectorecombinative GA, however, and tighter bounds provided by Harik, Cantu-Paz, and Goldberg [17] are derived from the gambler's ruin problem, which considers the accumulated correctness of deciding between partition members. This model, which also accounts for exogenous noise, is given by:
$n \;=\; -\frac{2^{k-1}\,\log(\alpha)}{d}\,\sqrt{\pi\left(\sigma_f^2 + \sigma_N^2\right)}$   (1)

where d is the signal, or difference in fitness between the best and second-best individuals, k is the BB size, α is the error tolerance, $\sigma_f^2$ is the fitness function variance, and $\sigma_N^2$ is the noise variance.
Without noise, this reduces to

$n_0 \;=\; -\frac{2^{k-1}\,\log(\alpha)}{d}\,\sqrt{\pi}\,\sigma_f .$   (2)

Dividing (1) by (2) and letting $\sigma_N^2 = w_{2^l-1}^2$ and $\sigma_f^2 = m\,\sigma_{BB}^2$, where $\sigma_{BB}^2$ is the 5-bit trap variance, reveals an important population-size ratio:

$n_r \;=\; \frac{n}{n_0} \;=\; \sqrt{1 + \frac{w_{2^l-1}^2}{m\,\sigma_{BB}^2}} .$   (3)
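As a quick numerical illustration of equation (3), with assumed values not taken from the experiments: for m = 10 building blocks, a 5-bit trap variance of $\sigma_{BB}^2 = 1$ and a crosstalk weight of $w_{2^l-1} = 3$,

$n_r \;=\; \sqrt{1 + \frac{3^2}{10 \cdot 1}} \;=\; \sqrt{1.9} \;\approx\; 1.38 ,$

i.e. the population would need to be roughly 38% larger than in the noiseless case.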
The above model was specifically derived with exogenous noise in mind, but what effect does fluctuating crosstalk have on the required population size? When the crosstalk signal is low, we would expect the GA to first solve the BBs of the fitness function, since solving these yields higher marginal contributions than the BBs of the crosstalk. The crosstalk merely interferes with the decision making, acting as a source of deterministic noise. As the crosstalk signal increases, the problem shifts to a parity-dominated one, whereby individuals with even parity are first discovered and then varied to find the secondary benefits of solving the trap function. In this sense the directional function is actually perturbing the crosstalk. Once on the parity-dominated side, the required population levels off, since the population is large enough for the GA to find the optimal or near-optimal solution, and supplying a larger crosstalk signal only creates an effective reduction in the marginal value of solving BBs from the new perturbation source.
Figure 1 illustrates these principles clearly. Before the parity-dominated point, but for a given crosstalk signal, the required population size grows as the problem size m grows. More building blocks induce higher collateral noise, since more BBs incur larger numbers of schemata to consider, and the resulting changes to a single BB due to crossover are obfuscated by the simultaneous variations among
Fig. 1. Population size requirements for optimal convergence when fluctuating crosstalk is present ($w_j = w_{2^l-1}$) and the problem size m varies from 4 to 50 BBs. Initially, the effects of the crosstalk follow the model (rising bolded line) of exogenous noise closely but then level off when the problem becomes parity-dominated. The flat bolded line is the average required population for all problem sizes tested.
noise. Note that d has entirely dropped out of the equation. From Fig. 2, we see that d actually has a small influence on the population enlargement factor due to noise and recognize that our model does not capture this small difference.
Fig. 2. Population size requirements for optimal convergence when fluctuating crosstalk is present ($w_j = w_{2^l-1}$), shown for m = 4, 10, 20 and 50 and signal values d ranging from 0.01 to 0.40.
Fig. 3. Convergence-time ratio $t_{c,r}$ versus $(1 + w_j^2/(m\,\sigma_{BB}^2))^{0.5}$, shown for m = 4, 10, 20 and 50 and signal values d ranging from 0.01 to 0.40.
where I is the selection intensity [18, 24, 27]. As we saw with population size, the ratio of the convergence time under noise to that without noise does not depend on the signal. We start by considering the convergence time needed when noise is absent:

$t_{c_0} \;=\; \frac{\pi\,\sqrt{l}}{2I} .$   (7)

By casting equation (6) in terms of $t_{c_0}$ and then dividing both sides by $t_{c_0}$, we obtain the convergence-time ratio:

$t_{c,r} \;=\; \frac{t_c}{t_{c_0}} \;=\; \sqrt{1 + \frac{w_{2^l-1}^2}{m\,\sigma_{BB}^2}} .$   (8)

Note again that the convergence time does not depend on the trap signal d. We observe from Fig. 3 that the predicted plots follow the model well for varying signals and numbers of building blocks. In the predicted plots, it is the sloping solid line that represents the prediction. For m = 4, we see that the convergence time follows precisely what is predicted but reaches the critical threshold earlier than the average of the time-convergence runs. This can be attributed to the fact that for small m, fewer BBs mean fewer simultaneous variations among other crossover-induced BB changes, and the run is hence more sensitive to larger Walsh coefficients. The sensitivity reduces for larger m, and we see that the critical threshold levels out just under a $t_{c,r}$ of 2.
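Since the total number of function evaluations scales as the product of the population size and the number of generations to convergence, combining equations (3) and (8) suggests, as a derived rather than explicitly stated relation, a number-of-function-evaluations ratio of

$n_{fe,r} \;=\; n_r \cdot t_{c,r} \;=\; 1 + \frac{w_{2^l-1}^2}{m\,\sigma_{BB}^2} ,$

which is consistent with the later plots using $1 + w_j^2/(m\,\sigma_{BB}^2)$, rather than its square root, on the horizontal axis.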
[Plots of the number-of-function-evaluations ratio $n_{fe,r}$ versus $1 + w_j^2/(m\,\sigma_{BB}^2)$, shown for m = 4, 10, 20 and 50 and signal values d ranging from 0.01 to 0.40.]
5 Future Work
Future work includes modeling fluctuating crosstalk using smaller-order Walsh coefficients and determining a more precise transition point between crosstalk-as-noise and crosstalk-as-signal. Another useful direction is to consider various forms of epistasis, such as using only a portion of the entire chromosome to determine the parity of the individual. Such epistasis may be uniformly distributed as parity bits within each BB, confined to entire BBs, or be a mixture of both. Of course, bits may also be involved in multiple parity evaluations. We seek to consider these effects on GA scalability and under what conditions fluctuating crosstalk behaves like exogenous noise.
References
1. Sastry, K., Goldberg, D.E.: Let's Get Ready to Rumble: Crossover Versus Mutation Head to Head. Proceedings of the 2004 Genetic and Evolutionary Computation Conference 2 (2004) 126-137 (Also IlliGAL Report No. 2004005)
2. Goldberg, D.E.: Design of Innovation: Lessons from and for Competent Genetic Algorithms. Kluwer Academic Publishers, Boston, MA (2002)
3. Kumar, V.: Tackling Epistasis: A Survey of Measures and Techniques. Assignment from an advanced GEC course taught at UIUC by D. E. Goldberg (2002)
4. Davidor, Y.: Epistasis Variance: A Viewpoint on GA-hardness. Foundations of Genetic Algorithms (1991) 23-35
5. Naudts, B., Kallel, L.: Some Facts about so-called GA-hardness Measures. Tech. Rep. No. 379, Ecole Polytechnique, CMAP, France (1998)
6. Heckendorn, R.B., Whitley, D.: Predicting Epistasis from Mathematical Models. Evolutionary Computation 7(1) (1999) 69-101
7. Muhlenbein, H., Mahnig, T., Rodriguez, A.O.: Schemata, Distributions and Graphical Models in Evolutionary Optimization. Journal of Heuristics 5 (1999) 215-247
8. Pelikan, M., Goldberg, D.E., Lobo, F.G.: A Survey of Optimization by Building and Using Probabilistic Models. Comput. Optim. Appl. 21(1) (2002) 5-20
9. Lauritzen, S.L.: Graphical Models. Oxford University Press (1998)
10. Beasley, D., Bull, D.R., Martin, R.R.: Reducing Epistasis in Combinatorial Problems by Expansive Coding. In: ICGA. (1993) 400-407
11. Barbulescu, L., Watson, J.P., Whitley, L.D.: Dynamic Representations and Escaping Local Optima: Improving Genetic Algorithms and Local Search. In: AAAI/IAAI. (2000) 879-884
12. Goldberg, D.E.: Genetic Algorithms and Walsh Functions: Part I, a Gentle Introduction. Complex Systems 3(2) (1989) 129-152 (Also TCGA Report 88006)
13. Bethke, A.D.: Genetic Algorithms as Function Optimizers. PhD thesis, The University of Michigan (1981)
14. Sastry, K.: Evaluation-Relaxation Schemes for Genetic and Evolutionary Algorithms. Master's thesis, University of Illinois at Urbana-Champaign, General Engineering Department, Urbana, IL (2001) (Also IlliGAL Report No. 2002004)
15. Goldberg, D.E., Deb, K., Clark, J.H.: Genetic Algorithms, Noise, and the Sizing of Populations. Complex Systems 6 (1992) 333-362 (Also IlliGAL Report No. 91010)
16. Pelikan, M., Goldberg, D.E., Cantu-Paz, E.: Linkage Learning, Estimation Distribution, and Bayesian Networks. Evolutionary Computation 8(3) (2000) 314-341 (Also IlliGAL Report No. 98013)
17. Harik, G., Cantu-Paz, E., Goldberg, D.E., Miller, B.L.: The Gambler's Ruin Problem, Genetic Algorithms, and the Sizing of Populations. Evolutionary Computation 7(3) (1999) 231-253 (Also IlliGAL Report No. 96004)
18. Bulmer, M.G.: The Mathematical Theory of Quantitative Genetics. Oxford University Press, Oxford (1985)
19. Falconer, D.S.: Introduction to Quantitative Genetics. Third edn. John Wiley and Sons, New York, NY, USA; London, UK; Sydney, Australia (1989)
20. Muhlenbein, H., Schlierkamp-Voosen, D.: Predictive Models for the Breeder Genetic Algorithm: I. Continuous Parameter Optimization. Evolutionary Computation 1(1) (1993) 25-49
21. Muhlenbein, H., Schlierkamp-Voosen, D.: The Science of Breeding and its Application to the Breeder Genetic Algorithm (BGA). Evolutionary Computation 1(4) (1994) 335-360
22. Thierens, D., Goldberg, D.E.: Convergence Models of Genetic Algorithm Selection Schemes. Parallel Problem Solving from Nature 3 (1994) 116-121
23. Thierens, D., Goldberg, D.E.: Elitist Recombination: An Integrated Selection Recombination GA. Proceedings of the First IEEE Conference on Evolutionary Computation (1994) 508-512
24. Back, T.: Generalized Convergence Models for Tournament- and (μ, λ)-Selection. Proceedings of the Sixth International Conference on Genetic Algorithms (1995) 2-8
25. Miller, B.L., Goldberg, D.E.: Genetic Algorithms, Selection Schemes, and the Varying Effects of Noise. Evolutionary Computation 4(2) (1996) 113-131 (Also IlliGAL Report No. 95009)
26. Voigt, H.M., Muhlenbein, H., Schlierkamp-Voosen, D.: The Response to Selection Equation for Skew Fitness Distributions. Proceedings of the International Conference on Evolutionary Computation (1996) 820-825
27. Blickle, T., Thiele, L.: A Mathematical Analysis of Tournament Selection. Proceedings of the Sixth International Conference on Genetic Algorithms (1995) 9-16
Integrating Techniques from Statistical Ranking
into Evolutionary Algorithms
1 Introduction
In many practical optimization problems, a solution's fitness is noisy, i.e. it cannot be determined accurately and has to be considered a random variable. The sources of noise can be manifold, including optimization based on randomized simulations, measurement errors, stochastic sampling, and interaction with users.
Generally, noise is considered a major challenge for optimization, as it makes it difficult to decide which of two solutions is better, and thus to drive the search reliably towards the more promising areas of the search space. The effect of noise on the performance of evolutionary algorithms (EAs) has been investigated in several papers, and EAs are generally considered to be quite robust with respect to noise, see e.g. [1, 3, 14]. A recent survey on this topic is [16].
For most noisy optimization problems, the uncertainty in fitness evaluation can be reduced by sampling a solution's fitness several times and using the average as an estimate of the true mean fitness. Sampling n times scales the standard deviation of the estimator of the mean by a factor of $1/\sqrt{n}$, but increases the computational time by a factor of n. This is a critical trade-off: either one can use relatively exact estimates but only run the algorithm for a small number of iterations (because a single fitness estimate requires many evaluations), or one can let the algorithm work with relatively crude fitness estimates, but allow for more iterations (as each estimate requires less effort).
This paper presents a new way to integrate statistical ranking and selection techniques into EAs in order to improve their performance in noisy environments. Ranking and selection addresses the question of identifying the individual with the true best mean out of a given (finite) set of individuals by sampling the individuals' fitnesses. Thereby, it is attempted to achieve a desired selection quality (e.g. probability of correct selection) with a minimal number of samples. We propose a framework that tightly integrates an efficient statistical selection procedure with an EA, allowing it to focus on those pairwise comparisons that are crucial for the EA's operation.
Section 2 briefly surveys related work. Section 3 introduces a statistical selection technique, OCBA, that is used in the EA. Section 4 examines the information requirements of the EA. Section 5 explains how OCBA is adapted to generate that information efficiently. Section 6 gives preliminary empirical results.
2 Related Work
There have been some earlier attempts to integrate ranking and selection techniques into EAs. Stagge [17] considered a (μ, λ) or (μ + λ) evolution strategy and suggested that the sample size should be based on an individual's probability of being among the best ones that will be selected. Hedlund and Mollaghasemi [15] use an indifference-zone selection procedure to select the best m out of k individuals within a genetic algorithm.
For tournament selection, Branke et al. [8, 9] and Cantu-Paz [11] use sequential sampling techniques to reduce the number of samples to the minimum required to discriminate between individuals in a tournament. Adaptive sampling strategies have also been examined for situations where the noise strength varies over space [13]. Boesel [4] argues that for linear ranking selection it is sufficient to group individuals of similar quality into one rank, and a corresponding mechanism is proposed.
To clean up after optimization (to identify, with high probability, the best of all visited solutions), Boesel et al. [5] use a ranking and selection procedure after the EA has finished. Recently, Buchholz and Thummler [10] used a statistical selection technique for selection in a (μ + λ) strategy as well as to maintain a pool of promising candidates throughout the run, from which the final solution is selected at the end.
None of the above works provides the general framework and tight integration of selection algorithms and EAs that is suggested here.
A comprehensive overview and extensive comparison of different selection procedures can be found in [7], which concludes that OCBA together with an appropriate stopping rule is among the best-performing statistical selection procedures. Our approach is based on OCBA, which is described next.
improved with flexible stopping rules in [7], and was shown to be one of the most efficient selection procedures. We first define the sampling assumptions that were used to derive the procedure, then describe the procedure itself.
Let $X_{ij}$ be a random variable whose realization $x_{ij}$ is the output of the j-th evaluation of individual i, for $i = 1, \dots, k$ and $j = 1, 2, \dots$. Let $w_i$ and $\sigma_i^2$ be the unknown mean and variance of the evaluated individual i, and let $w_{[1]} \le w_{[2]} \le \dots \le w_{[k]}$ be the ordered means. In practice, the ordering $[\cdot]$ is unknown, and the best individual, individual [k], is to be identified with fitness sampling. OCBA assumes that simulation output is independent and normally distributed, conditional on $w_i$ and $\sigma_i^2$, for $i = 1, \dots, k$. Although this normality assumption is not always valid, it is often possible to batch a number of evaluations so that normality is approximately satisfied.
Let $n_i$ be the number of replications for individual i run so far. Let $\bar{x}_i = \sum_{j=1}^{n_i} x_{ij}/n_i$ be the sample mean and $\hat{\sigma}_i^2 = \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2/(n_i - 1)$ be the sample variance. Let $\bar{x}_{(1)} \le \bar{x}_{(2)} \le \dots \le \bar{x}_{(k)}$ be the ordering of the sample means based on all replications seen so far. Equality occurs with probability 0 in the contexts of interest here. The quantities $n_i$, $\bar{x}_i$, $\hat{\sigma}_i^2$ and (i) may change as more replications are observed.
The standard selection problem is to identify the best of the k individuals, where best means the largest expected fitness. From a Bayesian perspective the means $W_i$ are $St(\bar{x}_i, n_i/\hat{\sigma}_i^2, n_i - 1)$ distributed (assuming a non-informative prior), where $St(m, \kappa, \nu)$ is a Student distribution with mean m, precision κ and ν degrees of freedom. Upper case is used for the random variable, and lower case is used for the (as yet unknown) realization. Given the data $\mathcal{E}$ seen so far, the probability $\mathrm{PCS}_{Bayes}$ that the individual with the best observed mean, individual (k), is really the best individual, [k], can be approximated using the Slepian inequality and Welch approximations:

$\mathrm{PCS}_{Bayes} \;\stackrel{\mathrm{def}}{=}\; \Pr\Bigl(W_{(k)} \ge \max_{i \ne (k)} W_i \mid \mathcal{E}\Bigr) \;\ge\; \prod_{i \ne (k)} \Pr\bigl(W_{(k)} \ge W_i \mid \mathcal{E}\bigr) \;\approx\; \prod_{i \ne (k)} \Phi_{\nu_{(k)i}}\bigl(d_{(k)i}/s_{(k)i}\bigr) \;\stackrel{\mathrm{def}}{=}\; \mathrm{PCS}_{Slep} ,$
The OCBA variant working with $\mathrm{PGS}_{Slep,\delta^*}$ as quality goal is denoted $\mathrm{OCBA}_{\delta^*}$. $\mathrm{OCBA}_{\delta^*}$ attempts to optimize $\mathrm{PGS}_{Slep,\delta^*}$ by greedily and iteratively allocating additional evaluations to individuals where it promises the largest improvement, assuming that the means do not change and the standard error is scaled back appropriately. More specifically, in a first stage of sampling, OCBA evaluates the fitness function $n_0$ times per individual. In each subsequent sequential stage, τ additional evaluations are given to one individual, and none to the others. The i-th individual is selected in a given stage if it maximizes $\mathrm{PGS}_{Slep,\delta^*}$ when the distribution for the i-th unknown mean is changed from $St(\bar{x}_i, n_i/\hat{\sigma}_i^2, n_i - 1)$ to

$W_i \sim St\bigl(\bar{x}_i, (n_i + \tau)/\hat{\sigma}_i^2, n_i - 1 + \tau\bigr) .$
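A rough sketch of this greedy allocation loop is given below, using a normal approximation in place of the Welch/Student computation for brevity; the function names, stopping criterion and τ handling are illustrative assumptions, not the procedure's exact implementation.

import math

def pcs_slep(means, variances, counts):
    """Approximate probability of correct selection: product over all
    non-best individuals of Phi(d / s), with d the mean difference to the
    observed best and s the combined standard error (normal approximation
    used here instead of the Welch/Student computation)."""
    best = max(range(len(means)), key=lambda i: means[i])
    p = 1.0
    for i in range(len(means)):
        if i == best:
            continue
        d = means[best] - means[i]
        s = math.sqrt(variances[best] / counts[best] + variances[i] / counts[i])
        p *= 0.5 * (1.0 + math.erf(d / (s * math.sqrt(2.0))))
    return p

def ocba_step(means, variances, counts, tau=1):
    """One sequential OCBA-style stage: give tau extra evaluations to the
    individual whose hypothetically increased sample count raises the
    selection quality the most, keeping means and variances fixed."""
    best_i, best_gain = None, -1.0
    for i in range(len(means)):
        boosted = list(counts)
        boosted[i] += tau
        gain = pcs_slep(means, variances, boosted)
        if gain > best_gain:
            best_i, best_gain = i, gain
    return best_i          # individual to evaluate tau more times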
Looking at an EA's typical iteration (Figure 1), there are two steps which are affected by noise. In the selection step, fitter individuals are assigned a higher probability of being selected as mating partners. In the replacement step¹, the set of new and old individuals is reduced to the usual population size by removing some individuals depending on age and fitness.
Fig. 1. Typical EA iteration: selection of parents from population t, reproduction to create offspring, and replacement to form population t+1.
For deterministic fitness functions, the EA has access to the complete ordering of the combined set of individuals in the old population and the generated offspring population. That is, for any pair of individuals, it can determine with certainty which one is better. However, only part of this information is actually

¹ This is often also called environmental selection.
In the EA literature, different methods have been proposed to select, from the population, the parents that generate a child. Among the most popular are linear ranking selection and tournament selection, which generate the same expected probability for an individual to be selected. For standard tournament selection, two individuals i and j are chosen randomly from the population, and the better one is selected. Thus, for a single selection step, order information on only these two individuals is required:
Evolution strategies usually use random mating selection, which does not require any fitness information. Another popular selection strategy is fitness proportional selection. Since it is based on relative fitnesses rather than ranks, our method cannot be applied here. However, fitness proportional selection is known to have some scaling and convergence issues, and rank-based selection methods are generally preferred anyway.
achieved by selecting ranks instead of individuals in the selection step. For example, for a tournament selection, instead of selecting the better of individuals i and j, one can select the better of individuals (p) and (q), where p and q are ranks in the surviving population.
For example, consider a simple steady-state EA with population size μ = 4 and one offspring generated. The two individuals for mating are selected by two tournaments. Then, we need to ensure that the worst individual is removed by replacement, i.e. $C \supseteq \{((2)_P, (1)_P), ((3)_P, (1)_P), ((4)_P, (1)_P)\}$. Furthermore, we need two random pairs of individuals for the tournaments, e.g. $C \leftarrow C \cup \{((3), (2)), ((5), (3))\}$. Overall, this means that information about the relative order of only μ + 1 pairs of individuals needs to be correct, as opposed to the $\mu(\mu + 1)/2 = 10$ pairs for a complete ordering. The savings increase with increasing population size: for μ = 20, only 21 pairs of individuals need to be compared, as opposed to 210 for a complete ordering.
Additional savings in an EA setting stem from the fact that individuals that survived from the previous generation have already been re-evaluated. $\mathrm{OCBA}_{\delta^*}$ can make full use of this information. Furthermore, because it is likely that individuals surviving to the next iteration have been sampled more often than those that are killed off, there is an additional advantage of using $\mathrm{OCBA}_{\delta^*}$ compared to using a fixed sample size [2].
Here, $\mathrm{OCBA}_{\delta^*}$ uses $\mathrm{PGG}_{Slep,\delta^*}$ as goal function just as it does with $\mathrm{PGS}_{Slep,\delta^*}$. It automatically and efficiently allocates samples to individuals until a desired goal $\mathrm{PGG}_{Slep,\delta^*} > 1 - \alpha^*$ is obtained.
Procedure $\mathrm{OCBA}_{EA}(\alpha^*, \delta^*)$
1. Evaluate each new individual $n_0$ times (old individuals have already been sampled at least $n_0$ times). Estimate the ranks by ordering the individuals based on the observed mean values.
2. Determine C: initialize C and add the comparisons required by the operators as described in Sections 4.1-4.2.
3. WHILE the observed results are not sufficiently sure ($\mathrm{PGG}_{Slep,\delta^*} < 1 - \alpha^*$) DO
(a) Allocate new evaluations to the individuals according to the $\mathrm{OCBA}_{\delta^*}$ allocation rule.
(b) If ranks have changed since the previous iteration of the ranking procedure, update C: initialize C and add the comparisons required by the operators as described in Sections 4.1-4.2.
$\mathrm{OCBA}_{EA}$ is called in every iteration of the EA after the offspring have been generated and before replacement. The EA then proceeds simply using the ordering given by the sample means.
The number of samples taken by $\mathrm{OCBA}_{EA}$ depends on the problem configuration and the settings of $\alpha^*$ and $\delta^*$. It may be useful to vary these settings over time, as higher accuracy may be needed towards the early and late phases of the algorithm (cf. [6]).
6 Empirical Evaluation
We empirically compare different sampling mechanisms based on their performance in a single iteration. The proposed integration of ranking and selection with EAs needs a more exhaustive evaluation in future work.
We stochastically generated k = 10 individuals with means distributed according to the negative of an exponential distribution with mean 1 and variances distributed according to an inverted gamma distribution with α = 100 and β = 99 (Figure 2). Such a distribution, with more good than bad individuals, seems common in an EA run, at least towards the end of the run, when the algorithm produces many solutions close to the optimum and a few outliers. The results below are averaged over 100,000 such randomly sampled populations.
We compare the frequentist probability of a good generation, $\mathrm{PGG}_{IZ,\delta^*}$, depending on the expected number of evaluations used by the different procedures. To calculate $\mathrm{PGG}_{IZ,\delta^*}$, we run the sampling mechanism and look at the resulting order according to the sample means. If all decisions required by the scenario (i.e., those defined by C) have been identified correctly, taking into account the indifference zone, the run is counted as successful. Otherwise, it is not successful. $\mathrm{PGG}_{IZ,\delta^*}$ is the percentage of correct runs. The parameters $\alpha^*$ and $\delta^*$ are determinants not only of $\mathrm{PGG}_{IZ,\delta^*}$, but also of the expected total number of samples, E[N], for a given numerical experiment.
Fig. 2. Empirical distribution of means (a) and variances (b) for the empirical tests
For all tests, an indifference zone of $\delta^* = 0.2$ is used, i.e. the ordering of a pair of individuals is accepted as correct if the higher-ranked individual is not more than 0.2 worse than the lower-ranked individual. We use the stopping rule $\mathrm{PGG}_{Slep,\delta^*} > 1 - \alpha^*$, where $\alpha^*$ is varied to generate the lines in Figure 3.

[Figure 3: error $1 - \mathrm{PGG}_{IZ,0.2}$ versus the expected number of evaluations E[N] for OCBA0.2 Rank, OCBA0.2 (5,10)-ES, OCBA0.2 Select, and the OCBA0.2 steady-state EA.]
A complete ranking of the individuals is the most challenging task and requires the highest number of samples to reduce the error $1 - \mathrm{PGG}_{IZ,\delta^*}$. The curves for the steady-state EA and the (5,10)-ES are significantly below the curve for the complete ranking, which shows that an EA indeed requires only partial information, and that many samples can be saved by generating only the required information. Interestingly, the steady-state EA operation even requires fewer samples than identifying only the best individual (OCBA0.2 Select). This is due to the fact that we generate the means according to a negative exponential distribution, i.e. there are several good but few bad individuals, and thus it is relatively easier to identify the worst individual and the better of two random pairs for the steady-state EA than it is to identify the best individual.
Figure 4 compares our new OCBA-based EA with standard EAs using the same number of samples for each individual. The OCBA-based sampling allocation schemes are much more efficient than the corresponding Equal allocation variants, which shows that integrating a statistical ranking and selection procedure is beneficial.
Fig. 4. Comparison of the new OCBA-based EA (bold lines) and the standard EA with Equal sampling (thin lines)
7 Conclusion
Optimization in noisy environments is challenging, because the noise makes it
dicult to decide which of two solutions is better, which is a prerequisite for every
optimization algorithm. While noise can usually be reduced by averaging over
multiple samples, this is a costly process. In this paper, we have proposed a new
adaptive sample allocation mechanism that attempts to minimize the number
of samples required to warrant a proper functioning of an EA. The approach is
based on two ideas:
1. Restriction of the focus to those pairwise comparisons that are actually used by the EA. As the empirical results have shown, these comparisons may require fewer samples than identifying only the best individual.
2. The use of OCBA, a sample allocation procedure from statistical ranking and selection. This allowed us to distribute the additional evaluations to those individuals where they promise the highest benefit, and to stop sampling when there is sufficient evidence for correct selection (a simplified sketch of this idea is given below).
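The sketch below combines both ideas under simplifying assumptions: it monitors only the pairwise comparisons the EA actually needs, gives extra evaluations to the individuals involved in the currently least certain comparison, and stops once every needed comparison is clear within the indifference zone. The z-score criterion, the batch size, and the thresholds are illustrative stand-ins, not the OCBA allocation rule or the Bayesian stopping rule used in the paper.

import numpy as np

def allocate_samples(evaluate, n_individuals, comparisons, delta=0.2,
                     n0=4, batch=2, z_stop=2.33, budget=800):
    """Sequential sample allocation in the spirit of the two ideas above:
    sample only as long as some comparison actually needed by the EA is still
    statistically unclear, and spend the extra evaluations on the individuals
    involved in the least certain comparison."""
    samples = [[evaluate(i) for _ in range(n0)] for i in range(n_individuals)]
    spent = n0 * n_individuals
    while spent + 2 * batch <= budget:
        worst_z, worst_pair = np.inf, None
        for a, b in comparisons:                       # only comparisons the EA needs
            xa, xb = np.array(samples[a]), np.array(samples[b])
            se = np.sqrt(xa.var(ddof=1) / len(xa) + xb.var(ddof=1) / len(xb))
            z = (abs(xa.mean() - xb.mean()) + delta) / max(se, 1e-12)
            if z < worst_z:
                worst_z, worst_pair = z, (a, b)
        if worst_z >= z_stop:                          # sufficient evidence for all decisions
            break
        for i in worst_pair:                           # extra evaluations where most useful
            samples[i].extend(evaluate(i) for _ in range(batch))
            spent += batch
    return [np.mean(s) for s in samples], spent

# toy usage: noisy fitness; the EA only needs the better of (0,1) and of (2,3)
rng = np.random.default_rng(0)
true_f = np.array([1.0, 0.7, 0.4, 0.1])
noisy = lambda i: true_f[i] + rng.normal(0.0, 0.5)
means, cost = allocate_samples(noisy, 4, comparisons=[(0, 1), (2, 3)])
print(np.round(means, 2), cost)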
More work is needed to identify the best way to take advantage of this proposal. As a next step, we will show the benefit of the proposed method not only on a hypothetical generation, but over a complete run of the EA. Then, the method will be extended to allow an efficient identification of the best individual encountered during the run. Finally, guidelines are needed for setting the parameters δ and α* as iterations proceed in order to obtain a better overall performance.
1 Introduction
Many real-world problems are dynamic in nature. The interest in applying evolutionary algorithms (EAs) in dynamic environments has been increasing over the past years, which is reflected in the increasing number of papers on the topic. For an in-depth overview of the topic, see e.g. [2, 10, 12, 17].
Most of the literature attempts to modify the algorithm to allow a better tracking of the optimum over time, e.g. by increasing diversity after a change, maintaining diversity throughout the run, or incorporating a memory. In this paper, we focus on the representation's influence on an EA's performance in dynamic environments. Instead of searching the solution space directly, EAs usually search in a transformed space defined by the genetic encoding. This mapping between solution space and genetic search space is generally called representation, or genotype-phenotype mapping. The representation together with the genetic operators and the fitness function define the fitness landscape, and it is generally agreed upon that a proper choice of representation and operators is crucial for the success of an EA, see e.g. [15, 16].
Depending on the representation, the fitness landscape can change from being unimodal to being highly multimodal and complex, and thus the representation
strongly influences the EA's ability to approach the optimum. In a dynamic environment, in addition to the (static) characteristics of the fitness landscape, the representation influences the characteristics of the fitness landscape dynamics, as has been recently demonstrated in [3]. Consequently, depending on the representation, the tracking of the optimum over time may be more or less difficult.
This paper examines the performance of three different genetic representations for the dynamic multi-dimensional knapsack problem (dMKP) [3]. The MKP is a well-studied problem, and different representations have been proposed and compared e.g. in [6, 9, 14, 15]. For our study, the binary representation with a penalty for constraint handling is selected as an example of a direct representation. As indirect representations, we consider a permutation representation and a weight-coding. In the latter, the items' profits are modified and a simple deterministic heuristic is used to construct the solution. Intuitively, this last representation seems particularly promising for dynamic environments, as it naturally incorporates heuristic knowledge that would immediately improve a solution's phenotype after a change of the problem instance.
The paper is structured as follows: Section 2 introduces the dynamic MKP problem and briefly explains the different representations used in this study. The experimental results are reported and analyzed in Section 3. The paper concludes in Section 4 with a summary and some ideas for future work.
where p_max is the largest profit value calculated as in Eq. 5, r_min is the minimum resource consumption calculated as in Eq. 6, and CV(x, i) is the maximum constraint violation for the i-th constraint c_i calculated as in Eq. 7. It should be noted that r_min ≠ 0 must be ensured. As genetic operators, bit-flip mutation and uniform crossover are used.
without violating at least one constraint, which is a necessary condition for optimality. Thus, the decoder generates solutions that are of significantly higher quality than randomly generated solutions. In a dynamic environment, solutions are immediately repaired after a change such that they are again at the boundary of feasibility in the new environment.
In [9], a good setup for the permutation representation is recommended, including uniform order based crossover (UOBX) and insert mutation as variation operators. In insert mutation, a new position for an element is selected randomly. The mutated element is inserted into its new position and the other elements are re-positioned accordingly. In UOBX, some positions are transferred directly to the offspring from the first parent with probability p1 = 0.45. Then, starting from the first position, undetermined positions are filled with missing items in the order of the second parent.
where a_i is the surrogate multiplier for the i-th constraint, and r_ij is the resource coefficient.
Surrogate multipliers are determined by solving the relaxed MKP (i.e., variables x_i can take any value in [0, 1]) by linear programming, and using the values of the dual variables as surrogate multipliers. Then, to obtain a heuristic solution to the MKP, the profit/pseudo-resource consumption ratios, denoted as u_j, are calculated as given in Eq. 9.

u_j = p_j / Σ_{i=1}^{m} a_i r_ij    (9)
The items are then sorted in decreasing order of their u_j values, and this order is used to construct solutions just as for the permutation representation, i.e., items are considered one at a time, and if none of the constraints is violated, added to the solution. To keep computation costs low, in [14] the surrogate multiplier values a_i are determined only once for the original problem at the beginning. As a result, the decoding step starts with the computation of the u_j
values based on the biased profits. Note that in a dynamic environment, the WC representation requires re-computing the pseudo-resource consumption values a_i after every change of the environment.
In [14], several biasing techniques are discussed and compared. We initialize the biases according to w_j = 2^R, where R is a uniformly distributed random variable in the range [−1, 1]. This leads to a distribution with many small and few larger values. For mutation, we deviate from the re-initialization of biases used in [14] and instead use Gaussian mutation with σ = 1. To generate the modified profits, the original profits are simply multiplied with the biases, i.e., p'_j = p_j · w_j. Uniform crossover is used as the second genetic operator.
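As an illustration, the following sketch decodes a weight-coding individual along these lines: it biases the profits, computes the ratios u_j of Eq. 9, and greedily adds items in decreasing order of u_j while all constraints remain satisfied. The function and variable names, the toy instance, and the placeholder surrogate multipliers are assumptions for illustration only.

import numpy as np

def wc_decode(biases, profits, weights_r, capacities, surrogate_a):
    """Weight-coding decode sketch: bias the profits, rank items by the
    profit / pseudo-resource-consumption ratio u_j (Eq. 9), and add items
    greedily while all constraints stay satisfied."""
    biased = profits * biases                              # p'_j = p_j * w_j
    u = biased / (surrogate_a @ weights_r)                 # u_j = p'_j / sum_i a_i r_ij
    order = np.argsort(-u)                                 # best ratio first
    x = np.zeros(len(profits), dtype=int)
    used = np.zeros(len(capacities))
    for j in order:
        if np.all(used + weights_r[:, j] <= capacities):
            x[j] = 1
            used += weights_r[:, j]
    return x

# toy instance: 2 constraints, 5 items; biases w_j = 2**R with R uniform in [-1, 1]
rng = np.random.default_rng(0)
p = rng.uniform(10, 50, 5)
r = rng.uniform(1, 10, (2, 5))
c = r.sum(axis=1) * 0.5
a = np.ones(2)                                             # placeholder surrogate multipliers
w = 2.0 ** rng.uniform(-1, 1, 5)
print(wc_decode(w, p, r, c, a))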
Since the permutation representation and the WC representation share similar construction mechanisms, they both benefit from the resulting heuristic bias. However, by calculating the pseudo-resource consumption values, the influence of heuristic knowledge for WC is even larger.
In dynamic environments, the WC representation appears to be particularly advantageous, for two reasons:
In our study, we use a dynamic version of the MKP as proposed in [3] and described below. The basis is the first instance given in the file mknapcb4.txt, which can be downloaded from [1]. It has 100 items, 10 knapsacks and a tightness ratio of 0.25. For every change, the profits, resource consumptions and the constraints are multiplied by a normally distributed random variable as follows:

p_j ← p_j · (1 + N(0, σ_p))
r_ij ← r_ij · (1 + N(0, σ_r))    (10)
c_i ← c_i · (1 + N(0, σ_c))

Unless specified otherwise, the standard deviation of the normally distributed random variable used for the changes has been set to σ_p = σ_r = σ_c = 0.05, which requires on average 11 out of the 100 possible items to be added or removed from one optimal solution to the next. Each profit p_j, resource consumption r_ij and constraint c_i is restricted to an interval as determined in Eq. 11.

lb_p · p_j ≤ p_j ≤ ub_p · p_j
lb_r · r_ij ≤ r_ij ≤ ub_r · r_ij    (11)
lb_c · c_i ≤ c_i ≤ ub_c · c_i
where lb_p = lb_r = lb_c = 0.8 and ub_p = ub_r = ub_c = 1.2. If any of the changes causes any of the lower or upper bounds to be exceeded, the value is bounced back from the bounds and set to a corresponding value within the allowed boundaries.
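A minimal sketch of one such environmental change is given below. The reflection used for the bounce-back is an assumption, since the exact rule is not spelled out here, and the toy values are illustrative.

import numpy as np

def change_environment(values, original, sigma=0.05, lb=0.8, ub=1.2, seed=None):
    """One environmental change as in Eq. 10/11 (sketch): multiply each value by
    (1 + N(0, sigma)) and bounce back any value that leaves the interval
    [lb * original, ub * original].  Simple reflection is assumed here."""
    rng = np.random.default_rng(seed)
    new = values * (1.0 + rng.normal(0.0, sigma, size=values.shape))
    lo, hi = lb * original, ub * original
    # reflect values that exceed the bounds back into the allowed interval
    new = np.where(new > hi, 2 * hi - new, new)
    new = np.where(new < lo, 2 * lo - new, new)
    return np.clip(new, lo, hi)          # guard against reflecting past the other bound

# usage: change the profits of the original instance once
p0 = np.array([60.0, 45.0, 30.0])
print(change_environment(p0.copy(), p0, seed=1))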
3 Empirical Results
For this study, we used a more or less standard steady-state EA with a population size of 100, binary tournament selection, crossover probability of 1.0, and mutation probability of 0.01 per gene. The genetic operators crossover and mutation depend on the representation and have been implemented as described above. The new child replaces the current worst individual in the population if its fitness is better than the worst. The EA uses phenotypic duplicate avoidance, i.e., a child is re-generated if a phenotypically identical individual already exists in the population. This feature seems important in particular for indirect representations with high redundancy, i.e., where many genotypes are decoded to the same phenotype.
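A compact sketch of this steady-state replacement with phenotypic duplicate avoidance could look as follows; the retry cap and the helper names (make_child, phenotype) are illustrative assumptions, not part of the paper.

def steady_state_step(population, fitness, make_child, phenotype, max_retries=10):
    """One steady-state step with phenotypic duplicate avoidance (sketch):
    re-generate the child while a phenotypically identical individual exists,
    then replace the worst individual if the child is better."""
    existing = {phenotype(ind) for ind in population}
    child = make_child(population)
    tries = 0
    while phenotype(child) in existing and tries < max_retries:
        child = make_child(population)
        tries += 1
    worst = min(range(len(population)), key=lambda i: fitness(population[i]))
    if fitness(child) > fitness(population[worst]):
        population[worst] = child
    return population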
Unless stated otherwise, after a change the whole population is re-evaluated before the algorithm is resumed. The genotypes are kept unless the change creates phenotypically identical individuals, in which case duplicates are randomized. As a performance measure, we use the error to the optimum. We use glpk [5] for calculating the surrogate multipliers for the WC and CPLEX for calculating the true optimum for each environment. All results are averages over 50 runs with different random seeds but on the same series of environment changes.
Note that the following analysis assumes that the evaluation is by far the
most time-consuming operation (as is usual for many practical optimization
problems), allowing us to ignore the computational overhead caused by the de-
coders.
[Figure: error (log scale) over evaluations for the WC, permutation, and binary representations; a change is indicated with (x) for the permutation approach and (o) for the WC approach. For further details see also Table 1.]
Several interesting observations can be made. First, as expected, there is a significant increase in error right after a change. Nevertheless, the error after a change is much smaller than the error at the beginning of the run. Compared to the first environment, the average error of the starting solution in environments 2-10 is reduced by approximately 75% for WC, 71% for permutation and 96% for the binary representation with penalty. This means that all representations benefit dramatically from transferring solutions from one environment to the next. WC starts better than the permutation representation, and both indirect
Table 1. Average error of the initial solution, the solution right before a change, and the solution right after a change (± standard error)

                  WC          permutation    binary
initial solution  6307±133    7471±140       15764226±418421
before change     197±10      260±15         834±33
after change      1482±67     2201±94        577226±47984
[Figure: error over evaluations (0 to 50,000) in the dynamic environment for the WC, permutation, and penalty (binary) representations; markers show the error right after a change for WC and permutation]
representations are much better than the binary one. The binary representation with penalty cannot prevent the solutions from becoming infeasible, but recovers quickly. It clearly benefits most from re-using information, and it can improve its performance over several environmental periods (not only from the first to the second environment).
Second, starting from better solutions, the algorithms are able to find better solutions throughout the stationary periods. The benefit seems highest for the binary representation, while the permutation approach can improve performance only a little bit. At the end of the 10th environmental period (evaluation 50,000), the solution quality reached by the indirect representations is close to the error found after 50,000 evaluations in a stationary environment. This means that the algorithms don't get stuck at a previously good but now inferior solution.
Third, as in the stationary case, the WC representation outperforms the permutation representation after a while and performs best overall.
3.3 Restart
Instead of continuing the EA run after a change, one might also re-start the EA with a new, randomized population, which is a common strategy to avoid premature convergence of the population. In this case, improving the solution quality fast would be even more important. As Figure 3 shows, re-initializing the population after a change is worse than simply continuing for all three representations.
Fig. 3. Error over time for different representations in a dynamic environment. Comparison of keeping the population (solid line) or re-start (dashed line) after a change for the (a) WC, (b) permutation, and (c) binary representation.
3.4 Hypermutation
Hypermutation [4] has been suggested as a compromise between complete restart and simply continuing evolution. With hypermutation, the mutation probability is increased for a few iterations after a change to re-introduce diversity. For our experiments, we tried to increase mutation in such a way that it would have similar effects for all representations. For the binary representation, we increased the probability of mutation to p_m = 0.05 per gene. For WC, we increased the standard deviation for the Gaussian mutation to σ = 5. For the permutation representation, we applied insert mutation 5 times to each newly generated individual. In our tests, hypermutation had little effect except if the WC representation is used (not shown). Only for WC, hypermutation helped to speed up fitness convergence significantly in the first period, which indicates that either the mutation step size or the area for initialization was chosen too small in the first environment.
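For illustration, the representation-specific hypermutation settings described above could be wired up as in the following sketch; the length of the hypermutation window after a change is an assumption, as it is not stated here.

def mutation_params(representation, generations_since_change, hyper_window=5):
    """Triggered hypermutation sketch: for a few generations after a detected
    change, use the boosted, representation-specific settings described above;
    otherwise fall back to the normal settings."""
    hyper = generations_since_change < hyper_window
    if representation == "binary":
        return {"p_m": 0.05 if hyper else 0.01}         # per-gene bit-flip probability
    if representation == "wc":
        return {"sigma": 5.0 if hyper else 1.0}          # std. dev. of Gaussian mutation
    if representation == "permutation":
        return {"insert_moves": 5 if hyper else 1}       # insert mutations per child
    raise ValueError(representation)

print(mutation_params("wc", 2), mutation_params("wc", 10))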
Fig. 4. Error over time for different representations in a dynamic environment with a change every 2000 evaluations
Fig. 5. Error over time for different representations in a highly severe dynamic environment
the previous case with lower change frequency. The binary representation keeps improving over all 10 environmental periods.
i.e., the average error over evaluations 5000-20000. This interval has been chosen because it was covered in all experiments, and removes the influence of the initial warm-up phase.
In the stationary case, the WC representation performs best, with permutation a close second (+39%), and the binary representation with more than four times the offline error of the two indirect representations. In a dynamic environment, when the algorithm is restarted after every change, the permutation representation benefits from its fast fitness convergence properties and performs best, while the binary representation improves so slowly and generates so many infeasible solutions that it is basically unusable. If the algorithms are allowed to keep the individuals after a change, they all work much better than with restart. With increasing change frequency or severity, the performance of all approaches suffers somewhat, but the gap between the best-performing WC representation and the direct binary representation increases from 532% in the dynamic baseline scenario to 867% in the high frequency scenario and 3233% in the high severity scenario.
                  WC       permutation   binary
stationary        179.1    248.1         947.8
restart           1581.6   1115.2        625746.0
dynamic base      342.2    470.4         1823.0
high frequency    397.1    561.5         3445.7
high severity     456.4    621.6         14756.9
Acknowledgements
We would like to thank Erdem Salihoglu and Michael Stein for their help re-
garding glpk and CPLEX.
1 Introduction
work on sl-hdfs [6, 7, 8], here we explore the effect of different types of building blocks on the GA by manipulating the way in which intermediate building blocks within the sl-hdfs are constructed. Moreover, whereas in past papers we often examined the effect of the severity of change within a single environment type, here we compare the effect of different environment types while holding the severity of change relatively constant.
We begin by describing the sl-hdfs and three variants. We then describe the experiments with these functions, examine the behavior of the GA, discuss the results and draw some conclusions.
the elementary schemata length is set to ls/10 = 50, where ls is the overall length of the string. This relationship is taken from Holland [4].
specified by the intermediate weight mean and variance. If the neighbor construction routine is specified, then the intermediate schemata are left alone and instead new weights are drawn for the intermediate schemata. Thus, when the neighbor construction routine is used, the only thing that changes during the shakes of the ladder are the weights, and therefore this is sometimes called shaking by weight. When the random construction routine is used, the whole form of the ladder changes, and thus this is sometimes called shaking by form.
w is the fraction of intermediate schemata weights that are changed when the ladder is shaken. This parameter only makes sense with the neighbor intermediate schemata construction method, since in the random method all weights are changed when the ladder is shaken. However, in the first two variants described below (Cliffs and Smooth), the variance of the weights is 0, thus the weights are all the same. In the last variant (Weight), w = 1 and the neighbor construction method is utilized, which results in shaking all the weights every time.
make this variant as close to Holland's as possible, we also restricted the length of the elementary building blocks. This restricted, shaking-by-weight landscape became the Weight variant.
The main differences between these variants as described in the preceding paragraphs are detailed in Table 1. We will describe each of these variants in turn in the following sections.
Fig. 1. Illustration of Shaky Ladder Construction and the Process of Shaking the Ladder for the Cliffs Variant
As to the rest of the parameters, the length of the schemata is not specified. The order of the schemata is set to 8. The length of the string is 500. The number of elementary schemata is 50. The mean of the intermediate schemata weight is 3 with a variance of 0. The construction method for the intermediate schemata is unrestricted and random.
Fig. 2. Illustration of Shaky Ladder Construction and the Process of Shaking the
Ladder for the Smooth Variant
This variant is called the Smooth sl-hdf because, unlike the sl-hdf with Cliffs, there are no sharp edges in the landscape. Instead, elementary schemata are combined to form intermediate schemata, which then are combined to form the next level of intermediate schemata, and so forth.
To fully specify the parameters, the length of the schemata is not specified. The order of the schemata is set to 8. The length of the string is 500. The number of elementary schemata is 50. The mean of the intermediate schemata weight is 3 with a variance of 0. The construction method for intermediate schemata is restricted and random.
Fig. 3. Illustration of Shaky Ladder Construction and the Process of Shaking the
Ladder for the Weight Variant
the distribution (and variance) of the results, Figure 4 illustrates both the fitness of the best individual in the population (Best Performance) and the average fitness of the population (Average Performance) for the three t values, averaged across 30 runs, presented every tenth generation.
[Fig. 4: Best and Average Fitness (averaged over 30 runs) over 1800 generations for t = 1, t = 100, and t = 1801]
These results show that the regularly changing environment is able to outperform the static environment in the long run. Initially, the dynamic environment under-performs the static environment but, before halfway through the run, the dynamic environment has achieved a superior fitness in both the best individuals and the average fitness of the populations. This is because the regularly changing environment prevents the GA from being locked into particular building blocks and forces it to explore a wider range of schemata. In the future, we hope to substantiate this hypothesis. For instance, the particular schemata located in each individual in the population could be examined.
In addition, Figure 4 shows that the Best and Average Performance results for the constantly changing environment surpass the static environment. It also is interesting to note that the gap between the performance of the best and average individuals in the constantly changing environments appears to be larger than it is in the other two environments. This is due to the fact that the constant dynamism of the t = 1 environment means that more individuals receive a 0 score in every generation than in the other two environments, thus driving down the Average Performance of the system relative to the Best Performance.
In a similar way, when the ladder is shaken in the regularly changing environment, the Average Performance of the system falls farther than the Best Performance of the system. This makes sense: when the ladder is shaken, many of the individuals that were being rewarded before lose those rewards and hence their fitness falls greatly; however, it is reasonable to suppose that there are
[Fig. 5: Best and Average Fitness (averaged over 30 runs) over 1800 generations for t = 100 and t = 1801]
some individuals (the new best individuals) immediately after a shake that have
a higher performance than they did before the shake and thus they mitigate the
fall of Best Performance.
In Figure 5, the results have been pared down to show the Best and Average Performance in the changing and static regimes². Again, it is clear that a GA in a regularly changing environment is able to outperform a GA in a static environment in the long run. In this variant, the changing environment outperforms the static environment earlier on, but the GA does not perform as well as in the Cliffs environment. This is because the landscape changes in this variant are not rough enough to prevent premature convergence [8].
In addition, as in the Cliffs variant, the Average Performance of the system falls farther than the Best Performance during a shake. This effect is due to the fact that individuals that were not the best in the past become the best after a shake, mitigating the decrease even though the population on average loses more fitness. Moreover, it is interesting to note that the performance hits at these shakes do not appear to be as dramatic as they are in the Cliffs variant, for either the Average or Best Performance. This is one piece of evidence that shakes are not as violent in the Smooth variant as they are in the Cliffs variant.
In Figure 6, the results again have been pared down to show the Best and Average Performance of the GA in the changing and static regimes, and as
² t = 1 is not presented since its behavior is qualitatively similar to t = 100.
[Fig. 6: Best and Average Fitness (averaged over 30 runs) over 1800 generations for t = 100 and t = 1801]
References
1. Branke, J.: Evolutionary Optimization in Dynamic Environments. Kluwer Academic Publishers (2001)
2. Stanhope, S.A., Daida, J.M.: Optimal mutation and crossover rates for a genetic algorithm operating in a dynamic environment. In: Evolutionary Programming VII. Number 1447 in LNCS, Springer (1998) 693-702
3. Rand, W., Riolo, R.: Shaky ladders, hyperplane-defined functions and genetic algorithms: Systematic controlled observation in dynamic environments. In Rothlauf, F., et al., eds.: Applications of Evolutionary Computing, EvoWorkshops: EvoBIO, EvoCOMNET, EvoHot, EvoIASP, EvoMUSART, and EvoSTOC. Volume 3449 of Lecture Notes in Computer Science, Springer (2005)
4. Holland, J.H.: Building blocks, cohort genetic algorithms, and hyperplane-defined functions. Evolutionary Computation 8 (2000) 373-391
5. Whitley, D., Rana, S.B., Dzubera, J., Mathias, K.E.: Evaluating evolutionary algorithms. Artificial Intelligence 85 (1996) 245-276
6. Rand, W., Riolo, R.: The problem with a self-adaptive mutation rate in some environments: A case study using the shaky ladder hyperplane-defined functions. In Beyer, H.G., et al., eds.: Proceedings of the Genetic and Evolutionary Computation Conference, GECCO-2005, New York, ACM Press (2005)
7. Rand, W., Riolo, R.: Measurements for understanding the behavior of the genetic algorithm in dynamic environments: A case study using the shaky ladder hyperplane-defined functions. In Beyer, H.G., et al., eds.: Proceedings of the Genetic and Evolutionary Computation Conference, GECCO-2005, New York, ACM Press (2005)
8. Rand, W.: Controlled Observations of the Genetic Algorithm in a Changing Environment: Case Studies Using the Shaky Ladder Hyperplane-Defined Functions. PhD thesis, University of Michigan (2005)
Associative Memory Scheme for Genetic
Algorithms in Dynamic Environments
Shengxiang Yang
1 Introduction
Genetic algorithms (GAs) have been applied to solve many optimization problems with promising results. Traditionally, the research and application of GAs have been focused on stationary problems. However, many real-world optimization problems are actually dynamic optimization problems (DOPs) [4]. For DOPs, the fitness function, design variables, and/or environmental conditions may change over time due to many reasons. Hence, the aim of an optimization algorithm is no longer to locate a stationary optimal solution but to track the moving optima over time. This challenges traditional GAs seriously since they cannot adapt well to the changing environment once converged.
In recent years, there has been a growing interest in investigating GAs for DOPs. Several approaches have been developed for GAs to address DOPs, such as diversity maintaining and increasing schemes [5, 7, 11], memory schemes [2, 14, 17], and multi-population approaches [3]. Among the approaches developed for GAs in dynamic environments, memory schemes have proved to be beneficial for many DOPs. Memory schemes work by storing useful information, either implicitly [6, 9, 12] or explicitly, from the current environment and reusing it later in new environments. In [17, 19], a memory scheme was introduced into population-based incremental learning (PBIL) [1] algorithms for DOPs, where
the working probability vector is also stored and associated with the best sample it creates in the memory. When the environment changes, the stored probability vector can be reused in the new environment.
In this paper, the idea in [17] is extended and an associative memory scheme is proposed for GAs in dynamic environments. In this associative memory scheme, when the best solution of the population is stored in the memory, the current environmental information, the allele distribution vector, is also stored in the memory and associated with the best solution. When the environment changes, the stored environmental information associated with the best re-evaluated memory solution is used to create new individuals for the population. Based on the dynamic problem generator proposed in [16, 18], a series of DOPs with different environmental dynamics are constructed as the test bed, and experiments are carried out to compare the performance of the proposed associative memory scheme with the traditional direct memory scheme for GAs in dynamic environments. Based on the experimental results, we analyze the strengths and weaknesses of the associative memory over the direct memory for GAs in dynamic environments.
The standard GA, denoted SGA in this paper, maintains and evolves a population of candidate solutions through selection and variation. New populations are generated by first probabilistically selecting relatively fitter individuals from the current population and then performing crossover and mutation on them to create new offspring. This process continues until some stop condition becomes true, e.g., the maximum allowable number of generations t_max is reached.
Usually, with the iteration of SGA, individuals in the population will eventually converge to the optimal solution(s) in stationary environments due to the pressure of selection. Convergence at a proper pace, instead of premature convergence, may be beneficial and is expected for GAs to locate expected solutions for stationary optimization problems. However, convergence becomes a big problem for GAs in dynamic environments because it deprives the population of genetic diversity. Consequently, when a change occurs, it is hard for GAs to escape from the optimal solution of the old environment. Hence, additional approaches, e.g., memory schemes, are required to adapt GAs to the new environment.
The basic principle of memory schemes is to, implicitly or explicitly, store useful information from the current environment and reuse it later in new environments. Implicit memory schemes for GAs in dynamic environments depend on redundant representations to store useful information for GAs to exploit during the run [6, 9, 12]. In contrast, explicit memory schemes make use of precise representations but set aside an extra storage space where useful information from the current generation can be explicitly stored and reused later [2, 10, 15].
For explicit memory there are three technical considerations: what to store in the memory, how to update the memory, and how to retrieve the memory. For the first aspect, usually good solutions are stored in the memory and reused directly when a change occurs. This is called the direct memory scheme. It is also
the memory. When an environmental change is detected, the stored probability vector associated with the best re-evaluated memory sample is extracted to compete with the current working probability vector to become the future working probability vector for creating new samples.
The idea in [17, 19] can be extended to GAs for DOPs. That is, we can store environmental information together with good solutions in the memory for later reuse. Here, the key question is how to represent the current environment. As mentioned before, given a problem in a certain environment, the individuals in the population of a GA will eventually converge toward the optimum of the environment as the GA progresses its search. The convergence information, i.e., the allele distribution in the population, can be taken as a natural representation of the current environment. Each time the best individual of the population is stored in the memory, the statistical information on the allele distribution for each locus, the allele distribution vector, can also be stored in the memory and associated with the best individual.
The pseudo-code for the GA with the associative memory, called the associative memory GA (AMGA), is shown in Fig. 1. Within AMGA, a memory of size m = 0.1 · n is used to store solutions and environmental information. Now each memory point consists of a pair <S, D>, where S is the stored solution and D is the associated allele distribution vector. For binary encoding (as in this paper), the frequency of ones over the population in a gene locus can be taken as the allele distribution for that locus.
As in DMGA, the memory in AMGA is re-evaluated every generation. If an environmental change is detected, the allele distribution vector of the best memory point <S_M(t), D_M(t)>, i.e., the memory point with its solution S_M(t) having the highest re-evaluated fitness, is extracted. A set of α · (n − m) new individuals are created from this allele distribution vector D_M(t) and randomly swapped into the population. Here, the parameter α ∈ [0.0, 1.0], called the associative factor, determines the number of new individuals and hence the impact of the associative memory on the current population. Just as when sampling a probability vector in PBIL algorithms [1], a new individual S = {s_1, ..., s_l} is created from D_M(t) = {d_1^M, ..., d_l^M} (l is the encoding length) as follows:

s_i = 1, if rand(0.0, 1.0) < d_i^M; 0, otherwise.    (1)
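A minimal sketch of this sampling step, together with the swap of the α·(n − m) newly created individuals into the population, is given below; the helper names and toy values are illustrative only.

import numpy as np

def sample_from_allele_distribution(d, rng=None):
    """Create one binary individual from the stored allele distribution vector
    D_M(t) as in Eq. (1): locus i becomes 1 with probability d_i."""
    rng = np.random.default_rng(rng)
    d = np.asarray(d)
    return (rng.random(d.size) < d).astype(int)

def retrieve_memory(population, d_best, alpha, rng=None):
    """Retrieval step (sketch): create alpha * (n - m) new individuals from the
    best memory point's allele distribution vector and swap them into the
    population at random positions."""
    rng = np.random.default_rng(rng)
    n = len(population)                # here the population already excludes the m memory slots
    k = int(round(alpha * n))
    idx = rng.choice(n, size=k, replace=False)
    for i in idx:
        population[i] = sample_from_allele_distribution(d_best, rng)
    return population

pop = [np.zeros(10, dtype=int) for _ in range(9)]      # n - m = 9 toy individuals
d = np.full(10, 0.8)                                   # allele distribution of best memory point
print(retrieve_memory(pop, d, alpha=0.6, rng=0)[:3])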
is also shown in Fig. 1. DAMGA differs from AMGA only as follows. After the new individuals have been created and swapped into the population, the original memory solutions M(t) are merged with the population to select the n − m best ones as the interim population to go through the standard genetic operations.
a parameter) for environmental period k. For the first period k = 1, M(1) is set to a zero vector. Then, the population at generation t is evaluated as below:
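The evaluation formula itself is elided in this excerpt; in the XOR-based generator of [16, 18], an individual x is evaluated in period k as f(x, t) = f(x ⊕ M(k)), where ⊕ denotes bitwise XOR and M(k) is obtained from M(k − 1) by flipping a fraction ρ of its bits. The sketch below follows that reading; the helper names and the way the flipped bits are chosen are assumptions.

import numpy as np

def xor_dop_evaluate(base_fitness, x, mask):
    """Evaluate individual x in environmental period k by XORing it with the
    binary mask M(k) and applying the stationary base function."""
    return base_fitness(np.bitwise_xor(x, mask))

def next_mask(mask, rho, seed=None):
    """Move to the next environment: flip rho * l randomly chosen bits of M(k)."""
    rng = np.random.default_rng(seed)
    flip = rng.choice(mask.size, size=int(rho * mask.size), replace=False)
    new = mask.copy()
    new[flip] ^= 1
    return new

onemax = lambda x: int(x.sum())                 # stationary base function
l = 20
mask = np.zeros(l, dtype=int)                   # M(1) = 0: the first period is the base problem
x = np.ones(l, dtype=int)
print(xor_dop_evaluate(onemax, x, mask))        # 20
mask = next_mask(mask, rho=0.5, seed=0)
print(xor_dop_evaluate(onemax, x, mask))        # 10 after a rho = 0.5 change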
5 Experimental Study
5.1 Experimental Design
Experiments were carried out to compare the performance of the GAs on the dynamic test environments. All GAs have the following genetic operator and parameter settings: tournament selection with tournament size 2, uniform crossover with p_c = 0.6, bit-flip mutation with p_m = 0.01, elitism of size 1, and population size n = 100 (including a memory of size m = 10 if used). In order to test the effect of the associative factor on the performance of AMGA and DAMGA, α is set to 0.2, 0.6, and 1.0 respectively, and the corresponding GAs are reported as α-AMGA and α-DAMGA in the experimental results.
For each experiment of a GA on a DOP, 50 independent runs were executed with the same set of random seeds. For each run, 5000 generations were allowed, which are equivalent to 500, 200 and 100 environmental changes for τ = 10, 25 and 50 respectively. For each run, the best-of-generation fitness was recorded every generation. The overall performance of a GA on a problem is defined as:

F̄_BOG = (1/G) Σ_{i=1}^{G} ( (1/N) Σ_{j=1}^{N} F_BOG_{ij} ),    (6)

where G = 5000 is the total number of generations for a run, N = 50 is the total number of runs, and F_BOG_{ij} is the best-of-generation fitness of generation i of run j. The off-line performance F̄_BOG is the best-of-generation fitness averaged over 50 runs and then averaged over the data gathering period.
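Eq. (6) is a double average and can be computed directly from the recorded best-of-generation values, as in the small sketch below; the array layout is an assumption.

import numpy as np

def overall_performance(fbog):
    """Compute the overall (off-line) performance of Eq. (6): fbog is a
    G x N array of best-of-generation fitness values (G generations, N runs);
    average over runs first, then over generations."""
    fbog = np.asarray(fbog, dtype=float)
    return fbog.mean(axis=1).mean()

# toy check with G = 3 generations and N = 2 runs
print(overall_performance([[80, 82], [85, 83], [90, 88]]))   # 84.666...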
[Fig. 2: overall performance (fitness) of SGA, DMGA, and 0.2/0.6/1.0-AMGA over ρ = 0.1, 0.2, 0.5, 1.0 under cyclic, cyclic-with-noise, and random environments for different values of τ]
From Fig. 2, it can be seen that both DMGA and the AMGAs achieve the largest performance improvement over SGA in cyclic environments. For example, when τ = 10 and ρ = 0.5, the performance difference of DMGA over SGA, F̄_BOG(DMGA) − F̄_BOG(SGA), is 87.6 − 58.9 = 28.7, 66.5 − 59.8 = 6.7, and 67.0 − 65.5 = 1.5 under cyclic, cyclic with noise, and random environments respectively. This result indicates that the effect of memory schemes depends on the cyclicity of the dynamic environments. When the environment changes randomly and slightly (i.e., ρ is small), both DMGA and the AMGAs are beaten by SGA. This is because under these conditions the environment is unlikely to return to a previous state that is memorized by the memory scheme, and hence inserting stored solutions or creating new ones according to the stored allele distribution vector may mislead or slow down the progress of the GAs.
Second, comparing the AMGAs with DMGA, it can be seen that the AMGAs outperform DMGA on many DOPs, especially under cyclic environments. This happens because the extracted memory allele distribution vector is much stronger than the stored memory solutions in adapting the GA to the new environment. However, when ρ is small and the environment changes randomly, the AMGAs are beaten by DMGA in most cases, see the t-test results regarding α-AMGA − DMGA. This is because under these environments the negative effect of the associative memory in the AMGAs may weigh more heavily than that of the direct memory in DMGA.
In order to better understand the performance of the GAs, the dynamic behaviour of the GAs regarding best-of-generation fitness against generations on dynamic OneMax functions with τ = 10 and ρ = 0.5 under different cyclicity of dynamic environments is plotted in Fig. 3. In Fig. 3, the first and last 10 environmental changes (i.e., 100 generations) are shown and the data were averaged over 50 runs. From Fig. 3, it can be seen that, under cyclic and cyclic with noise environments, after several early-stage environmental changes, the memory schemes start to take effect and maintain the performance of DMGA and the AMGAs at a much higher fitness level than SGA. And the associative memory in the AMGAs works better than the direct memory in DMGA, which can be seen in the late-stage behaviour of the GAs. Under random environments the effect of the memory schemes is greatly reduced: all GAs behave almost the same and there is no clear advantage of the memory schemes in DMGA and the AMGAs.
Third, when examining the effect of α on the AMGAs' performance, it can be seen that 0.6-AMGA outperforms 0.2-AMGA on most dynamic problems, see the t-test results regarding 0.6-AMGA − 0.2-AMGA. This is because increasing the value of α enhances the effect of the associative memory for AMGA. However, 1.0-AMGA is beaten by 0.6-AMGA in many cases, especially when ρ is small, see the t-test results regarding 1.0-AMGA − 0.6-AMGA. When α = 1.0, all the
Fig. 3. Dynamic behaviour of GAs during the (Left Column) early and (Right Column) late stages on dynamic OneMax functions with τ = 10 and ρ = 0.5
individuals in the population are replaced by the new individuals created from the re-activated memory allele distribution vector when a change occurs. This may be disadvantageous. Especially when ρ is small, the environment changes only slightly and good solutions of the previous environment are likely also good for the new one. It is better to keep some of them instead of discarding them all.
In order to test the effect of combining direct memory with associative memory in GAs for DOPs, experiments were further carried out to compare the performance of the DAMGAs with the AMGAs. The relevant t-test results are presented
in Table 2, from which it can be seen that the DAMGAs outperform the AMGAs under most dynamic environments. However, the experiments (not shown here) indicate that the performance improvement of α-DAMGA over α-AMGA is relatively small in comparison with the performance improvement of α-AMGA over SGA.
References
1. S. Baluja. Population-based incremental learning: A method for integrating genetic
search based function optimization and competitive learning. Tech. Report CMU-
CS-94-163, Carnegie Mellon University, 1994.
2. J. Branke. Memory enhanced evolutionary algorithms for changing optimization
problems. Proc. of the 1999 Congr. on Evol. Comput., vol. 3, pp. 1875-1882, 1999.
3. J. Branke, T. Kaußler, C. Schmidt, and H. Schmeck. A multi-population approach
to dynamic optimization problems. Proc. of the Adaptive Computing in Design and
Manufacturing, pp. 299-308, 2000.
4. J. Branke. Evolutionary Optimization in Dynamic Environments. Kluwer Aca-
demic Publishers, 2002.
5. H. G. Cobb and J. J. Grefenstette. Genetic algorithms for tracking changing
environments. Proc. of the 5th Int. Conf. on Genetic Algorithms, pp. 523-530,
1993.
6. D. E. Goldberg and R. E. Smith. Nonstationary function optimization using genetic
algorithms with dominance and diploidy. Proc. of the 2nd Int. Conf. on Genetic
Algorithms, pp. 59-68, 1987.
7. J. J. Grefenstette. Genetic algorithms for changing environments. Proc. of the 2nd
Int. Conf. on Parallel Problem Solving from Nature, pp. 137-144, 1992.
8. A. Karaman, S. Uyar, and G. Eryigit. The memory indexing evolutionary algorithm
for dynamic environments. EvoWorkshops 2005, LNCS 3449, pp. 563-573, 2005.
9. E. H. J. Lewis and G. Ritchie. A comparison of dominance mechanisms and simple
mutation on non-stationary problems. Proc. of the 5th Int. Conf. on Parallel
Problem Solving from Nature, pp. 139-148, 1998.
10. N. Mori, H. Kita and Y. Nishikawa. Adaptation to changing environments by
means of the memory based thermodynamical genetic algorithm. Proc. of the 7th
Int. Conf. on Genetic Algorithms, pp. 299-306, 1997.
11. R. W. Morrison and K. A. De Jong. Triggered hypermutation revisited. Proc. of
the 2000 Congress on Evol. Comput., pp. 1025-1032, 2000.
12. K. P. Ng and K. C. Wong. A new diploid scheme and dominance change mechanism
for non-stationary function optimisation. Proc. of the 6th Int. Conf. on Genetic
Algorithms, 1997.
13. C. L. Ramsey and J. J. Grefenstette. Case-based initialization of genetic algorithms.
Proc. of the 5th Int. Conf. on Genetic Algorithms, 1993.
14. A. Simoes and E. Costa. An immune system-based genetic algorithm to deal with
dynamic environments: diversity and memory. Proc. of the 6th Int. Conf. on Neural
Networks and Genetic Algorithms, pp. 168-174, 2003.
15. K. Trojanowski and Z. Michalewicz. Searching for optima in non-stationary envi-
ronments. Proc. of the 1999 Congress on Evol. Comput., pp. 1843-1850, 1999.
16. S. Yang. Non-stationary problem optimization using the primal-dual genetic algo-
rithm. Proc. of the 2003 IEEE Congress on Evol. Comput., vol. 3, pp. 2246-2253,
2003.
17. S. Yang. Population-based incremental learning with memory scheme for changing
environments. Proc. of the 2005 Genetic and Evol. Comput. Conference, vol. 1,
pp. 711-718, 2005.
18. S. Yang and X. Yao. Experimental study on population-based incremental learning
algorithms for dynamic optimization problems. Soft Computing, vol. 9, no. 11,
pp. 815-834, 2005.
19. S. Yang and X. Yao. Population-based incremental learning with associative mem-
ory for dynamic environments. Submitted to IEEE Trans. on Evol. Comput., 2005.
Bayesian Optimization Algorithms
for Dynamic Problems
1 Introduction
Evolutionary Algorithms (EAs) have been widely utilized to solve stationary optimization problems. But many real optimization problems are actually dynamic. In the case of dynamic problems, the fitness function, parameters and environmental conditions may change over time. Most methods for handling dynamic problems encourage a higher diversity of the population than conventional EAs do. A survey of the main techniques in the field is given in [1, 2]. An efficient approach is based on Triggered Hypermutation, which uses two different scales of mutation probability: one for the stationary state (0.001) and another for the nonstationary state (0.3). For achieving sufficient population diversity, Morrison [2] proposed the concept of Sentinels, a subpopulation of individuals in the population not modified by the reproduction process. These non-moving individuals are also able to provide detection of environmental changes. A very competent overview of the current methods for dynamic problem optimization is published in [3].
Our goal is to test the capability of the new versions of BOA algorithms including the phenomena of Hypervariance and Sentinels. The rest of the paper is organized as follows. Section 2 explains the MBOA and AMBOA algorithms and the newly proposed modifications in more detail, Section 3 presents experimental results; conclusions and future work are presented in Section 4.
where {x_i}_j denotes the set of realizations of variable X_i among the individuals from the parent population that traverse to the j-th partition, and |{x_i}_j| denotes the number of such individuals. The scaling factor is equal to 1 in MBOA. All kernels in the same leaf have the same height 1/|{x_i}_j| and the same width σ_ij:

σ_ij = (max{x_i}_j − min{x_i}_j) / (|{x_i}_j| − 1).    (2)
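The following sketch shows one way such a leaf could be sampled: one Gaussian kernel per stored realization, all with equal weight and the width σ_ij of Eq. (2), multiplied by a scaling factor (equal to 1 in MBOA, adapted in AMBOA, temporarily enlarged after a change in AHMBOA/HMBOA). This is an illustrative reading, not the authors' implementation.

import numpy as np

def sample_leaf(realizations, scale=1.0, size=1, seed=None):
    """Sample from the kernel mixture of one decision-tree leaf (sketch):
    one Gaussian kernel per realization in {x_i}_j, each with equal weight
    1/|{x_i}_j| and width sigma_ij from Eq. (2), scaled by 'scale'."""
    x = np.asarray(realizations, dtype=float)
    rng = np.random.default_rng(seed)
    sigma = (x.max() - x.min()) / (len(x) - 1)           # Eq. (2)
    centers = rng.choice(x, size=size)                   # equal kernel heights
    return centers + rng.normal(0.0, scale * sigma, size=size)

leaf_values = [0.2, 0.5, 0.9, 1.1]
print(sample_leaf(leaf_values, scale=1.0, size=3, seed=0))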
where p denotes the desired success rate (for N_succ/(N_succ + N_fail) = p the scaling factor remains unchanged from generation g to generation g + 1). The choice of the adaptation rate α determines how fast the desired success rate is achieved. For increasing α the adaptation is faster, but also more sensitive to oscillations. In our experiments, we choose α = e^{4/N} and p = 0.05 + 0.3 · n^{−1/2}, in accordance with [5].
The value of the scaling factor is set to a suitably large value in the generation immediately after the change of the fitness function. In the next generations the value is set back to the value before the change. Secondly, we designed the Hypervariance Mixed Bayesian Optimization Algorithm (HMBOA), which is derived from AHMBOA by skipping the adaptation of the scaling factor during the optimization process.
Sentinels are permanent members of the population, uniformly distributed through the search space. They participate in the generation of offspring, but they are not replaced by new individuals during the tournament replacement process. In order to distribute the Sentinels uniformly in the search space, we used the well-known formula published in [2].
3 Experimental Results
Our goal is to test and compare the ability of the proposed BOA versions to track the optimum in dynamic environments. We used a simple dynamic test problem which can be interpreted as a problem with a changing fitness peak location. The performance is measured by the Collective Mean Fitness published in [2]:

E_C = ( Σ_{g=1}^{G} Σ_{m=1}^{M} F_BG ) / (M · G),    (4)

where E_C is the Collective Mean Fitness, F_BG is the fitness value of the best solution in the current generation, M is the number of runs, and G is the number of generations.
We used four levels of Sentinels in the population (0, 30, 50, and 70 percent). The test function F(X, g) is defined as a normalized sum of moving peaks:

F(X, g) = F(x_1, ..., x_8, g) = (1/8) Σ_{i=1}^{8} f(x_i, g),    (5)
Fig. 1. The effect of movement period T for S=0%, the step-scale k=0.5 (left), and k=4.0 (right)
Fig. 2. The effect of Sentinels for T=200, the step-scale k=0.5 (left), and k=4.0 (right)
In Fig. 1 the dependency of E_C on the movement period T for two values of the step-scale k and a zero percentage of Sentinels is presented. It is clear that E_C increases for longer movement periods. AMBOA and AHMBOA outperform HMBOA. The effect of variance adaptation is significantly better than the effect of Hypervariance applied only during one critical generation after the change of the fitness function. Let us note that MBOA was not able to track the optimum in any experiment.
In Fig. 2 the positive influence of embedded Sentinels is demonstrated for all tested algorithms in the case of a movement period T equal to 200 generations. It is evident that the HMBOA algorithm outperformed AHMBOA and AMBOA for Sentinel percentage values greater than 30. This can be explained by the influence of the Sentinels, which partly results in limited exploitation of the search, being balanced by the presence of Hypervariance.
4 Conclusions
In this paper the performance of two variants of Bayesian optimization algorithms was tested: the MBOA algorithm, developed for the optimization of mixed continuous-discrete problems, and the AMBOA algorithm, which extends MBOA with variance adaptation. Both algorithms confirmed their applicability to dynamic environments, but only down to a limited minimal period of change of the fitness function.
That is why we proposed a new extension of the MBOA and AMBOA algorithms with an embedded Sentinel concept. This technique contributes to the implicit adaptation of the probabilistic model after a change of the fitness function. The Sentinel concept together with the Hypervariance included in the HMBOA algorithm was the best choice, resulting in significantly improved performance. Future work will focus on testing the performance on more complex benchmarks. We will consider a more advanced adaptation scheme for the probabilistic model building using additional information derived from the embedded Sentinels.
Acknowledgements. This research has been carried out under the financial support of the research project FR0136/2006/G1 "Evolutionary algorithms for dynamic problems" (Ministry of Education), and the research grant GA102/06/0599 "Methods of polymorphic digital circuit design" (Grant Agency of the Czech Republic).
References
1. Branke, J.: Evolutionary Optimization in Dynamic Environments. University of Karlsruhe, Germany, Kluwer Academic Publishers, 2002.
2. Morrison, R.W.: Designing Evolutionary Algorithms for Dynamic Environments. Springer-Verlag, Germany, 2004, 147 pp.
3. Jin, Y., Branke, J.: Evolutionary optimization in uncertain environments - a survey. IEEE Transactions on Evolutionary Computation, Volume 9, Issue 3, June 2005, pp. 303-317.
4. Ocenasek, J., Schwarz, J.: Estimation of Distribution Algorithm for mixed continuous-discrete optimization problems. In: 2nd Euro-International Symposium on Computational Intelligence, IOS Press, Kosice, Slovakia, 2002, pp. 227-232.
5. Ocenasek, J., Kern, S., Hansen, N., Müller, S., Koumoutsakos, P.: A Mixed Bayesian Optimization Algorithm with Variance Adaptation. Parallel Problem Solving From Nature - PPSN VIII, Springer Verlag, Berlin, 2004, pp. 352-361.
Prudent-Daring vs Tolerant Survivor Selection
Schemes in Control Design of Electric Drives
[Figure: block diagram of the field-oriented control scheme of the drive, with speed controller, isq and isd current controllers, smoothing filter, compensator, and dq/abc transformations; the chromosome x collects the controller parameters (gains and time constants), each parameter being one gene x(i)]
speed rise-time, and undesired d-axis-current oscillations for each speed step j of a training test (see [5] for details). Since each single objective function requires a set of measurements, the measurement errors affect the objective function f, which is therefore noisy.
updated and resorted according to f; f) The best performing S_pop individuals are saved for the subsequent generation.
Finally, the value of the adaptation parameter is updated according to the formula min{1, (f_best − f_avg)/f_best}. The main idea behind the ATEA is that if it is possible to prove that an individual, even if underestimated, is in the top part of the list, it should not be reevaluated; analogously, an individual should not be reevaluated if it is so bad that, even if overestimated, it is in the bottom part of the list. In other words, when S_pop is calculated, the algorithm implicitly divides the population into three categories: individuals surely good, individuals surely bad, and individuals which require a more attentive analysis. The individuals surely good or bad do not require additional evaluations; the others require a reevaluation cycle.
Following the procedure described in [9] for a Gaussian distribution, 90% of the fitness samples fall in a tolerance interval of width w_TI = 0.1702 with a probability γ = 0.99. Both the APDEA and the ATEA have been executed in order to minimize the fitness f with n_s^max = 10. Concerning the APDEA, S_pru ∈ [40, 160], S_dar ∈ [0, 20] and b = 0.2; concerning the ATEA, S_pop ∈ [40, 160]. Also a standard Reevaluating Genetic Algorithm (RGA) employing the same crossover and mutation techniques used for the APDEA and the ATEA has been implemented. This RGA executes the averaging over time [2] with a sample size n_s^max = 10 for every evaluation and makes use of a standard survivor selection which saves at each generation the prefixed S_pop = 100 best performing individuals. Each of the three algorithms has been executed 65 times. Each execution has been stopped after 20000 fitness evaluations. Table 1 compares the best performing solutions found by the three methods and shows the best fitness values, the average best fitness (over the 65 experiments) and the corresponding standard deviation σ. The algorithmic performances and the behavior of the adaptation parameter for the APDEA and the ATEA are shown in Fig. 3 and Fig. 4 respectively. The numerical results show that both the APDEA and the ATEA converge faster than the RGA to solutions that are very similar to each other. Concerning the convergence velocity, the APDEA proved to perform better than the ATEA. Moreover, the APDEA is more general than the ATEA, since the latter makes use of the assumption that the noise is Gaussian with a constant σ over the whole domain. On the other hand, the APDEA, unlike the ATEA, requires the setting of b and S_dar^max. A wrong choice of b would lead to a too strong or too weak penalization in the fitness function. Analogously, S_dar^max determines the audacity of the algorithm and a wrong setting could lead to either a wrong
Fig. 3. Performances of the three algorithms
Fig. 4. Behavior of the adaptation parameter for the APDEA and the ATEA
References
1. Branke, J.: Evolutionary Optimization in Dynamic Environments. Kluwer (2001)
2. Jin, Y., Branke, J.: Evolutionary optimization in uncertain environments - a survey. IEEE Trans. on Evolutionary Computation 9 (2005) 303-317
3. Krishnan, R.: Electronic Motor Drives: Modeling, Analysis and Control. Prentice Hall, Upper Saddle River, New Jersey, USA (2001)
4. Leonhard, W.: Control of Electrical Drives. 2nd edn. Springer-Verlag (1996)
5. Caponio, A., Cascella, G.L., Neri, F., Salvatore, N., Sumner, M.: A fast adaptive memetic algorithm for on-line and off-line control design of PMSM drives. To appear in IEEE Trans. on Systems, Man and Cybernetics - Part B, special issue on Memetic Algorithms (2006)
6. Whitley, D.: The genitor algorithm and selection pressure: Why rank-based allocation of reproductive trials is best. In: Proc. 3rd Int. Conf. on GAs (1989) 116-121
7. Eshelman, L.J., Schaffer, J.D.: Real-coded genetic algorithms and interval-schemata. In: Foundations of Genetic Algorithms 2, Morgan Kaufmann (1993) 187-202
8. Ong, Y.S., Keane, A.J.: Meta-Lamarckian learning in memetic algorithms. IEEE Trans. on Evolutionary Computation 8 (2004) 99-110
9. NIST: e-Handbook of Statistical Methods. (www.itl.nist.gov/div898/handbook/)
Author Index