
Cognitive Science and Technology

James K. Peterson

BioInformation Processing
A Primer on Computational Cognitive Science
Cognitive Science and Technology

Series editor
David M.W. Powers, Adelaide, Australia

More information about this series at http://www.springer.com/series/11554


Cowperthraite knew she had to prepare more for her class, so she settled down to read at her
favorite spot on the beach
James K. Peterson

BioInformation Processing
A Primer on Computational Cognitive Science

James K. Peterson
Department of Mathematical Sciences
Clemson University
Clemson, SC
USA

ISSN 2195-3988 ISSN 2195-3996 (electronic)


Cognitive Science and Technology
ISBN 978-981-287-869-4 ISBN 978-981-287-871-7 (eBook)
DOI 10.1007/978-981-287-871-7

Library of Congress Control Number: 2015958316

© Springer Science+Business Media Singapore 2016


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by SpringerNature


The registered company is Springer Science+Business Media Singapore Pte Ltd.
I dedicate this work to the many people,
practicing scientists, mathematicians,
software developers and computer scientists,
who have helped me think more carefully
about the whole picture. The larger problems
are what interest me, as my family can attest,
having listened to my ideas in the living room
and over dinner for many years. I hope these
notes help inspire my readers to consider the
intersection of biology, mathematics, and
computer science as a fertile research area.
Contents

Part I Introductory Matter


1 BioInformation Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1 The Proper Level of Abstraction. . . . . . . . . . . . . . . . . . . . . . 4
1.2 The Threads of Our Tapestry . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Chapter Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4 Theoretical Modeling Issues. . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5 Code. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

Part II Diffusion Models


2 The Diffusion Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.1 The Microscopic Space–Time Evolution of a Particle . . . . . . . 19
2.2 The Random Walk and the Binomial Distribution. . . . . . . . . . 22
2.3 Rightward Movement Has Probability 0.5 or Less . . . . . . . . . 24
2.3.1 Finding the Average of the Particles Distribution
in Space and Time . . . . . . . . . . . . . . . . . . . . . .... 25
2.3.2 Finding the Standard Deviation of the Particles
Distribution in Space and Time . . . . . . . . . . . . .... 26
2.3.3 Specializing to an Equal Probability Left
and Right Random Walk . . . . . . . . . . . . . . . . . . . . . 28
2.4 Macroscopic Scale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.5 Obtaining the Probability Density Function . . . . . . . . . . . . . . 30
2.5.1 p Less Than 0.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.5.2 p and q Are Equal . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.6 Understanding the Probability Distribution of the Particle . . . . 33
2.7 The General Diffusion Equation . . . . . . . . . . . . . . . . . . . . . . 34
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37


3 Integral Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.1 The Laplace Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.1.1 Homework. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.2 The Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2.1 Homework. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4 The Time Dependent Cable Solution . . . . . . . . . . . . . . . . . . . . . . . 45
4.1 The Solution for a Current Impulse. . . . . . . . . . . . . . . . . . . . 46
4.1.1 Modeling the Current Pulses . . . . . . . . . . . . . . . . . . 46
4.1.2 Scaling the Cable Equation . . . . . . . . . . . . . . . . . . . 47
4.1.3 Applying the Laplace Transform in Time . . . . . . . . . 49
4.1.4 Applying the Fourier Transform in Space . . . . . . . . . 50
4.1.5 The T Transform of the Pulse . . . . . . . . . . . . . . . . . 50
4.1.6 The Idealized Impulse T Transform Solution . . . . . . 51
4.1.7 Inverting the T Transform Solution . . . . . . . . . . . . . 51
4.1.8 A Few Computed Results . . . . . . . . . . . . . . . . . . . . 53
4.1.9 Reinterpretation in Terms of Charge . . . . . . . . . . . . . 54
4.2 The Solution to a Constant Current. . . . . . . . . . . . . . . . . . . . 55
4.3 Time Dependent Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

Part III Neural Systems


5 Mammalian Neural Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.1 The Basic Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.2 Brain Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.3 The Brain Stem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.4 Cortical Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.4.1 Cortical Processing . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.4.2 Isocortex Modeling . . . . . . . . . . . . . . . . . . . . . . . . . 77
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6 Abstracting Principles of Computation . . . . . . . . . . . . . . . . . . . . . 83
6.1 Cellular Triggers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.2 Dynamical Loop Details . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.3 An Implication for Biological Computation . . . . . . . . . . . . . . 90
6.4 Transport Mechanisms and Switches . . . . . . . . . . . . . . . . . . . 91
6.5 Control of a Substance via Creation/Destruction Patterns . . . . . 94
6.6 Calcium Ion Signaling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
6.7 Modulation Pathways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.7.1 Ligand—Receptor Response Strategies . . . . . . . . . . . 102
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

7 Second Messenger Diffusion Pathways. . . . . . . . . . . . . . . . . . . . . . 107


7.1 Calcium Diffusion in the Cytosol . . . . . . . . . . . . . . . . . . . . . 107
7.1.1 Assumption One: Calcium Binding Is Fast. . . . . . . . . 110
7.1.2 Assumption Two: Binding Rate Is Much Less
Than Disassociation Rate. . . . . . . . . . . . . . . . . . . . . 112
7.2 Transcriptional Control of Free Calcium . . . . . . . . . . . . . . . . 114
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
8 Second Messenger Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
8.1 Generic Second Messenger Triggers . . . . . . . . . . . . . . . . . . . 117
8.1.1 Concatenated Sigmoid Transitions . . . . . . . . . . . . . . 121
8.2 A Graphic Model Computation Model . . . . . . . . . . . . . . . . . 122
8.3 Ca++ Triggers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
8.4 Spatially Dependent Calcium Triggers. . . . . . . . . . . . . . . . . . 126
8.5 Calcium Second Messenger Pathways . . . . . . . . . . . . . . . . . . 128
8.6 General Pharmacological Inputs . . . . . . . . . . . . . . . . . . . . . . 130
8.6.1 7 Transmembrane Regions. . . . . . . . . . . . . . . . . . . . 131
8.6.2 4 Transmembrane Regions. . . . . . . . . . . . . . . . . . . . 132
8.6.3 Family Two: The Agonist Spectrum . . . . . . . . . . . . . 132
8.6.4 Allosteric Modulation of Output . . . . . . . . . . . . . . . . 133
8.7 Neurotransmitter Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
9 The Abstract Neuron Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
9.1 Neuron Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
9.2 Neuron Outputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
9.3 Abstract Neuron Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
9.3.1 Toxin Recognition . . . . . . . . . . . . . . . . . . . . . . . . . 143
9.4 Feature Vector Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . 151
9.4.1 The BFV Functional Form. . . . . . . . . . . . . . . . . . . . 152
9.4.2 Modulation of the BFV Parameters . . . . . . . . . . . . . . 155
9.4.3 Modulation via the BFV Ball and Stick Model. . . . . . 156
9.5 The Full Abstract Neuron Model . . . . . . . . . . . . . . . . . . . . . 169
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

Part IV Models of Emotion and Cognition


10 Emotional Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
10.1 The Sloman Emotional Model . . . . . . . . . . . . . . . . . . . . . . . 179
10.2 PsychoPhysiological Data . . . . . . . . . . . . . . . . . . . . . . . . . . 180
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
11 Generation of Music Data: J. Peterson and L. Dzuris . . . . . . . . . . 183
11.1 A Musical Grammar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
11.2 The Würfelspiel Approach . . . . . . . . . . . . . . . . . . . . . . . . . . 186

11.3 Neutral Music Data Design . . . . . . . . . . . . . . . . . . . . . . . . . 187


11.3.1 Neutral Musical Alphabet Design . . . . . . . . . . . . . . . 187
11.3.2 The Generated Musical Phrases . . . . . . . . . . . . . . . . 189
11.4 Emotional Musical Data Design . . . . . . . . . . . . . . . . . . . . . . 191
11.4.1 Emotion and Music. . . . . . . . . . . . . . . . . . . . . . . . . 191
11.4.2 Emotional Music Data Design . . . . . . . . . . . . . . . . . 195
11.4.3 Happy Musical Data . . . . . . . . . . . . . . . . . . . . . . . . 196
11.4.4 Sad Musical Data . . . . . . . . . . . . . . . . . . . . . . . . . . 197
11.4.5 Angry Musical Data . . . . . . . . . . . . . . . . . . . . . . . . 199
11.4.6 Emotional Musical Alphabet Selection . . . . . . . . . . . 201
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
12 Generation of Painting Data: J. Peterson, L. Dzuris
and Q. Peterson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
12.1 Developing a Painting Model . . . . . . . . . . . . . . . . . . . . . . . . 205
12.2 Neutral Painting Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
12.2.1 The Neutral Künsterisches Würfelspiel Approach . . . . 211
12.3 Encoding the Painting Data . . . . . . . . . . . . . . . . . . . . . . . . . 212
12.4 Emotionally Labeled Painting Data . . . . . . . . . . . . . . . . . . . . 216
12.4.1 Painting and Emotion in the Literature . . . . . . . . . . . 216
12.4.2 The Emotional Künsterisches
Würfelspiel Approach . . . . . . . . . . . . . . . . . . . . . . . 220
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
13 Modeling Compositional Design . . . . . . . . . . . . . . . . . . . . . . . . . . 227
13.1 The Cognitive Dysfunction Model Review. . . . . . . . . . . . . . . 228
13.2 Connectionist Based Compositional Design . . . . . . . . . . . . . . 231
13.2.1 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
13.2.2 Noun to Verb Processing . . . . . . . . . . . . . . . . . . . . . 234
13.2.3 Sentence Construction . . . . . . . . . . . . . . . . . . . . . . . 236
13.3 Neurobiologically Based Compositional Design . . . . . . . . . . . 237
13.3.1 Recalling Data Generation . . . . . . . . . . . . . . . . . . . . 237
13.3.2 Training the Isocortex Model . . . . . . . . . . . . . . . . . . 239
13.3.3 Sensor Fusion in Area 37 . . . . . . . . . . . . . . . . . . . . 241
13.4 Integration of the Models . . . . . . . . . . . . . . . . . . . . . . . . . . 242
13.5 Depression Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
13.6 Integration into a Virtual World . . . . . . . . . . . . . . . . . . . . . . 245
13.7 Lesion Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
13.8 The Complete Cognitive Model . . . . . . . . . . . . . . . . . . . . . . 248
13.9 Virtual World Constructions . . . . . . . . . . . . . . . . . . . . . . . . 249
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
14 Networks of Excitable Neurons . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
14.1 The Basic Neurotransmitters . . . . . . . . . . . . . . . . . . . . . . . . 251
14.2 Modeling Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
14.3 Software Implementation Thoughts . . . . . . . . . . . . . . . . . . . . 256

14.4 How Would We Code Synapse Interaction?. . . . . . . . . . . . . . 259


14.4.1 The Catecholamine Abstraction . . . . . . . . . . . . . . . . 260
14.4.2 PSD Computation . . . . . . . . . . . . . . . . . . . . . . . . . . 263
14.5 Networks of Neural Objects . . . . . . . . . . . . . . . . . . . . . . . . . 264
14.5.1 Chained Architecture Details . . . . . . . . . . . . . . . . . . 264
14.5.2 Modeling Neurotransmitter Interactions . . . . . . . . . . . 270
14.6 Neuron Calculations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
15 Training the Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
15.1 The OCOS DAG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
15.1.1 Some MatLab Comments . . . . . . . . . . . . . . . . . . . . 280
15.2 Homework. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
15.3 Final Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284

Part V Simple Abstract Neurons


16 Matrix Feed Forward Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 287
16.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
16.2 Minimizing the MFFN Energy . . . . . . . . . . . . . . . . . . . . . . . 290
16.3 Partial Calculations for the MFFN . . . . . . . . . . . . . . . . . . . . 291
16.3.1 The Last Hidden Layer . . . . . . . . . . . . . . . . . . . . . . 291
16.3.2 The Remaining Hidden Layers . . . . . . . . . . . . . . . . . 292
16.4 The Full Backpropagation Equations for the MFFN . . . . . . . . 295
16.5 A Three Layer Example . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
16.5.1 The Output Layer . . . . . . . . . . . . . . . . . . . . . . . . . . 296
16.5.2 The Hidden Layer. . . . . . . . . . . . . . . . . . . . . . . . . . 297
16.5.3 The Input Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
16.6 A MatLab Beginning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
16.7 MatLab Implementations . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
16.7.1 Initialization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
16.7.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
16.7.3 Updating . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
16.7.4 Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
16.8 Sample Training Sessions . . . . . . . . . . . . . . . . . . . . . . . . . . 309
16.8.1 Approximating a Step Function . . . . . . . . . . . . . . . . 309
16.8.2 Approximating sin² . . . . . . . . . . . . . . . . . . . . . . . . . 311
16.8.3 Approximating sin² Again: Linear Outputs . . . . . . . . 312
Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
17 Chained Feed Forward Architectures . . . . . . . . . ...... . . . . . . . 315
17.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . ...... . . . . . . . 315
17.2 Minimizing the CFFN Energy . . . . . . . . . . ...... . . . . . . . 319
17.3 Partial Derivative Calculation in Generalized Chains . . . . . . . . 319

17.4 Partial Calculations for the CFFN . . . . . . . . . . . . . . . . . . . . . 324


17.4.1 The ∂Yj/∂Yi Calculation . . . . . . . . . . . . . . . . . . . . . 326
17.4.2 The Internal Parameter Partial Calculations . . . . . . . . 326
17.5 Simple MatLab Implementations . . . . . . . . . . . . . . . . . . . . . 328
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330

Part VI Graph Based Modeling In Matlab


18 Graph Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
18.1 Building Global Graph Objects One . . . . . . . . . . . . . . . . . . . 337
18.1.1 Vertices One . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
18.1.2 Edges One . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
18.1.3 A First Graph Class . . . . . . . . . . . . . . . . . . . . . . . . 344
18.2 Adding Class Methods First Pass . . . . . . . . . . . . . . . . . . . . . 345
18.2.1 Adding Edge Methods. . . . . . . . . . . . . . . . . . . . . . . 345
18.2.2 Adding Vertices Methods . . . . . . . . . . . . . . . . . . . . 346
18.2.3 Adding Graph Methods . . . . . . . . . . . . . . . . . . . . . . 346
18.2.4 Using the Methods . . . . . . . . . . . . . . . . . . . . . . . . . 348
18.2.5 Adding a Graph to an Existing Graph . . . . . . . . . . . . 351
18.2.6 Drawing Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
18.2.7 Evaluation and Update Strategies . . . . . . . . . . . . . . . 356
18.3 Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
18.4 Polishing the Training Code. . . . . . . . . . . . . . . . . . . . . . . . . 371
18.5 Comparing the CFFN and MFFN Code. . . . . . . . . . . . . . . . . 379
18.6 Handling Feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
18.7 Lagged Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390
18.8 Better Lagged Training! . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
18.9 Improved Gradient Descent . . . . . . . . . . . . . . . . . . . . . . . . . 406
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
19 Address Based Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
19.1 Graph Class Two . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420
19.1.1 Vertices Two . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
19.1.2 Edges Two . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
19.1.3 Graph Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
19.2 Class Methods Two . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
19.2.1 Add Location Methods . . . . . . . . . . . . . . . . . . . . . . 430
19.2.2 Add Edge Methods . . . . . . . . . . . . . . . . . . . . . . . . . 431
19.2.3 Add Node Methods. . . . . . . . . . . . . . . . . . . . . . . . . 431
19.2.4 Finding the Incidence Matrix . . . . . . . . . . . . . . . . . . 432
19.2.5 Get the Laplacian . . . . . . . . . . . . . . . . . . . . . . . . . . 441
19.3 Evaluation and Update Strategies in Graphs . . . . . . . . . . . . . . 441
19.4 Adding Inhibition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449

20 Building Brain Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461


20.1 Build A Cortex Module. . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
20.1.1 Build A Cortical Can . . . . . . . . . . . . . . . . . . . . . . . 462
20.1.2 Build A Cortical Column. . . . . . . . . . . . . . . . . . . . . 464
20.1.3 Build A Cortex Sheet . . . . . . . . . . . . . . . . . . . . . . . 467
20.2 Build A Thalamus Module . . . . . . . . . . . . . . . . . . . . . . . . . 471
20.3 Build A MidBrain Module. . . . . . . . . . . . . . . . . . . . . . . . . . 474
20.4 Building the Brain Model . . . . . . . . . . . . . . . . . . . . . . . . . . 477
Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491

Part VII Models of Cognition Dysfunction


21 Models of Cognitive Dysfunction. . . . . . . . . . . . . . . . . . . . . . . . . . 495
21.1 Cognitive Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
21.1.1 Training Algorithms . . . . . . . . . . . . . . . . . . . . . . . . 499
21.2 Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
21.2.1 Laplacian Updates . . . . . . . . . . . . . . . . . . . . . . . . . 499
21.2.2 Module Updates . . . . . . . . . . . . . . . . . . . . . . . . . . . 501
21.2.3 Submodule Two Training . . . . . . . . . . . . . . . . . . . . 504
21.2.4 Submodule One Training . . . . . . . . . . . . . . . . . . . . . 507
21.3 A Normal Brain Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 508
21.3.1 The Cognitive Dysfunction Model . . . . . . . . . . . . . . 513
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515

Part VIII Conclusions


22 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521

Part IX Background Reading


23 Background Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 525
23.1 The Central Nervous System . . . . . . . . . . . . . . . . . . . . . . . . 525
23.2 Information Theory, Biological Complexity and Neural
Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526
23.3 Nervous System Evolution and Cognition . . . . . . . . . . . . . . . 526
23.4 Comparative Cognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527
23.5 Neural Signaling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 530
23.6 Gene Regulatory Circuits. . . . . . . . . . . . . . . . . . . . . . . . . . . 530
23.7 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531
23.8 Theoretical Robotics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 533
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 533

Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555
List of Figures

Figure 1.1 Software–Hardware–Wetware Triangle . . . . . . . . . . . . . .. 4


Figure 2.1 Walking in a volume box . . . . . . . . . . . . . . . . . . . . . . .. 20
Figure 2.2 The random walk of a particle . . . . . . . . . . . . . . . . . . . .. 22
Figure 2.3 Normal distribution: spread depends on standard
deviation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 33
Figure 2.4 Skewed random walk probability distribution:
p is 0.1666 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Figure 2.5 The generic diffusion solution behavior . . . . . . . . . . . . . . 36
Figure 2.6 The generic diffusion solution maximum . . . . . . . . . . . . . 37
Figure 4.1 One time dependent pulse . . . . . . . . . . . . . . . . . . . . . . . . 53
Figure 4.2 Two time dependent pulses . . . . . . . . . . . . . . . . . . . . . . . 54
Figure 4.3 Summed voltage at 3 time and 10 space constants . . . . . . . 55
Figure 5.1 The main cortical subdivisions of the brain . . . . . . . . . . . . 63
Figure 5.2 The major limbic system structures with arrows
indicating structural connections . . . . . . . . . . . . . . . . . .. 65
Figure 5.3 A simplified path of information processing in the brain.
Arrows indicate information processing pathways. . . . . . .. 66
Figure 5.4 Brain structures: the numeric labels correspond
to structures listed in Table 5.2 . . . . . . . . . . . . . . . . . . .. 67
Figure 5.5 Brain slice 1 details. a Slice 1 orientation. b Neural slice
1 cartoon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 68
Figure 5.6 Brain slice 2 details. a Slice 2 orientation. b Neural slice
2 cartoon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 68
Figure 5.7 The brainstem layers . . . . . . . . . . . . . . . . . . . . . . . . . .. 70
Figure 5.8 The brainstem structure. . . . . . . . . . . . . . . . . . . . . . . . .. 70
Figure 5.9 The medulla cross-sections. a The caudal medulla: slice
1. b The rostral medulla: slice 2 . . . . . . . . . . . . . . . . . .. 71
Figure 5.10 The caudal and mid pons slices. a The caudal pons: slice
3. b The mid pons: slice 4 . . . . . . . . . . . . . . . . . . . . . .. 71


Figure 5.11 The rostral pons and caudal midbrain cross-sections.


a The rostral pons: slice 5. b The caudal midbrain:
slice 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Figure 5.12 The rostral midbrain: slice 7 . . . . . . . . . . . . . . . . . . . . . . 72
Figure 5.13 Typical neuron in the reticular formation . . . . . . . . . . . . . 72
Figure 5.14 The dopamine innervation pathway . . . . . . . . . . . . . . . . . 73
Figure 5.15 The dopamine and serotonin releasing neurons sites.
a Location of dopamine sites. b Location of serotonin
sites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 74
Figure 5.16 The serotonin innervation pathway . . . . . . . . . . . . . . . . .. 74
Figure 5.17 Cortical folding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 75
Figure 5.18 Cortical lobes. a Cortical lobes. b The limbic lobe inside
the cortex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 75
Figure 5.19 Generic overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 76
Figure 5.20 The structure of a cortical column . . . . . . . . . . . . . . . . .. 78
Figure 5.21 OCOS and FFP circuits. a The on-center, off-surround
control structure. b The folded feedback pathway control
structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 79
Figure 5.22 Layer six connections. a The layer six–four connections
to layer two–three are another FFP. b The layer six to
thalamic OCOS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Figure 5.23 The OCOS/FFP cortical model . . . . . . . . . . . . . . . . . . . . 80
Figure 5.24 Synchronous cortical activity . . . . . . . . . . . . . . . . . . . . . . 80
Figure 5.25 The norepinephrine innervation pathway . . . . . . . . . . . . . . 81
Figure 6.1 Protein transcription tree . . . . . . . . . . . . . . . . . . . . . . . . . 91
Figure 6.2 A simple sigmoidal threshold curve . . . . . . . . . . . . . . . . . 92
Figure 6.3 The mRNA object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Figure 6.4 rM and sPM factory controls . . . . . . . . . . . . . . . . . . . . . . 94
Figure 6.5 Generic substance M control mechanism. . . . . . . . . . . . . . 95
Figure 6.6 Control strategies. a Futile control. b Switch control . . . . . 97
Figure 6.7 Generic trigger pathway with feedback . . . . . . . . . . . . . . . 98
Figure 6.8 Monoamine and calcium pathways . . . . . . . . . . . . . . . . . . 102
Figure 8.1 Second messenger trigger . . . . . . . . . . . . . . . . . . . . . . . . 118
Figure 8.2 Some T1 binds to the genome . . . . . . . . . . . . . . . . . . . . . 118
Figure 8.3 Maximum sodium conductance control pathway . . . . . . . . 119
Figure 8.4 The first level sigmoid graph computation. . . . . . . . . . . . . 123
Figure 8.5 Sigmoid graph computations. a Second level.
b Third level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 123
Figure 8.6 Adding feedback to the maximum sodium conductance
control pathway . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
Figure 8.7 The Dendrite–Soma BFV Model . . . . . . . . . . . . . . . . . . . 134
Figure 8.8 A high level view . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
Figure 9.1 Prototypical action potential . . . . . . . . . . . . . . . . . . . . . . 141

Figure 9.2 A simplified path of information processing in the brain.


Arrows indicate information processing pathways. . . . . . . . 144
Figure 9.3 Applied synaptic pulse . . . . . . . . . . . . . . . . . . . . . . . . . . 147
Figure 9.4 The toxin families . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
Figure 9.5 m perturbed alpha–beta curves . . . . . . . . . . . . . . . . . . . . . 149
Figure 9.6 Generated voltage traces for the α–β toxin families . . . . . . 151
Figure 9.7 The BFV functional form . . . . . . . . . . . . . . . . . . . . . . . . 153
Figure 9.8 The product MNA³ HNA: sodium conductances during
a pulse 1100 at 3.0. . . . . . . . . . . . . . . . . . . . . . . . . . .. 157
Figure 9.9 The product MK⁴: potassium conductances during
a pulse 1100 at 3.0. . . . . . . . . . . . . . . . . . . . . . . . . . .. 157
Figure 9.10 Two action potential inputs into the dendrite
subsystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
Figure 9.11 The EPS triangle approximation. . . . . . . . . . . . . . . . . . . . 169
Figure 9.12 Cellular agent computations . . . . . . . . . . . . . . . . . . . . . . 170
Figure 10.1 The emotionally enabled avatar . . . . . . . . . . . . . . . . . . . . 176
Figure 10.2 The three tower model . . . . . . . . . . . . . . . . . . . . . . . . . . 177
Figure 10.3 The three layer model. . . . . . . . . . . . . . . . . . . . . . . . . . . 177
Figure 10.4 A hybrid combined emotional model architecture . . . . . . . . . 178
Figure 10.5 Adding meta management to the hybrid model . . . . . . . . . 179
Figure 10.6 Human response to emotionally charged picture data . . . . . 181
Figure 10.7 Emotionally charged compositional data design . . . . . . . . . 182
Figure 11.1 Musical opening phrases . . . . . . . . . . . . . . . . . . . . . . . . . 187
Figure 11.2 Musical middle phrases . . . . . . . . . . . . . . . . . . . . . . . . . 188
Figure 11.3 Musical closing phrases . . . . . . . . . . . . . . . . . . . . . . . . . 188
Figure 11.4 The neutral music matrix . . . . . . . . . . . . . . . . . . . . . . . . 189
Figure 11.5 Neutral sequences generated using the first opening
phrase, all the middle phrases and the first ending phrase.
The first column of the figure provides a label of the form
xyz where x indicates the opening used; y, the middle
phrase used; and z, the ending phrase. Thus, 131 is the
fragment built from the first opening, the third middle
and the first ending . . . . . . . . . . . . . . . . . . . . . . . . . . .. 190
Figure 11.6 Neutral sequences generated using the second opening
phrase, the first middle phrase and all the ending
phrases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 190
Figure 11.7 The happy music matrix . . . . . . . . . . . . . . . . . . . . . . . .. 197
Figure 11.8 Happy sequences generated using opening one,
all middle phrases and the first closing . . . . . . . . . . . . . .. 197
Figure 11.9 Happy sequences using opening two, middle phrase one
and all the closings . . . . . . . . . . . . . . . . . . . . . . . . . . .. 198
Figure 11.10 The sad music matrix . . . . . . . . . . . . . . . . . . . . . . . . . .. 198
Figure 11.11 Sad sequences using opening one, all the middle phrases
and the first closing . . . . . . . . . . . . . . . . . . . . . . . . . . .. 199

Figure 11.12 Sad sequences using all the openings, the first middle
phrases and the second closing . . . . . . . . . . . . . . . . . . .. 199
Figure 11.13 The angry music matrix . . . . . . . . . . . . . . . . . . . . . . . .. 200
Figure 11.14 Angry sequences generated using the first opening, all
the middle phrases and the first closing . . . . . . . . . . . . .. 200
Figure 11.15 Angry sequences generated using opening two, the fourth
middle phrase and all the closings . . . . . . . . . . . . . . . . .. 201
Figure 11.16 Some angry phrases. a Middle phrase.
b Opening phrase. . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 202
Figure 12.1 Two simple paintings. a This painting uses many
separate layers for the kelp. The human figure is in an
intermediate layer between kelp layers and the seadragons
essentially occupy foreground layers. The background
varies from a dark blue at the bottom to very light
blue, almost white, at the top. b This painting uses a
complicated background layer with many trees. The
dragon and the large tree are midground images, while
the butterflies and moths are primarily foreground in
nature. The girl is midground . . . . . . . . . . . . . . . . . . .. 206
Figure 12.2 The background and foreground of a simple painting.
a Background of a mellow image. Note this image plane
is quite complicated. Clearly, it could be broken up
further into midground and background images.
b Foreground of a mellow image which is also very
complicated. A simpler painting using only one of the
foreground elements would work nicely also. c A mellow
painting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 208
Figure 12.3 A neutral painting. a Background. b Midground.
c Foreground. d Assembled painting . . . . . . . . . . . . . . .. 210
Figure 12.4 The neutral painting matrix . . . . . . . . . . . . . . . . . . . . . .. 210
Figure 12.5 16 Neutral compositions: background 1, midground 1,
foregrounds 1–4, background 1, midground 2,
foregrounds 1–4, background 1, midground 3,
foregrounds 1–4, background 1, midground 4,
foregrounds 1–4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 211
Figure 12.6 16 Neutral compositions: background 2, midground 1,
foregrounds 1–4, background 2, midground 2,
foregrounds 1–4, background 2, midground 3,
foregrounds 1–4, background 2, midground 4,
foregrounds 1–4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 211
Figure 12.7 16 Neutral compositions: background 3, midground 1,
foregrounds 1–4, background 3, midground 2,

foregrounds 1–4, background 3, midground 3,


foregrounds 1–4, background 3, midground 4,
foregrounds 1–4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 212
Figure 12.8 16 Neutral compositions: background 4, midground 1,
foregrounds 1–4, background 4, midground 2,
foregrounds 1–4, background 4, midground 3,
foregrounds 1–4, background 4, midground 4,
foregrounds 1–4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 212
Figure 12.9 Encoding a neutral painting. a The second background
image. b The background edges. Note there are two edge
curves. c The first midground image. d The midground
edges. e The first foreground image. f The foreground
edges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 213
Figure 12.10 Approximating the edge curve. a A 10 × 10 portion of a
typical background image that contains part of an edge
curve. The grid from horizontal pixels 40–49 and vertical
pixels 70–79 is shown. The 10 × 10 block of pixels
defines a node in the coarse grid that must be labeled as
being part of the edge curve. b A typical coarse scale
edge curve in which each filled in circle represents a
10 × 10 block in the original image which contains part
of the fine scale edge curve. . . . . . . . . . . . . . . . . . . . .. 213
Figure 12.11 Assembling a happy painting. a Happy background.
b Happy midground. c Happy foreground. d A happy
painting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 214
Figure 12.12 The happy painting matrix . . . . . . . . . . . . . . . . . . . . . .. 214
Figure 12.13 16 Happy compositions: background 1, midground 1,
foregrounds 1–4, background 1, midground 2,
foregrounds 1–4, background 1, midground 3,
foregrounds 1–4, background 1, midground 4,
foregrounds 1–4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 220
Figure 12.14 16 Happy compositions: background 2, midground 1,
foregrounds 1–4, background 2, midground 2,
foregrounds 1–4, background 2, midground 3,
foregrounds 1–4, background 2, midground 4,
foregrounds 1–4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 221
Figure 12.15 16 Happy compositions: background 3, midground 1,
foregrounds 1–4, background 3, midground 2,
foregrounds 1–4, background 3, midground 3,
foregrounds 1–4, background 3, midground 4,
foregrounds 1–4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 221

Figure 12.16 16 Happy compositions: background 4, midground 1,


foregrounds 1–4, background 4, midground 2,
foregrounds 1–4, background 4, midground 3,
foregrounds 1–4, background 4, midground 4,
foregrounds 1–4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 222
Figure 12.17 Assembling a sad painting. a Sad background.
b Sad midground. c Sad foreground. d A sad painting . . .. 222
Figure 12.18 The sad painting matrix . . . . . . . . . . . . . . . . . . . . . . . .. 223
Figure 12.19 16 Sad compositions: background 1, midground 1,
foregrounds 1–4, background 1, midground 2,
foregrounds 1–4, background 1, midground 3,
foregrounds 1–4, background 1, midground 4,
foregrounds 1–4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 223
Figure 12.20 16 Sad compositions: background 2, midground 1,
foregrounds 1–4, background 2, midground 2,
foregrounds 1–4, background 2, midground 3,
foregrounds 1–4, background 2, midground 4,
foregrounds 1–4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 224
Figure 12.21 16 Sad compositions: background 3, midground 1,
foregrounds 1–4, background 3, midground 2,
foregrounds 1–4, background 3, midground 3,
foregrounds 1–4, background 3, midground 4,
foregrounds 1–4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 224
Figure 12.22 16 Sad compositions: background 4, midground 1,
foregrounds 1–4, background 4, midground 2,
foregrounds 1–4, background 4, midground 3,
foregrounds 1–4, background 4, midground 4,
foregrounds 1–4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 225
Figure 13.1 The sensory to output pathways . . . . . . . . . . . . . . . . . . .. 228
Figure 13.2 A generic model of the limbic system . . . . . . . . . . . . . .. 229
Figure 13.3 The musical and painting data is used to constrain the
outputs of the associative cortex and limbic system . . . . .. 230
Figure 13.4 Music or painting initialization generates fMRI and skin
conductance outputs and a full musical or painting
design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 231
Figure 13.5 fMRI and skin conductance inputs plus music or painting
initialization generate a full musical or painting design in
a given emotional modality . . . . . . . . . . . . . . . . . . . . . .. 232
Figure 13.6 Raw sentence to feature vector processing. . . . . . . . . . . .. 233
Figure 13.7 A three cortical column model. . . . . . . . . . . . . . . . . . . .. 239
Figure 13.8 A simplified version of the auditory cortex to associative
cortex processing. Only some of the interconnections are
shown. Each cortex type is modeled as four layers which

are associated with specific time frames. The different


processing time frames are labeled -3, 0, 3, and 6 for
millisecond, seconds, days and weeks. The temporal
cortex time frames shown connect to limbic system
represented by the cingulate gyrus . . . . . . . . . . . . . . . . .. 242
Figure 13.9 A simplified version of the visual cortex to associative
cortex processing. Only some of the interconnections are
shown. Each cortex type is modeled as four layers which
are associated with specific time frames. The different
processing time frames are labeled -3, 0, 3, and 6 for
millisecond, seconds, days and weeks. The temporal
cortex time frames shown connect to limbic system
represented by the cingulate gyrus . . . . . . . . . . . . . . . . .. 243
Figure 13.10 Simplified connections to the avatar in a 3D virtual
world . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
Figure 14.1 Benzene and Catechol molecule. . . . . . . . . . . . . . . . . . . . 252
Figure 14.2 Catechol molecule with ethyl chain on C1 . . . . . . . . . . . . . 252
Figure 14.3 Dopamine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
Figure 14.4 Norepinephrine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
Figure 14.5 Epinephrine. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
Figure 14.6 The catecholamine synthesis pathway . . . . . . . . . . . . . . . . 253
Figure 14.7 Abstract neuronal process . . . . . . . . . . . . . . . . . . . . . . . . 261
Figure 14.8 Dendrite–axonal interaction pathway . . . . . . . . . . . . . . . . 262
Figure 14.9 A feedforward architecture shown as a chain
of processing elements . . . . . . . . . . . . . . . . . . . . . . . . .. 265
Figure 14.10 A chained neural architecture with self feedback loops
and general feedback . . . . . . . . . . . . . . . . . . . . . . . . . .. 265
Figure 14.11 Nodal computations. a Schematic. b Processing. . . . . . . .. 266
Figure 14.12 Local networks. a Chained feedforward.
b Local expert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 267
Figure 14.13 Postsynaptic output calculation . . . . . . . . . . . . . . . . . . .. 268
Figure 14.14 A chained neural architecture with PSD computational
sites shown . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 270
Figure 14.15 The dendritic tree details for Neuron N0 showing two
neurotransmitters per pre axon connection
to the dendrite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
Figure 14.16 A network model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
Figure 16.1 Approximating a step function using a 1-5-1 MFFN. . . . . . 310
Figure 16.2 Approximating a sin² using a 1-5-1 MFFN . . . . . . . . . . . . 311
Figure 16.3 Approximating a sin² using a 1-5-1 MFFN with linear
outputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 313
Figure 17.1 Presynaptic error calculation . . . . . . . . . . . . . . . . . . . . .. 328
Figure 18.1 The OCOS/FFP cortical model . . . . . . . . . . . . . . . . . . .. 335

Figure 18.2 Basic cognitive software architecture design . . . . . . . . . . . 335


Figure 18.3 Cognitive model components. . . . . . . . . . . . . . . . . . . . . . 336
Figure 18.4 OCOS graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
Figure 18.5 The 1-5-1 cos energy training results . . . . . . . . . . . . . . . . 371
Figure 18.6 The approximation of cos(t) on [0,3.14] using a 1-5-1
FFN graph: error is 0.1. . . . . . . . . . . . . . . . . . . . . . . . .. 378
Figure 18.7 A Folded Feedback Pathway cortical circuit in two
forms: the relabeled graph on the right has external Input
into Node 1 from the thalamus and into Node 9 from the
cortical column above. a The Folded Feedback
Pathway DG. b The Relabeled Folded Feedback
Pathway DG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 382
Figure 18.8 The Folded Feedback Pathway as a lagged architecture . .. 383
Figure 18.9 Adding column to column feedback to the original
Folded Feedback Pathway as a lagged architecture . . . . . .. 384
Figure 18.10 A modified OCOS with self feedback and feedback . . . . .. 387
Figure 18.11 The plot of the OCOS/FFP with column feedback
graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 388
Figure 18.12 The plot of the doubled OCOS/FFP with column
feedback turned into a feedforward edge . . . . . . . . . . . . .. 389
Figure 19.1 Typical OCOS graph . . . . . . . . . . . . . . . . . . . . . . . . . .. 444
Figure 20.1 Two components of a can circuit. a The OCOS circuit.
b The FFP circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 462
Figure 20.2 The last component of the can circuit and the can circuit
itself. a Two-three circuit. b A can circuit. . . . . . . . . . . . . 464
Figure 20.3 A column circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467
Figure 20.4 A cortex circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
Figure 20.5 A two copy ROCOS Thalamus circuit . . . . . . . . . . . . . . . 474
Figure 20.6 A MidBrain circuit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476
Figure 20.7 A simple brain model not showing input and output
connections. The edge connections are labeled Eαβ where
α and β are the abbreviations for the various modules . . .. 477
Figure 20.8 The two sensory cortex modules and the associative
cortex module with intermodule links . . . . . . . . . . . . . . .. 483
Figure 20.9 A Brain model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 490
Figure 21.1 A simple cognitive processing map focusing on a few
salient modules. The edge connections are of the form
Eα,β where α and β can be a selection from v (Visual),
a (Auditory), A (Associative), T (Thalamus),
M (Midbrain) and I (Input) . . . . . . . . . . . . . . . . . . . . . . . 510
Figure 21.2 Neural processing for neutral auditory data . . . . . . . . . . . . 511
Figure 21.3 Neural processing for sad auditory data . . . . . . . . . . . . . . 511
Figure 21.4 Neural processing for new data . . . . . . . . . . . . . . . . . . . . 512
List of Tables

Table 2.1 Comparing paths and rightward movements . . . . . . . . . . . . . 24


Table 5.1 Brain reorganizations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Table 5.2 Information processing components. . . . . . . . . . . . . . . . . . . 67
Table 5.3 Cortical column cell types . . . . . . . . . . . . . . . . . . . . . . . . . 78
Table 9.1 Toxin conductance signature . . . . . . . . . . . . . . . . . . . . . . . 147
Table 9.2 Toxin α–β signatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
Table 15.1 Computational node count for small brain models. . . . . . . . . 283
Table 17.1 FFN evaluation process . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
Table 17.2 Recursive chained partial calculation . . . . . . . . . . . . . . . . . . 323
Table 17.3 Recursive chained partial calculation with internal
parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
Table 17.4 Backpropagation in the CFFN: one sample . . . . . . . . . . . . . 327
Table 17.5 Backpropagation for CFFN: multiple samples. . . . . . . . . . . . 328
Table 18.1 Feedback evaluation and Hebbian update algorithms . . . . . . . 383
Table 21.1 Evaluation and Hebbian update algorithms. . . . . . . . . . . . . . 499

List of Code Examples

Listing 1.1 How to add paths to Octave . . . . . . . . . . . . . . . . . . . . . 14


Listing 1.2 Set paths in octave . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Listing 15.1 The OCOS Laplacian. . . . . . . . . . . . . . . . . . . . . . . . . . 280
Listing 15.2 Sample Node function vector . . . . . . . . . . . . . . . . . . . . 280
Listing 15.3 Computing the graph gradient and Laplacian . . . . . . . . . 281
Listing 16.1 Initializing the nodefunctions for the 2-3-1 MFFN . . . . . . 298
Listing 16.2 The error loop for the 2-3-1 MFFN . . . . . . . . . . . . . . . . 299
Listing 16.3 Initializing and finding the error for a 2-3-1 MFFN . . . . . 299
Listing 16.4 The 2-3-1 MFFN evaluation . . . . . . . . . . . . . . . . . . . . . 300
Listing 16.5 Testing Our 2-3-1 Code . . . . . . . . . . . . . . . . . . . . . . . . 300
Listing 16.6 Initialization code mffninit.m . . . . . . . . . . . . . . . . . . . . 301
Listing 16.7 Evaluation: mffneval2.m. . . . . . . . . . . . . . . . . . . . . . . . 302
Listing 16.8 Node function initialization. . . . . . . . . . . . . . . . . . . . . . 304
Listing 16.9 Node function assignment in the update code . . . . . . . . . 305
Listing 16.10 Updating: mffnupdate.m . . . . . . . . . . . . . . . . . . . . . . . . 305
Listing 16.11 Training: mffntrain.m. . . . . . . . . . . . . . . . . . . . . . . . . . 308
Listing 16.12 A vector conditional test function . . . . . . . . . . . . . . . . . 309
Listing 16.13 Testing the Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
Listing 16.14 Generating the plot that tests our approximation . . . . . . . 311
Listing 16.15 Approximating sin² . . . . . . . . . . . . . . . . . . . . . . 311
Listing 16.16 Approximating sin² Again: Linear Outputs . . . . . . 312
Listing 17.1 Setting up a 2-3-1 CFFN . . . . . . . . . . . . . . . . . . . . . . . 329
Listing 17.2 The evaluation loop for a 2-3-1 CFFN . . . . . . . . . . . . . . 330
Listing 17.3 Testing our 2-3-1 CFFN. . . . . . . . . . . . . . . . . . . . . . . . 330
Listing 18.1 CoolObject . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
Listing 18.2 Inside the CoolObject directory . . . . . . . . . . . . . . . . . . . 339
Listing 18.3 Outline of the subsref function . . . . . . . . . . . . . . . . . . . 339
Listing 18.4 The subs elements of p.c . . . . . . . . . . . . . . . . . . . . . . . 339
Listing 18.5 The p() elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
Listing 18.6 CoolObject Example . . . . . . . . . . . . . . . . . . . . . . . . . . 340
Listing 18.7 Vertices class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341


Listing 18.8 The vertices subsref overloading . . . . . . . . . . . . . . . . . . 341


Listing 18.9 Simple vertices session: column vector input. . . . . . . . . . 342
Listing 18.10 Simple vertices session: row vector or cell input . . . . . . . 342
Listing 18.11 Edges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
Listing 18.12 The edge subsref overloading . . . . . . . . . . . . . . . . . . . . 343
Listing 18.13 Simple edges session: column vectors . . . . . . . . . . . . . . 343
Listing 18.14 Simple edges session: row vectors . . . . . . . . . . . . . . . . . 343
Listing 18.15 The graphs class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
Listing 18.16 The graph subsref overloading . . . . . . . . . . . . . . . . . . . 344
Listing 18.17 Access via g.v.v is impossible. . . . . . . . . . . . . . . . . . . . 344
Listing 18.18 Simple graph session . . . . . . . . . . . . . . . . . . . . . . . . . . 345
Listing 18.19 Adding an edge to an existing edge . . . . . . . . . . . . . . . . 345
Listing 18.20 Adding a node to existing vertices object . . . . . . . . . . . . 346
Listing 18.21 Adding an edge in a global graph . . . . . . . . . . . . . . . . . 346
Listing 18.22 Adding a node in a global graph . . . . . . . . . . . . . . . . . . 347
Listing 18.23 Simple incidence matrix for global graph . . . . . . . . . . . . 347
Listing 18.24 Laplacian for global graph . . . . . . . . . . . . . . . . . . . . . . 348
Listing 18.25 Define the OCOS nodes and edges data . . . . . . . . . . . . . 348
Listing 18.26 Construct the OCOS edges and vertices object . . . . . . . . 348
Listing 18.27 Construct the OCOS graph . . . . . . . . . . . . . . . . . . . . . . 348
Listing 18.28 Create FFP by adding nodes and edges to OCOS . . . . . . 349
Listing 18.29 Verify the added nodes and edges . . . . . . . . . . . . . . . . . 349
Listing 18.30 Get Incidence Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . 349
Listing 18.31 Get Laplacian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
Listing 18.32 Get OCOS and FFP Laplacian eigenvalue
and eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
Listing 18.33 Links data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
Listing 18.34 Adding a graph to an existing graph . . . . . . . . . . . . . . . 352
Listing 18.35 Building a graph from other graphs . . . . . . . . . . . . . . . . 353
Listing 18.36 Using dot to create a graph visualization . . . . . . . . . . . . 354
Listing 18.37 The incidence to dot code: incToDot . . . . . . . . . . . . . . . 354
Listing 18.38 Generate the OCOS dot file . . . . . . . . . . . . . . . . . . . . . 354
Listing 18.39 A generated dot file . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
Listing 18.40 Generate OCOS graphic as pdf file . . . . . . . . . . . . . . . . 355
Listing 18.41 Finding backward and forward sets: BFsets0. . . . . . . . . . 356
Listing 18.42 The sigmoid function . . . . . . . . . . . . . . . . . . . . . . . . . . 357
Listing 18.43 The evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
Listing 18.44 The Hebbian code: HebbianUpdateSynapticValue . . . . . . 358
Listing 18.45 An OCOS Hebbian Update. . . . . . . . . . . . . . . . . . . . . . 359
Listing 18.46 First evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
Listing 18.47 Second evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
Listing 18.48 A simple simulation: DoSim . . . . . . . . . . . . . . . . . . . . . 359
Listing 18.49 A 20 step evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 360
Listing 18.50 Second evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360

Listing 18.51 Initializing the Node Functions for the Chain FFN . . . . . 361
Listing 18.52 Graph evaluation code: evaluation2.m . . . . . . . . . . . . . . 362
Listing 18.53 Testing the new evaluation code . . . . . . . . . . . . . . . . . . 363
Listing 18.54 The energy calculation: energy.m . . . . . . . . . . . . . . . . . 364
Listing 18.55 Testing the Energy Code . . . . . . . . . . . . . . . . . . . . . . . 365
Listing 18.56 Returning forward edge information: Changed
BFsets code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
Listing 18.57 Find the location of the absolute maximum of a vector . . . 366
Listing 18.58 Gradient Update Code . . . . . . . . . . . . . . . . . . . . . . . . . 366
Listing 18.59 Chain FFN Training Code . . . . . . . . . . . . . . . . . . . . . . 369
Listing 18.60 Sample Training Session . . . . . . . . . . . . . . . . . . . . . . . 369
Listing 18.61 Node function Initialization. . . . . . . . . . . . . . . . . . . . . . 371
Listing 18.62 Adding upper and lower bounds to the node function
initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
Listing 18.63 The new evaluation function: evaluation3.m . . . . . . . . . . 373
Listing 18.64 The updated energy code: energy2.m . . . . . . . . . . . . . . . 374
Listing 18.65 The new update code: GradientUpdate2.m . . . . . . . . . . . 375
Listing 18.66 The altered training loop code: chainffntrain2.m . . . . . . . 377
Listing 18.67 Setup the training session . . . . . . . . . . . . . . . . . . . . . . . 377
Listing 18.68 Do the training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
Listing 18.69 A Test File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
Listing 18.70 Initializing the MFFN node functions. . . . . . . . . . . . . . . 379
Listing 18.71 Setting up the CFFN versus MFFN test . . . . . . . . . . . . 380
Listing 18.72 The initial energy for the CFFN and MFFN code . . . . . . 381
Listing 18.73 One gradient step for both CFFN and MFFN . . . . . . . . . 381
Listing 18.74 The incidence matrix calculation for a CFFN
with feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
Listing 18.75 Converting the incidence matrix for a CFFN
with feedback into a dot file . . . . . . . . . . . . . . . . . . . . . 386
Listing 18.76 Testing the CFFN feedback incidence matrix . . . . . . . . . 386
Listing 18.77 Constructing the OCOS/FFP with column
to column feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
Listing 18.78 Finding the feedback edges. . . . . . . . . . . . . . . . . . . . . . 387
Listing 18.79 The edges object subtract function . . . . . . . . . . . . . . . . . 388
Listing 18.80 The graphs object subtract function . . . . . . . . . . . . . . . . 388
Listing 18.81 Subtract the feedback edge and construct
the double copy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
Listing 18.82 Removing a list of edges from an edges object . . . . . . . . 391
Listing 18.83 Removing a list of edges from a graph. . . . . . . . . . . . . . 391
Listing 18.84 Extracting the feedback edge indices from the incidence
matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391
Listing 18.85 Extracting the new feedforward edges constructed
from the old feedback ones. . . . . . . . . . . . . . . . . . . . . . 392
Listing 18.86 New Lagged Gradient Update Code. . . . . . . . . . . . . . . . 393
Listing 18.87 New lagged training code . . . . . . . . . . . . . . . . . . . . ... 396


Listing 18.88 Example: Setting up the original graph with feedback ... 397
Listing 18.89 Example: Setting up the lagged graph . . . . . . . . . . . ... 397
Listing 18.90 Example: Further setup details and input
and target sets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
Listing 18.91 Example: parameters and backward and forward sets . . . . 398
Listing 18.92 Example: A simple evaluation and energy calculation . . . 399
Listing 18.93 Example: 10 training steps . . . . . . . . . . . . . . . . . . . . . . 399
Listing 18.94 Example: 10 training steps gradient information . . . . . . . 399
Listing 18.95 The thresholding algorithm . . . . . . . . . . . . . . . . . . . . . . 400
Listing 18.96 Finding out how many inputs have been classified . . . . . 400
Listing 18.97 Sample classification results . . . . . . . . . . . . . . . . . . . . . 401
Listing 18.98 Activating Line Search and Gradient Scaling . . . . . . . . . 402
Listing 18.99 New GradientUpdateLag skeleton code . . . . . . . . . . . . . 402
Listing 18.100 A Sample Run . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
Listing 18.101 Our example run continued. . . . . . . . . . . . . . . . . . . . . . 405
Listing 18.102 Set up gradient vectors. . . . . . . . . . . . . . . . . . . . . . . . . 406
Listing 18.103 Finding the scaled gradient . . . . . . . . . . . . . . . . . . . . . . 406
Listing 18.104 Doing a descent step . . . . . . . . . . . . . . . . . . . . . . . . . . 407
Listing 18.105 The Scaled Gradient Descent Step . . . . . . . . . . . . . . . . . 407
Listing 18.106 The Regular Gradient Descent Step . . . . . . . . . . . . . . . . 408
Listing 18.107 The new update code . . . . . . . . . . . . . . . . . . . . . . . . . . 409
Listing 18.108 The new training code . . . . . . . . . . . . . . . . . . . . . . . . . 411
Listing 18.109 Another example setup. . . . . . . . . . . . . . . . . . . . . . . . . 411
Listing 18.110 Starting the training . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
Listing 18.111 Some more training . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
Listing 18.112 Testing the model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
Listing 19.1 Address based vertices . . . . . . . . . . . . . . . . . . . . . . . . . 421
Listing 19.2 Address based vertices subsref . . . . . . . . . . . . . . . . . . . 421
Listing 19.3 Add a single node with address based vertices add . . . . . 422
Listing 19.4 Add a node list address based vertices addv . . . . . . . . . . 422
Listing 19.5 Address based edges . . . . . . . . . . . . . . . . . . . . . . . . . . 423
Listing 19.6 Address based edges subsref . . . . . . . . . . . . . . . . . . . . . 424
Listing 19.7 Add a single edge address based edge add . . . . . . . . . . . 424
Listing 19.8 Add an edge list address based edges addv. . . . . . . . . . . 424
Listing 19.9 Setting global locations . . . . . . . . . . . . . . . . . . . . . . . . 425
Listing 19.10 Initial OCOS node addresses. . . . . . . . . . . . . . . . . . . . . 426
Listing 19.11 OCOS addresses after updated location . . . . . . . . . . . . . 426
Listing 19.12 Updated node 3 address . . . . . . . . . . . . . . . . . . . . . . . . 426
Listing 19.13 OCOS vertices data . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
Listing 19.14 Accessing node 3 address . . . . . . . . . . . . . . . . . . . . . . . 427
Listing 19.15 Find OCOS global node numbers . . . . . . . . . . . . . . . . . 427
Listing 19.16 OCOS global node 3 addresses . . . . . . . . . . . . . . . . . . . 428
Listing 19.17 OCOS global node 3 addresses . . . . . . . . . . . . . . . . . . . 428
Listing 19.18 Accessing OCOS edge 3 in and out data . . . . . . . . . . . . 429


Listing 19.19 Address based graph constructor graph. . . . . . . . . . . . . . 429
Listing 19.20 Address based graph constructor graph subsref . . . . . . . . 429
Listing 19.21 Address based addlocationtonodes . . . . . . . . . . . . . . . . . 430
Listing 19.22 Address based addlocationtoedges . . . . . . . . . . . . . . . . . 430
Listing 19.23 Address based addedge . . . . . . . . . . . . . . . . . . . . . . . . 431
Listing 19.24 Address based addedgev. . . . . . . . . . . . . . . . . . . . . . . . 431
Listing 19.25 Address based addnode . . . . . . . . . . . . . . . . . . . . . . . . 431
Listing 19.26 Address based addnodev . . . . . . . . . . . . . . . . . . . . . . . 431
Listing 19.27 Link2global . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
Listing 19.28 Straightforward incidenceOld . . . . . . . . . . . . . . . . . . . . 433
Listing 19.29 Address2Global. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 434
Listing 19.30 Address2String . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435
Listing 19.31 GetAddressMaximums . . . . . . . . . . . . . . . . . . . . . . . . . 435
Listing 19.32 FindGlobalAddresses . . . . . . . . . . . . . . . . . . . . . . . . . . 436
Listing 19.33 Incidence function form . . . . . . . . . . . . . . . . . . . . . . . . 436
Listing 19.34 Get address slot maximums . . . . . . . . . . . . . . . . . . . . . 436
Listing 19.35 Set up blank incidence matrix . . . . . . . . . . . . . . . . . . . . 436
Listing 19.36 Convert address to unique hash values . . . . . . . . . . . . . . 437
Listing 19.37 Address2Global New . . . . . . . . . . . . . . . . . . . . . . . . . . 437
Listing 19.38 Calling Address2Global . . . . . . . . . . . . . . . . . . . . . . . . 438
Listing 19.39 Setting up the incidence matrix and closing out. . . . . . . . 438
Listing 19.40 Address based incidence. . . . . . . . . . . . . . . . . . . . . . . . 438
Listing 19.41 First, set up the OCOS graph . . . . . . . . . . . . . . . . . . 439
Listing 19.42 The Address2Global Calculation . . . . . . . . . . . . . . . . . . 439
Listing 19.43 The actual B matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . 440
Listing 19.44 Sample OCOS incidence calculation . . . . . . . . . . . . . . . 440
Listing 19.45 Address based laplacian . . . . . . . . . . . . . . . . . . . . . . . . 441
Listing 19.46 BFsets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442
Listing 19.47 Build an OCOS Graph . . . . . . . . . . . . . . . . . . . . . . . . . 443
Listing 19.48 Build OCOS.dot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443
Listing 19.49 Create AddressOCOS.pdf . . . . . . . . . . . . . . . . . . . . . . . 443
Listing 19.50 BackGlobal. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444
Listing 19.51 Access BackGlobal Entries . . . . . . . . . . . . . . . . . . . . . . 444
Listing 19.52 BackA{6} . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445
Listing 19.53 A Simple Evaluation Loop . . . . . . . . . . . . . . . . . . . . . . 445
Listing 19.54 An Evaluation Loop With Offsets and Gains . . . . . . . . . . 446
Listing 19.55 Build A Brain Model . . . . . . . . . . . . . . . . . . . . . . . . . . 447
Listing 19.56 Construct the Brain Incidence Matrix . . . . . . . . . . . . . . . 447
Listing 19.57 Get Forward and Backward Information . . . . . . . . . . . . . 448
Listing 19.58 Initializing Y, O, G and W . . . . . . . . . . . . . . . . . . . . . . 448
Listing 19.59 First 5 Initial Node Values . . . . . . . . . . . . . . . . . . . . . . 448
Listing 19.60 First Graph Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 448
Listing 19.61 First 5 Node Values. . . . . . . . . . . . . . . . . . . . . . . . . . . 448
Listing 19.62 The First Hebbian Update. . . . . . . . . . . . . . . . . . . . . . . 449


Listing 19.63 First 5 Nodal Values After One Hebbian Update . . . . . . . 449
Listing 19.64 Setting Up the Input Data. . . . . . . . . . . . . . . . . . . . . . . 449
Listing 19.65 Setting Up Training Data . . . . . . . . . . . . . . . . . . . . . . . 449
Listing 19.66 Testing the Training Data . . . . . . . . . . . . . . . . . . . . . . . 450
Listing 19.67 Setup OCOS Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . 450
Listing 19.68 Incidence Matrix, Backward and Forward Data
and Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 450
Listing 19.69 Preliminary Evaluation With Input. . . . . . . . . . . . . . . . . 451
Listing 19.70 Evaluation With Input Data . . . . . . . . . . . . . . . . . . . . . 451
Listing 19.71 Hebbian Training With an Error Signal . . . . . . . . . . . . . 452
Listing 19.72 Initialized the Neuron Data . . . . . . . . . . . . . . . . . . . . . . 452
Listing 19.73 Update loop in skeleton form . . . . . . . . . . . . . . . . . . . . 453
Listing 19.74 Target Updates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
Listing 19.75 HebbianUpdateErrorSignal . . . . . . . . . . . . . . . . . . . . . . 455
Listing 19.76 A Sample One Step Training Session. . . . . . . . . . . . . . . 457
Listing 19.77 Multistep Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
Listing 19.78 Training for 151 Steps . . . . . . . . . . . . . . . . . . . . . . . . . 459
Listing 19.79 Y after 151 steps. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460
Listing 20.1 Structure of buildcan . . . . . . . . . . . . . . . . . . . . . . . . . . 462
Listing 20.2 Build the OCOS, FFP and Two/Three circuit blocks . . . . 462
Listing 20.3 Construct CanOne . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463
Listing 20.4 Add the FFP and Two/Three edges . . . . . . . . . . . . . . . . 463
Listing 20.5 Connect the components. . . . . . . . . . . . . . . . . . . . . . . . 464
Listing 20.6 Structure of buildcolumn . . . . . . . . . . . . . . . . . . . . . . . 464
Listing 20.7 Building three cans . . . . . . . . . . . . . . . . . . . . . . . . . . . 465
Listing 20.8 Adding the cans to make a column . . . . . . . . . . . . . . . . 465
Listing 20.9 Add the other can edges. . . . . . . . . . . . . . . . . . . . . . . . 466
Listing 20.10 Add connections between cans . . . . . . . . . . . . . . . . . . . 466
Listing 20.11 Generate the column dot file and graphic . . . . . . . . . . . . 466
Listing 20.12 Generating the graphic with dot . . . . . . . . . . . . . . . . . . 467
Listing 20.13 The structure of the buildcortex function . . . . . . . . . . . . 467
Listing 20.14 The single case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
Listing 20.15 Constructing the column case . . . . . . . . . . . . . . . . . . . . 468
Listing 20.16 Adding nodes to the cortex module . . . . . . . . . . . . . . . . 469
Listing 20.17 Adding edges to the cortex module . . . . . . . . . . . . . . . . 469
Listing 20.18 Add the inter column connections . . . . . . . . . . . . . . . . . 470
Listing 20.19 The cortex dot file and the graphical image . . . . . . . . . . 470
Listing 20.20 Generating the Cortex figure with dot . . . . . . . . . . . . . . 470
Listing 20.21 Structure of the buildthalamus function . . . . . . . . . . . . . 471
Listing 20.22 Build the reverse OCOS nodes and edges. . . . . . . . . . . . 472
Listing 20.23 Building with one reversed OCOS block . . . . . . . . . . . . 472
Listing 20.24 Building with more than one reversed OCOS blocks . . . . 472
Listing 20.25 Glue the thalamus nodes together . . . . . . . . . . . . . . . . . 473
Listing 20.26 Add the thalamus edges . . . . . . . . . . . . . . . . . . . . . . . . 473


Listing 20.27 Build a Thalamus Module with Two Pieces . . . . . . . . . . 473
Listing 20.28 Structure of the buildmidbrain function . . . . . . . . . . . . . 474
Listing 20.29 Build nodes and edges for each neurotransmitter . . . . . . . 474
Listing 20.30 Build neurotransmitter objects . . . . . . . . . . . . . . . . . . . . 475
Listing 20.31 Glue neurotransmitter modules into the midbrain . . . . . . . 476
Listing 20.32 Generate the midbrain and its dot file . . . . . . . . . . . . 476
Listing 20.33 Structure of the buildbrain function . . . . . . . . . . . . . . . . 477
Listing 20.34 Initialize counters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478
Listing 20.35 Build sensory cortex module one. . . . . . . . . . . . . . . . . . 478
Listing 20.36 Build sensory cortex module two . . . . . . . . . . . . . . . . . 478
Listing 20.37 Build the associative cortex module . . . . . . . . . . . . . . . . 479
Listing 20.38 Build the motor cortex module . . . . . . . . . . . . . . . . . . . 479
Listing 20.39 Build the thalamus module . . . . . . . . . . . . . . . . . . . . . . 479
Listing 20.40 Build the midbrain module . . . . . . . . . . . . . . . . . . . . . . 480
Listing 20.41 Build the cerebellum module . . . . . . . . . . . . . . . . . . . . 480
Listing 20.42 Glue brain modules together . . . . . . . . . . . . . . . . . . . . . 480
Listing 20.43 Add the brain module edges . . . . . . . . . . . . . . . . . . . . . 481
Listing 20.44 Connections sensory cortex one and associative cortex . . . 483
Listing 20.45 Connections sensory cortex two and associative cortex. . . 484
Listing 20.46 Connections associative cortex and motor cortex . . . . . . . 484
Listing 20.47 Connections thalamus and sensory cortex one . . . . . . . . . 485
Listing 20.48 Connections thalamus to associative and motor cortex . . . 485
Listing 20.49 Connections Thalamus to cerebellum . . . . . . . . . . . . . . . 485
Listing 20.50 Connections cerebellum to motor cortex . . . . . . . . . . . . . 485
Listing 20.51 Add intermodule connections to brain . . . . . . . . . . . . . . 486
Listing 20.52 Set neurotransmitter sizes . . . . . . . . . . . . . . . . . . . . . . . 486
Listing 20.53 Set the dopamine connections . . . . . . . . . . . . . . . . . . . . 486
Listing 20.54 Set the serotonin connections . . . . . . . . . . . . . . . . . . . . 487
Listing 20.55 Set the norepinephrine connections . . . . . . . . . . . . . . . . 488
Listing 20.56 Build a simple brain model. . . . . . . . . . . . . . . . . . . . . . 489
Listing 20.57 Generate brain incidence matrix and dot file . . . . . . . . . . 489
Listing 20.58 Generate the brain graphic . . . . . . . . . . . . . . . . . . . . . . 490
Abstract

This book tries to show how mathematics, computer science, and science can be
usefully and pleasurably intertwined. Here we begin to build a general model of
cognitive processes in a network of computational nodes such as neurons using a
variety of tools from mathematics, computational science, and neurobiology. We
begin with a derivation of the general solution of a diffusion model from a low-level
random walk point of view. We then show how we can use this idea in solving the
cable equation in a different way. This will enable us to better understand neural
computation approximations. Then we discuss neural systems in general and
introduce a fair amount of neuroscience. We can then develop many approxima-
tions to the first and second messenger systems that occur in excitable neuron
modeling. Next, we introduce specialized data for emotional content which will
enable us to build a first attempt at a normal brain model. Then, we introduce tools
that enable us to build graphical models of neural systems in MATLAB. We finish
with a simple model of cognitive dysfunction. We also stress our underlying motto:
always take the modeling results and go back to the scientists to make sure they
retain relevance. We agree with the original financial modeler’s manifesto due to
Emanuel Derman and Paul Wilmott from January 7, 2009 available on the Social
Science Research Network which takes the shape of the following Hippocratic oath.
We have changed one small thing in the list of oaths. In the second item, we have
replaced the original word value by the more generic term variables of interest
which is a better fit for our modeling interests.
• I will remember that I did not make the world, and it does not satisfy my
equations.
• Though I will use models boldly to estimate variables of interest, I will not be
overly impressed by mathematics.
• I will never sacrifice reality for elegance without explaining why I have done so.
• Nor will I give the people who use my model false comfort about its accuracy.
Instead, I will make explicit its assumptions and oversights.
• I understand that my work may have enormous effects on society and the
economy, many of them beyond my comprehension.


There is much food for thought in the above lines and all of us who strive to
develop models should remember them. In my own work, many times managers
and others who oversee my work have wanted the false comfort the oaths warn
against. We should always be mindful not to give in to these demands.
History

Based On:
Research Notes: 1992–1998
Class Notes: MTHSC 982 Spring 1997
Research Notes: 1998–2000
Class Notes: MTHSC 860 Summer Session I 2000
Class Notes: MTHSC 450, MTHSC 827 Fall 2001
Research Notes: Fall 2007 and Spring 2008
Class Notes: MTHSC 450 Fall 2008
Class Notes: MTHSC 450 Fall 2009
Class Notes: MTHSC 827 Summer Session I 2009
Class Notes: MTHSC 399 Creative Inquiry Spring 2010 and Fall 2010
Research Notes: Summer 2010 and Fall 2010
Class Notes: MTHSC 450 Spring 2010 and Fall 2010
Class Notes: MTHSC 827 Summer Session I 2010
Class Notes: MTHSC 434 Summer Session II 2010
Class Notes: MTHSC 399 Creative Inquiry Spring 2011 and Fall 2011
Class Notes: MTHSC 450 Spring 2011 and Fall 2011
Research Notes: Spring 2011 and Fall 2011
Class Notes: MTHSC 827 Summer Session I 2011
Class Notes: MTHSC 434 Summer Session II 2011
Class Notes: MTHSC 450 Spring 2012
Class Notes: BIOSC 491 Fall 2011 and Spring 2012
Research Notes: Spring 2012
Class Notes: MTHSC 827 Summer Session I 2013
Research Notes: Spring and Fall 2013
Research Notes: Spring and Fall 2014
Research Notes: Spring 2015
Class Notes: MTHSC 827 Summer Session I 2015

Part I
Introductory Matter
Chapter 1
BioInformation Processing

In this book, we will first discuss some of the principles behind models of biological
information processing and then begin the process of using those ideas to build models
of cognition and cognitive dysfunction. Most of the mathematical tools needed for
this journey are covered in the first, second and third books on calculus tools for
cognitive scientists, Peterson (2015a, b, c, d), and so will be either assumed or covered
very lightly here. Other mathematical ideas will be new and their coverage will be
more complete as we try to bring everyone up a notch in mathematical sophistication.
We will discuss the principles of first and second messenger systems in terms of
abstract triggering events and develop models of networks of computational nodes
such as neurons that use various approximations to the action potential generation
process. As usual, we therefore use tools at the interface between science, mathe-
matics and computer science.
Our ultimate goal is to develop infrastructure components allowing us to build
models of cognitive function that will be useful in several distinct arenas such as the
modeling of depression and other forms of mental dysfunction that require a useful
model of overall brain function. From such a model, which at the minimum involves
modules for cortex, midbrain and thalamus, motor responses and sensory subsys-
tems, we could obtain insight into how dopamine, serotonin and norepinephrine
neurotransmitters interact. It is sobering to realize that as of the time of this writing,
June 2014, many drugs used for the treatment of cognitive dysfunction are prescribed
without understanding of how they work. Hence, we want to develop models of neu-
rotransmitter interaction that have predictive capability for mental disease. We are
also unapologetically interested in small brain models. How much function can we
obtain in a small package of neuronal elements? We want our models to be amenable
to lesion analysis and so we want easily accessible scripting that will let us build and
then deconstruct models as we see fit. These ideas are also needed in the development
of autonomous robotics with minimal hardware.


It is, of course, very difficult to achieve these goals and we are indeed far from
making it happen. Further, to make progress, our investigations must use ideas from
many fields. Above all, we must find the proper level of abstraction from the messiness
of biological detail that will give us insight. Hence, in this first volume on cognitive
modeling, we focus on the challenging task of how to go about creating a model of
information processing in the brain sufficiently simple that it can be implemented and
yet sufficiently complicated to help us investigate cognition in general. We will even
build a simple model of cognitive dysfunction using multiple neurotransmitters. In
other volumes, we will use these abstractions of biological reality to build more
interesting cognitive models.

1.1 The Proper Level of Abstraction

In the field of Biological Information Processing, the usual tools used to build models
follow from techniques that are loosely based on rather simplistic models from
animal neurophysiology called artificial neural architectures. We believe we can do
much better than this if we can find the right abstraction of the wealth of biolog-
ical detail that is available; the right design to guide us in the development of our
modeling environment. Indeed, there are three distinct and equally important areas
that must be jointly investigated, understood and synthesized for us to make major
progress. We refer to this as the Software, Wetware and Hardware SWH triangle,
Fig. 1.1. We use double-edged arrows to indicate that ideas from these disciplines
can both enhance and modify ideas from the others. The labels on the edges indicate
possible intellectual pathways we can travel upon in our quest for unification and
synthesis. Almost twenty years ago, an example of the HW leg of the triangle was
the algorithms mapped into the VHSIC Hardware Description Language (VHDL) used
to build hardware versions of a variety of low-level biological computational units.
In addition, there were the new concepts of Evolvable Hardware where new hard-
ware primitives referred to as Field Programmable Gate Arrays offered the ability
to program the device’s Input/Output response via a bit string which could be chosen
as a consequence to environmental input (see Higuchi et al. 1997; Sanchez 1996
and Sipper 1997). There were several ways to do this: online and offline. In online
strategies, groups of FPGAs are allowed to interact and “evolve” toward an appro-
priate bit string input to solve a given problem and in offline strategies, the evolution
is handled via software techniques similar to those used in genetic programming
and the solution is then implemented in a chosen FPGA. In a related approach, it
was even possible to perform software evolution within a pool of carefully chosen
hardware primitives and generate output directly to a standard hardware language
such as VHDL so that the evolved hardware can then be fabricated once an appro-
priate fitness level is reached. Thus, even 20 years ago, various areas of research
used new relatively plastic hardware elements (their I/O capabilities are determined
at run-time) or carefully reverse-engineered analog VLSI chipsets to provide us with
a means to take abstractions of neurobiological information and implement them on
silicon substrates.
Much more is possible now with the advent of synthetic biology and the con-
struction of almost entirely artificial cells. In effect, there was, and still is, a blurring
between the traditional responsibilities of hardware and software for the kinds of
typically event driven modeling tasks we see here. The term Wetware in the figure is
then used to denote things that are of biological and/or neurobiological scope. And,
of course, we attempt to build software to tie these threads together.
These issues from the hardware side are highly relevant to any search for useful
ideas and tools for building biological models. Algorithms that can run in hardware
need to be very efficient and to really build useful models of complicated biology will
require large amounts of memory and computational resources unless we are very
careful. So there are lessons we can learn by looking at how these sorts of problems
are being solved in these satellite disciplines. Even though our example discussion
here is from the arena of information processing in neurobiology, our point is that
Abstraction is a principle tool that we use to move back and forth in the fertile
grounds of software, biology and hardware.
In general, we will try hard to illuminate your journey as a reader through this
material. Let’s begin with a little philosophy; there are some general principles to
pay attention to.

1.2 The Threads of Our Tapestry

As we have mentioned, there are many threads to pull together in our research. We
should not be dismayed by this as interesting things are complex enough to warrant
it. You can take heart from what the historian Barbara Tuchman said in speaking
about her learning the art of being a historian. She was working at one of her first jobs
and her editor was unhappy with her: as she says (Tuchman 1982, p. 17) in her essay
In Search of History
The desk editor, a newspaperman by training, grew very impatient with my work. ‘Don’t
look up so much material,’ he said. ‘You can turn out the job much faster if you don’t know
too much.’ While this was no doubt true for a journalist working against a deadline, it was
not advice that suited my temperament.

There is a specific lesson for us in this anecdote. We also are in the almost unenviable
position of realizing that we can never know too much. The problem is that in addition
to mixing disciplines, languages and points of view, we must also find the right level
of abstraction for our blended point of view to be useful for our synthesis. However,
this process of finding the right level of abstraction requires reflection and much
reading and thinking.
In fact, in any endeavor in which we are trying to create a road map through a
body of material, to create a general theory that explains what we have found, we,
in this process of creating the right level of abstraction, like the historian Tuchman,
have a number of duties (Tuchman 1982, p. 17):
The first is to distill. [We] must do the preliminary work for the reader, assemble the
information, make sense of it, select the essential, discard the irrelevant—above all discard
the irrelevant—and put the rest together so that it forms a developing ... narrative. ...To
discard the unnecessary requires courage and also extra work. ...[We] are constantly being
beguiled down fascinating byways and sidetracks. But the art of writing—the test of [it]—is
to resist the beguilement and cleave to the subject.

Although our intellectual journey will be through the dense technical material of
computer science, mathematics, biology and even neuroscience, Tuchman’s advice
is very relevant. It is our job to find this proper level of abstraction and this interesting
road map that will illuminate this incredible tangle of facts that are at our disposal. We
have found that this need to explain is common in most human endeavors. In music,
textbooks on the art of composing teach the beginner to think of writing music in
terms of musical primitives like the nouns, verbs, sentences, phrases, paragraphs of a
given language where in the musical context each phoneme is a short collection of five
to seven notes. Another example from history is the very abstract way that Toynbee
attempted to explain the rise and fall of civilizations by defining the individual entities
city, state and civilization (Toynbee and Caplan 1995).
From these general ideas about appropriate abstraction, we can develop an under-
standing of which abstractions might be of good use in an attempt to develop a
working model of autonomous or cognitive behavior. We will need sophisticated
models of connections between computational elements (synaptic links between
classical lumped sum neuronal models are one such type of connection) and new
ideas for the asynchronous learning between objects. Our challenges are great as
biological systems are much more complicated than those we see in engineering or
mathematics.
These are sobering thoughts, aren’t they? Still, we are confident we can shed some
insight on how to handle these modeling problems. In this volume, we will discuss
relevant abstractions that can open the doorway for you to get involved in useful and
interesting biological modeling.
Unfortunately, as we have tried to show, all of the aforementioned items are very
nicely (or very horribly!) intertwined (it depends on your point of view!). We hope
that you the reader will persevere. As you have noticed, our style in this report is to
use the “royal” we rather than any sort of third person narrative. We have always felt
that this should be interpreted as the author asking the reader to explore the material
with the author. To this end, the text is liberally sprinkled with you to encourage this
point of view. Please take an active role!

1.3 Chapter Guide

This text covers the material in bioinformation processing in the following way.

Part I: Introductory Matter This is the material you are reading now. Here we
take the time to talk about modeling in general and use examples taken from the
first three courses on calculus for Cognitive Scientists (Peterson 2015a, b, c, d).
Part II: Diffusion Models In this portion of the text, we show you how to solve
the time dependent cable equation in a different way. This gives us additional
insight into what these solutions look like which helps us when we are trying to
find effective ways to approximate these computations.

• In Chap. 2, we discuss the solutions to diffusion equations with a model of a
particle moving randomly with right and left steps called a random walk model.
We use the solution of this model as a way to motivate a solution to a general
diffusion equation.
• In Chap. 3, we introduce Laplace and Fourier Transform methods so that we
can solve the cable equation in a different way.
• In Chap. 4, we show how we use integral transform techniques to find a new way
of expressing the solution to the time dependent cable equation under certain
conditions. At this point, we understand how to approximate solutions to voltage
impulses input into a dendritic model in a variety of ways.

Part III: Neural Systems In this part, we begin the real process of abstracting
biological information processing.

• In Chap. 5, we provide a top level overview of the neural structure of a typical
mammalian brain.
• In Chap. 6, we work out the details of an abstract second messenger trigger
system in terms of how the trigger initiates a sudden departure from dynamic
equilibrium. We then use this to develop simple estimates of the size to the
response to a triggering event. We also discuss a few specific examples of sec-
ond messenger events so that you can see how some of the general biological
information processing pathways have evolved to solve problems. We also dis-
cuss calcium ion current triggers and general ligand receptor response strategies
to help us understand how action potentials are used to build responses in a net-
work of interacting neurons.
• In Chap. 7, we discuss second messenger triggers that are diffusion based in
general. Our discussion of a generic model of Calcium ion triggers uses a com-
bined diffusion model with a finite number of Calcium binding molecules and
Calcium storage in the endoplasmic reticulum. Then, we make two simplifying
assumptions and finally arrive at a model with which we can model calcium ion
trigger events to first order.
• In Chap. 8, we discuss a relatively abstract way of modeling how the output
characteristics of a first or second messenger trigger are computed.
• In Chap. 9, we look at the inputs and outputs to a neuron much more abstractly.
We use these discussions to motivate the creation of a biological feature vector.
To show why we think it is a useful construct, we use it to map changes in
the gate parameters of a typical Hodgkin–Huxley model, such as those we see when
a toxin is introduced into a neural system, back to particular toxins. This leads to our
discussion of how the approximation of the action potential into a biological
feature vector can have useful computational advantages. We begin to see how
we could use these ideas to approximate the response of a neuron without the
large amounts of computation the previous approaches built on solving systems
of differential equations and partial differential equations require.
Part IV: Models of Emotion and Cognition We then introduce models that are
built of collections of interacting computational nodes. These network models are
actually more general than ones built from excitable neurons. We discuss some
basic ideas involving neurotransmitters and the salient issues, in our mind, that
must be addressed to build useful software models. We finish these chapters with
an introduction to how we might make a network model adapt to environment
input and given target information. To do this, we introduce a variant of the cable
equation for graphs using its Laplacian. In detail
• In Chap. 10, we introduce non biologically based emotional models so that we
can learn how to use ideas from psycho-physics.
• In Chap. 11, we show how we can build emotionally labeled models of music
which we can use to train the auditory cortex.
• In Chap. 12, we build emotionally labeled models of painting which we can use
to train the visual cortex.
• In Chap. 13, we discuss generic connectionist strategies to use the emotionally
labeled data to build a model of emotion and cognition although our true interest
is in biologically plausible models.
• In Chap. 14, we go over some basic ideas of neurotransmitters.
• In Chap. 15, we illustrate what we could do to simulate a simple network
of neurons organized in a graph using MatLab and introduce many of the basic
ideas we will have to figure out how to implement.
Part V: Simple Abstract Neurons In this part, we go over the standard model
of a matrix based feed forward network as a simplistic model of bioinformation
processing in a neural system and then segue into the more flexible paradigm that
uses a graph structure.
• In Chap. 16, we introduce the matrix feed forward network as a way to abstract
neural computation. We also show how to code the training of this model in
MatLab.
• In Chap. 17, we recast the matrix based networks of simple neuron processing
functions as a chain, or graph, of computational nodes. At this point, it is clear
that the neural processing can be much more interesting. We also introduce
MatLab coding of these graphs.

Part VI: Graph Based Modeling in MatLab We now show you how to build rea-
sonable graph models of neural computation using MatLab. This is not a great
choice of programming language but it is readily available and easy to learn com-
pared to C, C++ and Python.

• In Chap. 18, we build object oriented class models in MatLab (be warned, this
is clumsy and it makes us hunger for a better computing language—but we
digress....).
• In Chap. 19, we move to graph based models that are based on vector addressing:
each neuron has a global address and a vector address which we can use to find
it in a model.
• In Chap. 20, we actually build some simple brain models.

Part VII: Models of Cognitive Dysfunction In Chap. 21, we finish this text with
some preliminary conversations on how to build a model of normal cognitive
function so that we can properly discuss cognitive dysfunction. We introduce
the details of subgraph training based on the Laplacian based training briefly
introduced in Chap. 15 and show how we could use those ideas and the music and
painting data from Chaps. 11 and 12 to first build a model of a normal brain (yes,
we know that is hard to define!) and from that a model of dysfunction such as
depression. We only sketch the ideas as the implementation will take quite a long
time and deserves a long treatment. This will be done in the next volume.
Part VIII: Conclusions In Chap. 22 we talk a bit about where this volume has led
us and what we can do with the material we have learned.
Part IX: Further Reading In Chap. 23 we mention other papers and books and
software tools we, and you, might find useful.

Finally, this course is just a first step in the development of the tools needed to
build the graphs of computational nodes which subserve information processing in
the brain. The goal is to build models of cognitive dysfunction using reasonable
models of neurotransmitter action and we pursue this further in the next volume. Our
style in these lectures is to assume you the reader are willing to go on this journey
with us. So we have worked hard at explaining in detail all of our steps. We don’t
assume you have background in biophysics and the biology of neurons and so we
include chapters on this material. We develop the mathematical tools you need to a
large extent in house, so to speak. We also try to present the algorithm design and
the code behind our software experiments in a lot of detail. The interactive nature
of the MatLab/Octave integrated development environment makes it ideal for doing
the exploratory exercises we have given you in these pages. The basic background
we assume is contained in our companion books (Peterson 2015a, b, c, d).

Finally, as we mentioned, we are interested in developing models of cognitive dys-
function which provide insight into how cognitive function arises and how cognitive
dysfunction might be ameliorated. The address based graph models (implemented
here in MatLab but it is simply an exercise to reimplement in Fortran, C, C++,
Python or a functional based language such as Erlang, Clojure or Haskell) provide
a useful theoretical framework for modeling the interactions of neurons at varying
levels of realism. We think a general graph based model, G(N, E), with nodes N
providing computation at the neuron level and edges Ei→j between nodes Ni and
node Nj giving neuronal connection computation provides a useful framework for
exploring cognitive models theoretically. In our development of a graph model of a
neural system, the neural circuitry architecture we describe is fixed, but it is clear
dynamic architectures can be handled as sequence of these directed graphs. We see
in Chap. 19 on the address based graphs and Chap. 20 on brain modeling how to
organize the directed graph using interactions between neural modules (visual cor-
tex, thalamus etc.) which are themselves subgraphs of the entire circuit. Once we
have chosen a directed graph to represent our neural circuitry, note the addition of
new neural module is easily handled by adding it and its connections to other mod-
ules as a subgraph addition. Hence, at a given level of complexity, if we have the
graph G(N, E) that encodes the connectivity we wish to model, then the addition of
a new module or modules simply generates a new graph G′(N′, E′) for which there
are straightforward equations for explaining how G′ relates to G which are easy
to implement. Thus, as we show in Chap. 21, we can learn how to do subgraph level
training which will allow us to potentially build very useful models of cognition.
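To make the subgraph idea concrete, here is a minimal sketch in MatLab/Octave that uses plain adjacency matrices; the graph classes we actually develop in Chaps. 18 and 19 are richer, and the module sizes and intermodule edges below are invented purely for illustration.

% Sketch: add a module H as a subgraph of an existing directed graph G.
% AG and AH are adjacency matrices; entry (i,j) = 1 means an edge from node i to node j.
AG = [0 1 0; 0 0 1; 0 0 0];      % original graph G with 3 nodes
AH = [0 1; 0 0];                 % new module H with 2 nodes
nG = size(AG,1); nH = size(AH,1);
% Nodes of H are appended after those of G in the enlarged graph.
Anew = [AG, zeros(nG,nH); zeros(nH,nG), AH];
% Hypothetical intermodule edges: node 3 of G feeds node 1 of H and
% node 2 of H feeds back to node 1 of G.
Anew(3, nG+1) = 1;
Anew(nG+2, 1) = 1;
disp(Anew);

The old adjacency data sits untouched in the upper left block and the new module and its connections are simply appended, which is all the bookkeeping the subgraph addition requires.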
We find this approach intellectually liberating as we are completely agnostic as
to the particular details of the nodal and edge processing functions. Indeed, once
a particular level of module complexity has been chosen, it is a straightforward
task to use that model as a base point for a different model. Also, it is possible
to outline longitudinal studies which allow us to model brain function from low
resolution to high keeping the same intellectual framework at all resolutions. We
also are explicitly asynchronous in our modeling approach and we discuss how this
impacts our computational strategies. Many of these questions will be addressed in
the next volume.
We will begin our discussions with a few models from our own teaching and
research to illustrate some general points about the type of modeling we are interested
in. Hence, we include high level modeling challenges such as understanding painting
and music, the spread of altruistic genes in a population and modeling depression as
grist for our mill. We are always striving for ways to obtain insight about complicated
questions. Hence, the quest here is to develop insight into how cognition might arise
and also how cognitive dysfunction might occur and how such dysfunction might be
ameliorated by external agents.

1.4 Theoretical Modeling Issues

The details of how proteins interact, how genes interact and how neural modules
interact to generate the high level outputs we find both interesting and useful are
known to some degree but it is quite difficult to use this detailed information to
build models of high level function. Let’s explore several models before we look at
a cognitive model. In all of these models, we are asking high level questions and
wondering how we might create a model that gives us some insight. We all build
models of the world around us whether we use mathematical, psychological, political,
biological and so forth tools to do this. All such models have built in assumptions
and we must train ourselves to think abut these carefully. We must question the
abstractions of the messiness of reality that led to the model and be prepared to
adjust the modeling process if the world as experienced is different from what the
model leads them to expect. There are three primary sources of error when we build
models:
• The error we make when we abstract from reality; we make choices about which
of the things we measure are important. We make further choices about how these
things relate to one another. Perhaps we model this with mathematics, diagrams,
words etc.; whatever we choose to do, there is error we make. This is called Model
error.
• The error we make when we use computational tools to solve our abstract models.
This error arises because we typically must replace the model we came up with in
the first step with an approximate model that can be implemented on a computer.
This is called Truncation error.
• The last error is the one we make because we cannot store numbers exactly in any
computer system. Hence, there is always a loss of accuracy because of this. This
is called Round Off error; a tiny numerical illustration follows this list.
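To make the last item concrete, here is a tiny MatLab/Octave illustration of round off error; the particular numbers are just the familiar textbook example.

% 0.1, 0.2 and 0.3 have no exact binary floating point representation,
% so the 'obvious' identity fails by a tiny amount.
a = 0.1 + 0.2;
fprintf('0.1 + 0.2 - 0.3 = %.17g\n', a - 0.3);   % small but nonzero
fprintf('equal to 0.3? %d\n', a == 0.3);         % prints 0, i.e. false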
All three of these errors are always present and so the question is how do we know
the solutions our models suggest relate to the real world? We must take the modeling
results and go back to original data to make sure the model has relevance. These
models are not too hard but capture the tension between the need to understand a
very high level question using what we know at the first principle level.

Transcription Models: In a model of protein transcription, we start with
a simple model that lumps all kinds of things together and we end with a much
more detailed one that is based on an equilibrium analysis point of view. We
make many assumptions in order to build these models and by the end, we can
see how to introduce positive and negative feedback and delays into the process.
We end there with the statement that all of this complexity would then have to be
folded into models of gene interactions called regulatory gene networks. Our high
level questions such as “What gene clusters control a phenotype?” are difficult to
answer using this level of detail and hence, useful abstractions are needed. Note
the model error that inevitably is introduced.

The spread of a gene throughout a population: We can ask the question of how
does a given phenotype become dominant in a population? To do this, we try to
formulate the problem in a very abstract way for a very simple population of two
phenotypes with simple assumptions about fertility etc. This simple model, nev-
ertheless, has good explanatory power and we can use it to understand better how
domesticated wheat became dominant and wild wheat became rare. Answering
that high level question is difficult without some sort of abstract model to develop
an equation based model we can use to see how the split between two phenotypes
alters over time. The really interesting thing about this model is that all of it is
developed using only algebra.
The Spread of Altruism in a Population: To develop a model for the spread of
altruistic behavior in a population, we assume this behavior can be explained as
a phenotype due to some combination of genes. We can build on the Viability
model to some extent, but to do this properly, we have to find a way to model
actions made by individuals when there is a cost to them even though there is a
benefit to the good of the population. This is a very high level idea and all the
detail about protein transcription, regulatory gene networks and so forth is really
irrelevant. We need a better way to find insight. Here, we carefully go over the
insight into the spread of altruism called Hamilton’s Rule which was formulated
by Hamilton in the early 1960’s. Hamilton’s Rule was based on a parameter called
the kinship coefficient and it is quite difficult to understand what is might mean.
We give several different ways of looking at it all of which shed insight in their
own way.
A Simple Insulin Model: In diabetes there is too much sugar in the blood and the
urine. This is a metabolic disease and if a person has it, they are not able to use
up all the sugars, starches and various carbohydrates because they don’t have
enough insulin. Diabetes can be diagnosed by a glucose tolerance test (GTT). If
you are given this test, you do an overnight fast and then you are given a large dose
of sugar in a form that appears in the bloodstream. This sugar is called glucose.
Measurements are made over about five hours or so of the concentration of glucose
in the blood. These measurements are then used in the diagnosis of diabetes.
It has always been difficult to interpret these results as a means of diagnosing
whether a person has diabetes or not. Hence, different physicians interpreting the
same data can come up with a different diagnosis, which is a pretty unacceptable
state of affairs! The idea of this model, which we introduce in Peterson (2015b),
for diagnosing diabetes from the GTT is to find a simple dynamical model of
the complicated blood glucose regulatory system in which the values of two
parameters would give a nice criterion for distinguishing normal individuals from
those with mild diabetes or those who are pre diabetic. Of course, we have to
choose what these parameters are and that is the art of the modeling process!
A Cancer Model: A simple model of cancer is based on a tumor suppressor gene
(TSG) that occurs in two alleles. There are two pathways to cancer: one is due to
point mutations that knock out first one allele and then another of the TSG. The
other pathway starts with cells having both alleles of the TSG but a chromosome
that is damaged in some way (copy errors in cell division etc.). The question is
which pathway to cancer is dominant? This is hard to answer, of course. With
our model, we can generate a prediction that there is an algebraic relationship
between the chromosomal damage rate and the population of cells in the cancer
model. It is quite messy to do this, but imagine how hard it would be to find this
relationship using computer simulations. Knowing the relationship equation is
very helpful and gives us a lot of insight.
The Classical Predator Prey Model: In the classical Predator—Prey model, we use
abstraction to develop the nonlinear system of differential equations that repre-
sents the interactions between food fish and predator fish. All the food fish are
lumped together and all the different types of predators are also thrown into one
bin. Nevertheless, this model predicts periodic solutions whose average food fish
and predator values explain very nicely puzzling data from World War I. How-
ever, the model does not take into account self interaction between food fish and
predator fish. When that is added, we can still explain the data, but the trajectories
for the food and predator fish converge over time to a fixed value. This hardly
represents the behavior we see in the sea, so we have lost some of our insight
and the model is much less explanatory even though it is more accurate because
self interaction has been added. A general principle seems to be peeking out here:
adding self interaction which is essentially adding damping rules out periodicity.
So in a biological model which has periodicity in it, there should be a push–pull
mechanism where the damping can become excitation. Another way to look at it
is that we need positive and negative feedback in the model. This is not a proof,
of course, just a reasonable hypothesis; a small simulation sketch of the classical model follows this list.
A West Nile Virus Infection Model: A West Nile Virus infection does something
very puzzling. Normally, as the amount of virus in an animal host increases, the
animal’s probability of survival drops. So if you took 10 mice and exposed them
to a level of virus, you could measure how many mice survived easily. You just
look at the mice to see if they are still alive. If you plot host survival versus
virus infection level, for most viral infections, you see a nice reversed S-curve.
At low levels, you have 10 surviving and as the viral load increases, that number
smoothly decays to 0 with no upticks in between. West Nile Virus infection is
quite different: there is some sort of immunopathology going on that allows the
number of surviving mice to oscillate up and down. Hence, even at high viral load,
you can have more mice surviving than you expect. The question here is simple:
explain the survival curve data. This is very hard. We build a model but to do
it, we have to come up with abstract approximations of T-Cell interactions, the
spread of various chemical messengers throughout the system and so forth. But
most importantly, what constitutes host death in our simulation? Host death is a
very high level idea and to compare our simulation results to the real survival data
requires that our measure of host death is reasonable. So this model is quite messy
and filled with abstractions and at the end of the day, it is still hard to decide what
a simulated host death should be. This model is a lot like a brain model where our
question is “What is depression?”
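As promised above, here is a minimal MatLab/Octave sketch of the classical predator-prey model; the rate constants and initial populations are illustrative choices and are not fitted to any data.

% Classical Lotka-Volterra predator-prey model without self interaction.
a = 1.0;  b = 0.3;    % prey growth rate and predation rate
c = 0.5;  d = 0.2;    % predator death rate and conversion efficiency
% x(1) is the food fish population, x(2) is the predator population.
rhs = @(t,x) [ a*x(1) - b*x(1)*x(2); -c*x(2) + d*x(1)*x(2) ];
[t,x] = ode45(rhs, [0 50], [10; 5]);    % integrate for 50 time units
plot(t, x(:,1), t, x(:,2));
legend('food fish', 'predators'); xlabel('time');

The trajectories cycle indefinitely; adding self interaction terms proportional to x(1)^2 and x(2)^2 damps these cycles toward a fixed point, which is exactly the loss of periodicity discussed in the item above.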

1.5 Code

All of the code we use in this book is available for download from the site Biological
Information Processing (http://www.ces.clemson.edu/~petersj/CognitiveModels.
html). The code samples can then be downloaded as the zipped tar ball Cogni-
tiveCode.tar.gz and unpacked where you wish. If you have access to MatLab, just
add this folder with its sub folders to your MatLab path. If you don’t have such access,
download and install Octave on your laptop. Now Octave is more of a command
line tool, so the process of adding paths is a bit more tedious. When we start up an
Octave session, we use the following trick. We write up our paths in a file we call
MyPath.m. For us, this code looks like this
Listing 1.1: How to add paths to Octave
function MyPath()
%
s1 = '/home/petersj/MatLabFiles/BioInfo/:';
s2 = '/home/petersj/MatLabFiles/BioInfo/GSO:';
s3 = '/home/petersj/MatLabFiles/BioInfo/HH:';
s4 = '/home/petersj/MatLabFiles/BioInfo/Integration:';
s5 = '/home/petersj/MatLabFiles/BioInfo/Interpolation:';
s6 = '/home/petersj/MatLabFiles/BioInfo/LinearAlgebra:';
s7 = '/home/petersj/MatLabFiles/BioInfo/Nernst:';
s8 = '/home/petersj/MatLabFiles/BioInfo/ODE:';
s9 = '/home/petersj/MatLabFiles/BioInfo/RootsOpt:';
s10 = '/home/petersj/MatLabFiles/BioInfo/Letters:';
s11 = '/home/petersj/MatLabFiles/BioInfo/Graphs:';
s12 = '/home/petersj/MatLabFiles/BioInfo/PDE:';
s13 = '/home/petersj/MatLabFiles/BioInfo/FDPDE:';
s14 = '/home/petersj/MatLabFiles/BioInfo/3DCode';
s = [s1,s2,s3,s4,s5,s6,s7,s8,s9,s12];
addpath(s);
end

The paths we want to add are set up as strings, here called s1 etc., and to use this, we
start up Octave like so. We copy MyPath.m into our working directory and then do
this.
Listing 1.2: Set paths in octave
octave>> MyPath();

We agree it is not as nice as working in MatLab, but it is free! You still have
to think a bit about how to do the paths. For example, in this text, we develop
two different ways to handle graphs in MatLab. The first is in the directory
GraphsGlobal and the second is in the directory Graphs. They are not
to be used together. So if we wanted to use the setup of Graphs and noth-
ing else, we would edit the MyPath.m file to set s = [s11]; only. If we
wanted to use the GraphsGlobal code, we would edit MyPath.m so that
s11 =’/home/petersj/MatLabFiles/BioInfo/GraphsGlobal:’;
and then set s = [s11];. Note the directories in the MyPath.m are ours: the main
directory is ’/home/petersj/MatLabFiles/BioInfo/ and of course,
you will have to edit this file to put your directory information in there instead of ours.
1.5 Code 15

All the code will work fine with Octave. So pull up a chair, grab a cup of coffee
or tea and let’s get started.

References

T. Higuchi, M. Iwata, W. Liu, (eds.), in Evolvable Systems: From Biology to Hardware, Proceedings
of the First International Conference, October 1996. Lecture Notes in Computer Science, Vol.
1259 (Tsukuba, Japan, 1997)
J. Peterson, Calculus for Cognitive Scientists: Derivatives, Integration and Modeling, Springer
Series on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte
Ltd., Singapore, 2015a in press)
J. Peterson, Calculus for Cognitive Scientists: Higher Order Models and Their Analysis, Springer
Series on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte
Ltd., Singapore, 2015b in press)
J. Peterson, Calculus for Cognitive Scientists: Partial Differential Equation Models, Springer Series
on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte Ltd.,
Singapore, 2015c in press)
J. Peterson, BioInformation Processing: A Primer On Computational Cognitive Science, Springer
Series on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte
Ltd., Singapore, 2015d in press)
M. Sipper, Evolution of Parallel Cellular Machines: The Cellular Programming Approach, Number
1194 in Lecture Notes in Computer Science (Springer, Berlin, 1997)
E.Sanchez, Towards Evolvable Hardware: The Evolutionary Engineering Approach. in E. Sanchez,
M. Tomassini, (eds.), Proceedings of the First International Workshop, Lausanne, Switzerland,
October 1995, Vol. 1259 (Springer, Berlin, 1996)
A. Toynbee, J. Caplan, A Study of History: A New Edition (Barnes and Noble Books, New York,
1995). Revised and abridged, Based on the Oxford University Press Edition 1972
B. Tuchman, Practicing History: Selected Essays (Ballantine Books, New York, 1982)
Part II
Diffusion Models
Chapter 2
The Diffusion Equation

We are now ready to study the time dependent solutions of the cable equation which
are not separable. We assume you have read the derivation of the cable equation
presented in Peterson (2015) so that you are comfortable with that material. Also,
we have already studied how to solve the cable equation using the separation of
variables technique. This method solved the cable equation with boundary condi-
tions as a product v̂m (λ, τ ) = u(λ) w(τ ). Then, in order to handle applied voltage
pulses
 or current pulses, we had to look at an infinite series solution of the form
n A n u n (λ) wn (τ ). Now, we will see how we can find a solution which is not time
and space separable. This requires a fair bit of work: first, a careful discussion of
how the random walk model in a limiting sense gives a probability distribution which
serves as a solution to the classical diffusion model. This is done in this chapter. We
follow the standard presentation of this material in for example Johnston and Wu
(1995) and others.
The solution of the diffusion model is very important because we can use a clever
integral transform technique to turn solve a general diffusion model. So in Chap. 3
we discuss the basics of the Laplace and Fourier Transform. Finally, in Chap. 4, we
use a change of variables to transform a cable model into a diffusion equation. Then,
we use integral transforms to solve the diffusion model. We will get the same solution
we obtained using the random walk model which gives us a lot of insight into the
structure of the solution.
Now these ideas are important because many second messenger systems are based
on the diffusion of a signal across the cytoplasm of the excitable cell. So knowing
how to work with diffusion based triggers is necessary.

2.1 The Microscopic Space–Time Evolution of a Particle

The simplest probabilistic model that links the Brownian motion of particles to the
macroscopic laws of diffusion is the 1D random walk model. In this model, we
assume a particle moves every τm seconds along the y axis a distance of either λc
© Springer Science+Business Media Singapore 2016 19
J.K. Peterson, BioInformation Processing, Cognitive Science and Technology,
DOI 10.1007/978-981-287-871-7_2
20 2 The Diffusion Equation

Fig. 2.1 Walking in a volume box

or −λc with probability 21 . Consider the thought experiment shown in Fig. 2.1 where
we see a volume element which has length 2λc and cross sectional area A. Since
we want to do microscopic analysis of the space time evolution of the particles, we
assume that λc < y.
Let φ+ (s, y) denote the flux density of particles crossing from left to right across
the plane located at position y at time s; similarly, let φ− (s, y) be the flux density
of particles crossing from right to left. Further, c(s, y) denote the concentration of
particles at coordinates (s, y). What is the net number of particles that cross the plane
of area A?
Since the particles randomly change their position every τ seconds by ±λc , we
can calculate flux as follows: first, recall that flux here is the number of particles
per unit area and time; i.e., the units are par ticles
sec−cm2
. Since the walk is random, half of
the particles will move to the right and half to the left. Since the distance moved is λc ,
half the concentration at c(y − λ2c , s) will move to the right and half the concentration
at c(y + λ2c , s) will move to the left. Now the number of particles crossing the plane
is concentration times the volume. Hence, the flux terms are
 
1 Aλc c y − λ2c , s
φ+ (s, y) =
2 Aτm
 
λc λc
= c y − ,s
2τm 2
 
λc
1 Aλ c c y + 2
, s
φ− (s, y) =
2 Aτm
 
λc λc
= c y + ,s
2τm 2
2.1 The Microscopic Space–Time Evolution of a Particle 21

The net flux, φ(s, y) is thus

φ(s, y) = φ+ (s, y) − φ− (s, y)


    
λc λc λc
= c s, y − − c s, y +
2τm 2 2

Since, λyc is very small in microscopic analysis, we can approximate the concentration
c using a first order Taylor series expansion in two variables if we assume that the
concentration is sufficiently smooth. Our knowledge of the concentration functions
we see in the laboratory and other physical situations implies that it is very reasonable
to make such a smoothness assumption. Hence, for small perturbations s + a and
y + b from the base point (s, y), we find

∂c ∂c
c(s + a, y + b) = c(s, y) + (s, y) a + (s, y) b + e(s, y, a, b)
∂s ∂y

where e(s, y, a, b) is an error term which is proportional to the size of the largest of
|a| and |b|. Thus, e goes to zero as (a, b) goes to zero in a certain way. In particular,
for a = 0 and b = ± λ2c , we obtain
   
λc ∂c λc λc
c s, y − = c(s, y) − (s, y) + e s, y, 0, −
2 ∂y 2 2
   
λc ∂c λc λc
c s, y + = c(s, y) + (s, y) + e s, y, 0,
2 ∂y 2 2

and we note that the error terms are proportional to λ2c . Thus,
    
λc ∂c λc λc
φ(s, y) = − (s, y) λc + e s, y, 0, − − e s, y, 0,
2τm ∂ y 2 2
λ2c ∂c
≈− (s, y)
2τm ∂ y

as the difference in error terms is still proportional to λ2c at worst and since λc is very
small compared to y, the error term is negligible.
Recall from Peterson (2015) that Ficke’s law of diffusion for particles across a
membrane (think of our plane at y as the membrane) can be written as

∂ [b]
Jdi f f = −D
∂x
22 2 The Diffusion Equation

Equating c with [b] and Jdi f f with φ, we see that the diffusion constant D in Ficke’s
Law of diffusion can be interpreted in this context as

λ2c
D=
2τm

This will give us a powerful connection between the macroscopic diffusion coefficient
of Ficke’s Law with the microscopic quantities that define a random walk as we will
see in the next section.

2.2 The Random Walk and the Binomial Distribution

In this section, we will be following the discussion presented in Weiss (1996), but
paraphrasing and simplifying it for our purposes. More details can be gleaned from a
careful study of that volume. Let’s assume that a particle is executing a random walk
starting at position x = 0 and time t = 0. This means that from that starting point,
the particle can move either +λc or −λc at each tick of a clock with time measured
in time constant units τm . We can draw this as a tree as shown in Fig. 2.2.
In this figure, the labels shown in each node refer to three things: the time, in
units of the time constant τm (hence, t = 3 denotes a time of 3τm ); spatial position
in units of the space constant λc (thus, x = −2 means a position of −2λc units); and
the number of possible paths that read that node (therefore, Paths equal to 6 means
there are six possible paths that can be taken to arrive at that terminal node.

(0, 0)

(1, −1, 1) (1, 1, 1)

(2, −2, 1) (2, 0, 2) (2, 2, 1)

(3, −3, 1) (3, −1, 3) (3, 1, 3) (3, 3, 1)

(4, −4, 1) (4, −2, 4) (4, 0, 6) (4, 2, 4) (4, 4, 1)

A particle starts at t = 0, x = 0 and can move with a given proba-


bility left or right a distance λc in time increments of τm . The triple
(a, b, c) indicates the time is t = aτm , the x location is x = bλc and the
number of paths from the root of the tree to the node is P = c.

Fig. 2.2 The random walk of a particle


2.2 The Random Walk and the Binomial Distribution 23

Since time and space are discretized into units of time and space constants, we
have a physical system where time and space are measured as integers. Thus, we can
ask what is the probability, W (m, n), that the particle will be at position m at time n?
In the time interval of n units, let’s define some auxiliary variables: n + is the number
of time steps where the particle moves to the right—i.e., the movement is +1; and
n − is the number of time steps the particle moves to the left, a movement of −1. We
clearly see that M is really the net displacement and

n = n+ + n−, m = n+ − n−

Solving, we see

n+m n−m
n+ = , n− =
2 2
Let the usual binomial coefficients be denoted by Bn, j where
 
n n!
Bn, j = =
j j! (n − j)!

Now look closely at Fig. 2.2. If you look at a node, you can count how many right
hand moves are made in any given path to that node. For example, the node for time
4 and space position 2 can be reached by 4 different paths and each of them contains
3 + λc moves. Note three +λc moves corresponds to n + = 3 and the number 4 is the
same as the binomial coefficient Bn=4,n + =3 = 4. Hence, at a given node, all of the
paths that can be taken to that node have the same n + value as shown in Table 2.1.
The triangle in Fig. 2.2 has the characteristic form of Pascal’s triangle: the root node
is followed by two nodes which split into three nodes and so on. The pattern is
typically written in terms of levels. At level zero, there is just the root node. This is
time 0 for us. At level one or time 1, there are two nodes written as 1 − 1. At level
two or time 2, there are three nodes written as 1 − 2 − 1 to succinctly capture the
branching we are seeing. Continuing, we see the node pattern can be written

Level or Time Paths


0 1
1 1-1
2 1-2-1
3 1-3-3-1
4 1-6-4-6-1

Each of the nodes at a given time then will have some paths leading to it and on each of
these paths, there will be the same number of right hand moves. So counting right hand
moves, the paths denoted by 1 − 1 correspond to n + = 0 for the left node and n + = 1
for the right node. The number of paths for these are the same as Bn=1,n + =0 = 1 and
24 2 The Diffusion Equation

Table 2.1 Comparing paths and rightward movements


Time Paths n+ Binomial coefficient
Bn,n +
1 1-1 0-1 B1,0 - B1,1
2 1-2-1 0-1-2 B2,0 - B2,1 - B2,2
3 1-3-3-1 0-1-2-3 B3,0 - B3,1 - B3,2 -
B3,3
4 1-6-4-6-1 0-1-2-3-4 B4,0 - B4,1 - B4,2 -
B4,3 - B4,4

Bn=1,n + =1 = 1. The next level, 1 − 2 − 1 corresponds to n + = 0 for the left node,


n + = 1 for the middle one and n + = 2 for the right one. The number of paths for
these nodes are then Bn=2,n + =0 = 1 and Bn=2,n + =1 = 2 and Bn=2,n + =2 = 1. We can
do this for each level giving us the information shown in Table 2.1.

2.3 Rightward Movement Has Probability 0.5 or Less

Let the probability that the particle moves to the right be p and the probability it
moves to the left be q. Then we have p + q = 1. Let’s assume that p ≤ 0.5. This
forces q ≥ 0.5. Then, in n time units, the probability a given path of length n is taken
is ρ(n) where
+ − + +
ρ(n) = p n q n = p n q n−n

and the probability that a path will terminate at position m, W (m, n), is just the
number of paths that reach that position in n time units multiplied by ρ(n). We know
that for a given number of time steps, certain position will never be reached. Note
that if the fraction 0.5(n + m) is not an integer, then there will be no paths that
reach that position m. Let n f be the results of the computation n f = 0.5(n + m). We
know n f need not be an integer. Define the extended binomial coefficients Cn,n f by

Bn,n f if n f is an integer
Cn,n f =
0 else

Then, we see
W (m, n) = Cn,n f p n f q n−n f

From our discussions above, it is clear for any position m that is reached in n time
units, that this can be rewritten in terms of n + as
+ +
(m) (m)
W (m, n) = Bn,n + (m) p n q n−n
2.3 Rightward Movement Has Probability 0.5 or Less 25

where n + (m) is the value of n + associated with paths terminating on position m. If


you think about this a bit, you’ll see that for even times n, only even positions m are
reached; similarly, for odd times, only odd positions are reached.

2.3.1 Finding the Average of the Particles Distribution


in Space and Time

The expectation E(m) is defined by


n
E(m) = mW (m, n)
m=−n

where of course, many of the individual terms W (m, n) are actually zero for a given
time n because the positions are never reached. From this, we can infer that


n
+ +
E(m) = m Bn,n + p n q n−n
n + =0

To compute this, first, switch to a simple notation. Let n + = j. Then, since n + = m+n
2
,
m = 2 j − n and so


n
E(m) = m Bn, j p j q n− j
j=0

n
= (2 j − n) Bn, j p j q n− j
j=0

= 2 E( j) n S

where

n
E( j) = j Bn, j p j q n− j
j=0

n
S= Bn, j p j q n− j
j=0

Since p + q = 1, we know that


n
( p + q) = S =
n
Bn, j p j q n− j = 1
j=0
26 2 The Diffusion Equation

Further, by taking derivatives with respect to p, we see that

d
p ( p + q)n = p n ( p + q)n−1 = p n
dp

Thus,

n
E( j) = j Bn, j p j q n− j
j=0

n
= j p Bn, j p j−1 q n− j
j=0

n
d
= Bn, j p j q n− j
p
j=0
dp
⎛ ⎞
d ⎝
n
=p Bn, j p j q n− j ⎠
dp j=0
d
=p ( p + q)n
dp
= pn

by our calculations above. We conclude that

E(m) = 2 E( j) nS
= 2 pn − n

2.3.2 Finding the Standard Deviation of the Particles


Distribution in Space and Time

We compute the standard deviation of our particle’s movement through space and
time in a similar way. First, we find the second moment of m,


n
E(m 2 ) = m 2 W (m, n)
m=−n
2.3 Rightward Movement Has Probability 0.5 or Less 27

Our earlier discussions still apply and we find we can rewrite this as


n
E(m 2 ) = (2 j − n)2 Bn, j p j q n− j
j=0

n
= (4 j 2 − 4 jn + n 2 ) Bn, j p j q n− j
j=0

= 4E( j 2 ) − 4 n E( j) + n 2 S
where E( j 2 ) is the second moment of the binomial distribution. We know S is 1 and
E( j) is pn. So we only have to compute E( j 2 ). Note that

d2
p2 ( p + q)n = p 2 n (n − 1) ( p + q)n−2 = p 2 n (n − 1)
d2 p

Also
d2
p 2 n (n − 1) = p 2 ( p + q)n
d2 p
⎛ ⎞
d 2  n
= p2 2 ⎝ Bn, j p j q n− j ⎠
d p j=0

n
d2
= p2 Bn, j p j q n− j
j=0
d2 p

n
= p 2 j ( j − 1) Bn, j p j−2 q n− j
j=0

n
= ( j 2 − j) Bn, j p j q n− j
j=0

= E( j 2 ) − E( j)

We conclude that
E( j 2 ) = p 2 n (n − 1) + p n

Since, we also have a formula for E(m 2 ), we see

E(m 2 ) = 4E( j 2 ) − 4 n E( j) + n 2
28 2 The Diffusion Equation

Now recall the standard formula from Statistics: the square of the standard deviation
of our distribution is σ 2 = E(m 2 ) − (E(m))2 . Hence,

σ 2 = E(m 2 ) − (E(m))2
= 4E( j 2 ) − 4 n E( j) + n 2 − (2E( j) − n)2
= 4(E( j 2 ) − (E( j))2 )
 
= 4 p n (n − 1) + pn − ( pn)
2 2

= 4 np(1 − p) = 4 npq.

Hence, the standard deviation is



σ= 4 pqn.

2.3.3 Specializing to an Equal Probability Left and Right


Random Walk

Here, p and q are both 0.5. We see that 4 pq = 1 and

E(m) = 2 p n − n = 0

σ= n

Note, that if the random walk is skewed, with say p = 0.1, then we would obtain

E(m) = 2 p n − n = −0.8n

σ = .6 n

so that for large n, the standard deviation of our particle’s movement would be
approximately 0.6n rather than n.

2.4 Macroscopic Scale

For a very large number of steps, the probability distribution, W (m, n), will approach
a limiting form. This is done by using an approximation to k! that is know as the
Stirling Approximation. It is known that for very large k,
 k
√ k
k! ≈ 2πk .
e
2.4 Macroscopic Scale 29

The distribution of our particle’s position throughout space and time can be written as
n+m n−m
W (m, n) = Bn, n+m
2
p 2 q 2

using our definitions of n + and n − (it is understood that W (m, n) is zero for non
integer values of these fractions). We can apply Stirling’s approximation to Bn, n+m
2
:

√  n n
n! ≈ 2πn
e
     n+m
n+m n+m n+m 2
! ≈ 2π
2 2 2e
  n  n+m
2
 m  n+m
2
≈ π(n + m) 1+
2e n
     n−m
n−m n−m n−m 2
! ≈ 2π
2 2 2e
  n  n−m
2
 m  n−m
2
≈ π(n − m) 1−
2e n
From this, we find
  n
   
n+m n−m m 2  n n m2 2  m  −m
2
 m  m2
! !≈πn 1− 2 1− 2 1− 1+
2 2 n 2e n n n

Hence, we see

√  − 1  − n
2πn  n n   n −n m2 2
m2 2 
m  m2  m − m2
Bn, n+m ≈ ) 1− 2 1− 2 1− 1+
2 πn e 2e n n n n
  − 1  − n
2 
2 n m2 2
m2 m  m2  m − m2
≈ 2 1− 2 1− 2 1− 1+
πn n n n n

Thus,
  1  2  1

m2

−n

m2

ln Bn, n+m ≈ ln + n ln(2) + − ln 1 − + ln 1 −
2
2 πn 2 n2 2 n2
m  m  −m  m 
+ ln 1 − + ln 1 +
2 n 2 n
30 2 The Diffusion Equation

Now, for small x, the standard Taylor’s series approximation gives ln(1 + x) ≈ x;
hence, for mn sufficiently small, we can say

   
1 2 1 m2 n m2 m m m m
ln Bn, n+m ≈ ln + n ln(2) + + − −
2
2 πn 2 n 2 2 n 2 2 n 2 n
  2 2
1 2 1 m m
≈ ln + n ln(2) + −
2 πn 2 n2 2n

For very large n (i.e. after a very large number of time steps nτm ), since we assume
2
m
m
is very small, the term mn 2 is negligible. Hence dropping that term and exponenti-
ating, we find   
2 −m 2
Bn, n+m ≈ exp 2n
2
nπ 2n

This implies that


  
2 −m 2 n+m n−m
W (m, n) ≈ exp 2n p 2 q 2
nπ 2n
     m2
2 −m 2 n p
≈ exp (4 pq) 2
nπ 2n q

Note that if the particle moves with equal probability 0.5 to the right or the left at
any time tick, this reduces to

2 −m2
W (m, n) ≈ e 2n

and for p = 1
3
and q = 23 , this becomes
     n2   m2
2 −m 2 8 1
W (m, n) ≈ exp
nπ 2n 9 2

2.5 Obtaining the Probability Density Function

From our discrete approximations in previous sections, we can now derive the prob-
ability density function, P(x, t) at position x at time t. In what follows, we will
assume that p ≤ 0.5. Let x be a small number which is approximately mλc for
some m. The probability that the particle is in an interval [x − x
2
, x + x
2
] can then
be approximated by 
P(x, t) x ≈ W (k, n)
k
2.5 Obtaining the Probability Density Function 31

where the sum is over all indices k such that the position kλc lies in the interval
[x − x
2
, x + x
2
]. Hence,
 
x x
x− ,x + ≡ {(m − j)λc , ..., mλc , ..., (m + j)λc }
2 2

for some integer j. Now from the way the particle moves in a random walk, only
half of these tick marks will actually be positions the particle can occupy. Hence,
half of the probabilities W (m − i, n) for i from − j to j are zero. The number of
x
nonzero probabilities is thus ≈ 2λ c
. We can therefore approximate the sum by taking
the middle term W (m, n) and multiplying by the number of nonzero probabilities.

x
P(x, t) x ≈ W (m, n)
2λc

which implies, since x = mλc and t = nτm , that for very large n,

1
P(x, t) = W (m, n)
2λc
⎛ ⎞
  2λx
1 ⎝ −x 2

t p c
=  exp λ2c
(4 pq) m

λ2 4 2τm t q
4π 2τcm t

λ2c
Note that the term 2τm
is the diffusion constant D. Thus,

    2λx
1 −x 2 t p c
P(x, t) = √ exp (4 pq) 2τm

4π D t 4D t q

Next, rewrite all the power terms as exponentials:


       
1 −x 2 t p x
P(x, t) = √ exp exp ln(4 pq) exp ln
4π D t 4D t 2τm q 2λc

2.5.1 p Less Than 0.5

Note since p + q = 1, 4 pq is between bigger than zero and strictly less than 1 for
all nonzero p and q with p not 0.5. So for us, if we let ξ be the value 4 1pq , we know
that ln(ξ) > 0. Let A = ln(ξ). Next, let ζ be the value qp . Since p is less than 1/2
here, this means ζ > 1. Let B = ln(ζ). Then, we know A and B are both positive
32 2 The Diffusion Equation

in this case unless the probability of movement left and right is equal. In the case of
equal probability, ξ = 1, ζ = 1 and A = B = 0. However, with unequal probability,
we have  
t −t −t ln(ξ)
(4 pq) 2τm = (ξ) 2τm = exp .
2τm

Also,
  2λx  
p c −x −x ln(ζ)
= (ζ) 2λc = exp
q 2λc

Rewriting the density function, we obtain


 
1 −x 2 t x
P(x, t) = √ exp − ln(ξ) − ln(ζ)
4π D t 4D t 2τm 2λc

We thus have
 
1 −x 2 t x
P(x, t) = √ exp −A −B
4π D t 4D t 2τm 2λc

After manipulation, we can complete the square on the quadratic term and rewrite it as
 2
−x 2 t x 1 BD B2 − 4 A
−A −B =− x+ t + t
4D t 2τm 2λc 4Dt λc 8τm

and thus
  2   
1 1 BD B2 − 2 A
P(x, t) = √ exp − x+ t exp t
4π D t 4Dt λc 4τm

The case where p is larger than 1/2 is handled in a symmetric manner. We simply
reverse the role of p and q in the argument above.

2.5.2 p and q Are Equal

The equal probability random walk has A = B = 0 and so the probability density
function reduces to
 
1 x2
P(x, t) = √ exp −
4π D t 4Dt
2.6 Understanding the Probability Distribution of the Particle 33

2.6 Understanding the Probability Distribution


of the Particle

It is important to get a strong intuitive feel for the probability distribution of the parti-
cle under the random walk and skewed random walk protocols. A normal distribution
has the form
1 − x
2

P(x) = √ e 2E(m2 )
2πσ

Hence, comparing, we see the standard deviation of our particle can be interpreted as
t
σ = λC2 .
τm

Note, in general, for fixed time, we can plot the particle’s position. We show this in
Fig. 2.3 for three different standard deviations, σ. In the plot, the standard deviation
is labeled as D. Now if we skew the distribution so that the probability of moving to
the right is now 16 , we find
  2   
1 1 1.61Dt 0.35t
P(x, t) = √ exp − x− exp
4π D t 4Dt λc τm

which generates the plot shown in Fig. 2.4.

Fig. 2.3 Normal


distribution: spread depends
on standard deviation
34 2 The Diffusion Equation

Fig. 2.4 Skewed random


walk probability distribution:
p is 0.1666

2.7 The General Diffusion Equation

We will now show that the probability density function P(x, t) given by Eq. 2.1

1 x2
P(x, t) = √ e− 4Dt (2.1)
4π D t

solves the diffusion equation


∂ ∂2
=D 2.
∂t ∂x
where D is the diffusion constant. A more general diffusion equation would not
assume the diffusion constant D is independent of position x. The diffusion equation
in that case would be  
∂u ∂ ∂u
= D
∂t ∂x ∂x

However, often the term D is independent of the variable x allowing us to write the
simpler form
∂u ∂2u
=D 2
∂t ∂x

in the time and space variables (t, x). In this model, we assume x is one dimensional,
although it is easy enough to extend to higher dimensions. In 3 dimensions, u has
units of mM per liter or cm3 , but in a one dimensional setting u has units of mM per
cm. However, the diffusion coefficient D will always have units of length squared
2
per time unit; i.e. cms . Note in our earlier discussions, we found P(x, t) had to have
units of particles per cm or simple cm−1 . We also assume the diffusion coefficient
2.7 The General Diffusion Equation 35

D is positive. We can now show a typical solution for positive time is given by our
P(x, t) and let’s do it by direct calculation. With a bit of manipulation, we find
 
∂P 1 −3/2 −x 2 /(4Dt) 1 x2
=√ t e − + .
∂t 4π D 2 4Dt

Next,
 
∂P 1 2x
e−x /(4Dt) −
2
= √
∂x 4π Dt 4Dt
  
∂ P
2
1 −x 2 /(4Dt) 2
= √ e −
∂x 2 4π Dt 4Dt
   
2x 2x −x 2 /(4Dt)
+ − − e
4Dt 4Dt
 
1 2 2x 2
e−x /(4Dt)
2
= √ −1 +
4π Dt 4Dt 4Dt
 
1 1 1 x2
t −3/2 e−x /(4Dt) − +
2
= √
D 4π D 2 4Dt
1 ∂P
= .
D ∂t

Hence, we see P solves ∂∂tP = D ∂∂xP2 . From our previous discussions, we see we can
2

interpret the motion of our particle as that of a random walk for space constant λc
and time constant τm as the number of time and position steps gets very large.
Note the behavior at t = 0 seems undefined. However, we can motivate the inter-
pretation of the limiting behavior as t → 0 as an impulse injection of current as
follows. Note using the substitution, y = 1t , for x not zero,
 
√ −x 2 y
lim u(t, x) = lim y exp
t→0 y→∞ 4D

and so at x = 0, we find u(t, 0) = √1t . This behaves qualitatively like an impulse.


Thus, we interpret this solution as an impulse injection of essentially unbounded
magnitude at t = 0. This corresponds to the usual delta function input δ(0). If the
amount of current injected is I δ(0), the solution is
 
I −x 2
u(t, x) = √ exp .
4π Dt 4Dt

Since for x = 0, lim x→∞ u(t, x) = 0 and limt→0 u(t, x) = 0, we know u(t, x) has a
maximum value at some value of t. The generic shape of the solution can be seen in
Fig. 2.5. The solution at x = 0 behaves like the curve √1t as is shown. The solutions
for non zero values of x behave like spatially spread and damped pulses. In fact, the
36 2 The Diffusion Equation

Fig. 2.5 The generic


diffusion solution behavior

solution to the diffusion equation for nonzero x provides a nice model of the typical
excitatory pulse we see in a dendritic tree. Also, if I < 0, this is a good model of an
inhibitory pulse as well. Letting
 
1 x2
I (t) = √ exp −
4π Dt 4Dt

we find for nonzero t, that


  
1 −3 x2 1 x2
I (t) = √ t 2 exp − − +
4π D 4Dt 2 4Dt
2
For t = 0, f (t) = 0 implies − 21 + 4Dt
x
= 0. Thus, the maximum of the injection
x2
pulse occurs at t = 2D . The maximum value of the current is then
 
 2  2 2D
− 4D
x
x 1 x2
I =  e
2D 4π D (x 2 /(2D)
1
e− 2 .
1
= √
2π |x|

it follows that the current pulses become very sharply defined with large heights as x
approaches 0 and for large x, the pulse has a low height and is spread out significantly.
We show the qualitative nature of these pulses in Fig. 2.6. In both curves shown, the
x2
maximum is achieved at the time point 2D . These general solutions can also be
centered at the point (t0 , x0 ) giving the solution
2.7 The General Diffusion Equation 37

Fig. 2.6 The generic


diffusion solution maximum

 
I −(x − x0 )2
u(t, x) = √ exp .
4π D(t − t0 ) 4D(t − t0 )

which is not defined at t0 itself.

References

D. Johnston, S. Miao-Sin Wu, Foundations of Cellular Neurophysiology (MIT Press, Cambridge,


1995)
J. Peterson, Calculus for Cognitive Scientists: Partial Differential Equation Models, Springer Series
on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte Ltd.,
Singapore, 2015 in press)
T. Weiss, Transport, Cellular Biophysics, vol. 1 (MIT Press, Cambridge, 1996)
Chapter 3
Integral Transforms

We now discuss the basics of two important tools in the analysis of models:
the Laplace Transform and the Fourier Transform. We will use these tools to solve
the diffusion model which we obtain from the cable model by using a change of
variable technique.

3.1 The Laplace Transform

The Laplace Transform acts on the time domain of a problem as follows: given the
function x defined on s ≥ 0, we define the Laplace Transform of x to be
 ∞
L (x) = x(s) e−βs ds
0

The new function L (x) is defined for some domain of the new variable β.
The variable β’s domain is called the frequency domain and in general, the values
of β where the transform is defined depend on the function x we are transforming.
Also, in order for the Laplace transform of the function x to work, x must not grow to
fast—roughly, x must decay like an exponential function with a negative coefficient.
The solutions we seek to our cable equation are expected on physical grounds to
decay to zero exponentially as we let t (and therefore, also s!) go to infinity and as
we let the space variable z (and hence w) go to ±∞. Hence, the function  we seek
will have a well-defined Laplace transform with respect to the s variable.
Now, what about the Laplace transform of a derivative? Consider
   ∞
dx d x −βs
L = e ds
ds 0 ds

© Springer Science+Business Media Singapore 2016 39


J.K. Peterson, BioInformation Processing, Cognitive Science and Technology,
DOI 10.1007/978-981-287-871-7_3
40 3 Integral Transforms

Integrating by parts, we find


  ∞

d x −βs 
−βs ∞
e ds = x(s) e 0
+ β x(s) e−βs ds
0 ds 0
= lim (x(R) e−β R ) − x(0) + β L (x)
R→∞

Now if the function x grows slower than some e−c R for some constant c, the limit
will be zero and we obtain
 
dx
L = β L (x) − x(0)
ds

We often use the symbol fˆ to denote the Laplace Transform of f .


For a simple example, consider the Laplace Transform of the function

e−as s ≥ 0
f (s) =
0 s < 0

where a is positive. It is easy to show that for β > a,

fˆ = L ( f )
 ∞
= e−as eβs ds
0
1
= .
β + a

Hence, the inverse Laplace Transform, denoted by L −1 ( fˆ), here is


 
1
L −1 = e−as .
β + a

3.1.1 Homework

Exercise 3.1.1 Find the Laplace transform of f (t) = et .


Exercise 3.1.2 Find the Laplace transform of f (t) = e−2t .
Exercise 3.1.3 Find the Laplace transform of f (t) = e−6t .
Exercise 3.1.4 Find the inverse Laplace transform of fˆ(s) = 2
s+4
.

Exercise 3.1.5 Find the inverse Laplace transform of fˆ(s) = 3


s−4
.

Exercise 3.1.6 Find the inverse Laplace transform of fˆ(s) = 6


2s+5
.
3.2 The Fourier Transform 41

3.2 The Fourier Transform

Given a function g defined on the y axis, we define the Fourier Transform of g to be


 ∞
1
F (g) = √ g(y) e− jξ y dy
2π −∞

where j denotes the square root of minus 1 and the exponential term is defined to
mean

e− jξ y = cos(ξ y) − jsin(ξ y)

This integral is well defined if g is what is called square integrable which roughly
means we can get a finite value for the integral of g 2 over the y axis. Note that we
can compute the Fourier Transform of the derivative of g as follows:
   ∞
dg 1 dg − jξ y
F = √ e dy
dy 2π −∞ dy

Integrating by parts, we find


  ∞
1 ∞
dg − jξ y 
− jξ y ∞ 1
√ e dy = g(y) e −∞
+ jξ √ g(y) e− jξ y dy
2π −∞ dy 2π −∞
   
− jξ R − jξ R
= lim g(R) e − lim g(R) e + jξ F (g)
R→∞ R→−∞

Since we assume that the function g decays sufficiently quickly as y → ±∞, the
first term vanishes and we have
 ∞
1 dg − jξ y
√ e dy = + jξ F (g)
2π −∞ dy

Applying the same type of reasoning, we can see that


 ∞ 2   ∞
1 d g − jξ y dg − jξ y ∞ 1 dg − jξ y
√ e dy = e  + jξ √ e dy
2π −∞ dy 2 dy −∞ 2π −∞ dy
     
dg dg dg
= lim (R) e− jξ R − lim (R) e− jξ R + jξ F
R→∞ dy R→−∞ dy dy

We also assume that the function g’s derivative decays sufficiently quickly as
y → ±∞. Thus the the first term vanishes and we have
 ∞  
1 d 2 g − jξ y dg
√ e dy = + jξ F = j 2 ξ 2 F (g) = −ξ 2 F (g)
2π −∞ dy 2 dy
42 3 Integral Transforms

because j 2 = −1. Hence,


 
d2g
F = −ξ 2 F (g)
dy 2

We can then define the inverse Fourier Transform as follows. If the Fourier Transform
transform of the function g(y) is denoted by ĝ(ξ ), recall the inverse Fourier Transform
of ĝ(ξ ) is given by
 ∞
1
F −1 (ĝ) = √ ĝ(ξ ) e jξ y dξ.
2π −∞

Now, if the Fourier Transform transform of the function g(y) is denoted by ĝ(ξ ),
recall the inverse Fourier Transform of ĝ(ξ ) is given by
 ∞
1
F −1 (ĝ) = √ ĝ(ξ ) e jξ y dξ.
2π −∞

Here is a standard example of such an inversion. Consider the inverse Fourier trans-
form of
 ∞  ∞
r 0 λc I 0 1 r 0 λc I 0
e−ξ s e jξ y dξ = e−ξ s + jξ y dξ.
2 2
√ √
2π 2π −∞ 2π −∞

Now to invert the remaining part, we rewrite the exponent by completing the square:
 2  2 
ξ jy jy
−ξ s + jξ y = −s ξ − j ξ +
2 2

s 2s 2s
   2 
jy 2 jy
= −s ξ − −
2s 2s

Hence,
jy 2 y2
e−ξ s + jξ y
= e−s(ξ − 2s ) e− 4s
2

because j 2 = −1. To handle the inversion here, we note that for any positive a and
positive base points x0 and y0 :
 ∞  ∞  ∞  ∞
1
e−a(x−x0 ) e−a(y−y0 ) d xd y = e−u e−v
2 2 2 2
dudv
−∞ −∞ −∞ −∞ a
√ √
using the change of variables u = a(x − x0 ) and v = a(y − y0 ). Now we
convert to polar coordinates and it can be shown the resulting calculation gives
3.2 The Fourier Transform 43
 ∞  ∞  ∞ 
−u 2 −v 2 1 1 2π
π
e−r r dr dθ =
2
e e dudv =
−∞ −∞ a a 0 0 a

We are doing what is called a double integration here which we have never discussed.
However, our need for it never arose until this moment. For our purposes, think of
it as applying our standard Riemann integral two times in succession. For example,
if we had the double integral
 2  3
(x 2 + y 2 ) d x d y
1 2

we would interpret this as two Riemann integrals working from the inside out.
3
We would first find 2 (x 2 + y 2 ) d x by thinking of the integrand as a function
of x only where the y variable is treated as a constant. This is essentially the inverse
of partial differentiation! We would find
   3
3
x3 
(x + y ) d x =
2 2
+ y x 
2
2 3 2

where we use our usual one variable Cauchy Fundamental Theorem of Calculus.
2
This gives 193
+ y 2 as the answer. Then you apply the outer Riemann integral 1 to
this result to get
    2
2
19 19 y 3  19 7
+ y dy =
2
y+ = + .
1 3 3 3 1 3 3

Our situation above is more interesting than our example, of course. We have infinite

limits, −∞ which we interpret as as usual improper integral. Naturally, since Rie-
mann integrals can be interpreted as areas under curves, many time an integration
like this over an unbounded domain ends up with an infinite answer. But the functions
we are integrating in our example strongly decay as they are e−u and e−v . So there
2 2

is hope that even in this infinite case, we will get a finite result. You should take more
mathematics classes! Our volumes do not discuss two and three dimensional Rie-
mann integration although we have discussed differentiation in the setting of more
than one variable. There is always more to learn, but for our purposes, we just need
this one calculation so we didn’t want to belabor this point with several extra chap-
ters. Now getting back to our problem, we see after converting to polar coordinates,
∞ ∞ ∞ 2π
we converted to convert the integral −∞ −∞ dudv into the integral 0 0 r dr dθ .
This used some additional multiple integration tools we are skipping: essentially, to
add up the area of the uv plane using rectangular blocks of area du dv we need an
infinite number of them stretching from −∞ to ∞ in both the u and v directions.
However, to add up the area of the u v plane using area chunks based on the polar
coordinate variables r and θ , we divide up the two dimensional space using rays
from the origin every dθ radians. Then the area pieces formed from slice the rays
44 3 Integral Transforms

between radius r and r + dr . The resulting area is like a chopped off piece of pie and
has area r dr dθ . Then to cover the whole plane we need an infinite number of dr ’s
and dθ ’s from 0 to 2π . Yeah, that’s all there is too it. Fortunately, like we said, this
background is not so important here. Our move to polar coordinates thus gives us
 ∞ 2  ∞
π π
e−a(x−x0 ) d x e−a(x−x0 ) d x =
2 2
= =⇒
−∞ a −∞ a

We can apply this result to our problem to see


 ∞  ∞
r 0 λc I 0 −ξ 2 s + jξ y r 0 λc I 0 jy 2 y2
e dξ = e−s(ξ − 2s ) e− 4s dξ
2π −∞ 2π −∞
r 0 λc I 0 π − y2 1 y2
= e 4s = r0 λc I0 √ e− 4s
2π s 4π s

3.2.1 Homework

Exercise 3.2.1 Find the inverse Fourier Transform of e−2ξ showing all of the messy
2

steps in the discussion above.

Exercise 3.2.2 Find the inverse Fourier Transform of e−1.5ξ showing all of the
2

messy steps in the discussion above.

Exercise 3.2.3 Find the inverse Fourier Transform of e−4ξ showing all of the messy
2

steps in the discussion above.


Chapter 4
The Time Dependent Cable Solution

We are now ready to solve the cable equation directly. There are lots of details
here to go through, so it will take awhile. However, as we said earlier, the solutions
we obtain here are essentially the same as the ones we found using random walk
ideas. So we now have a good understanding of what the diffusion constant means.
Note this is very important to understanding diffusion based triggers. Recall the full
cable equation

∂ 2 vm ∂vm
λ2c = vm + τm − ro λ2c ke
∂z 2 ∂t

Recall that ke is current per unit length. We are going to show you a mathematical
way to solve the above cable equation when there is an instantaneous current impulse
applied at some nonnegative time t0 and nonnegative spatial location z 0 . Essentially,
we will think of this instantaneous impulse as a Dirac delta function input as we have
discussed before: i.e. we will need to solve

∂ 2 vm ∂vm
λ2c = vm + τm − ro λ2c Ie δ(t − t0 , z − z 0 )
∂z 2 ∂t

where δ(t − t0 , z − z 0 ) is a Dirac impulse applied at the ordered pair (t0 , z 0 ). We will
simplify our reasoning by thinking of the impulse applied at (0, 0) as we can simply
translate our solution later as we did before for the idealized impulse solution to the
time independent cable equation.

© Springer Science+Business Media Singapore 2016 45


J.K. Peterson, BioInformation Processing, Cognitive Science and Technology,
DOI 10.1007/978-981-287-871-7_4
46 4 The Time Dependent Cable Solution

4.1 The Solution for a Current Impulse

We assume that the cable is infinitely long and there is a current injection at some
point z 0 on the cable which instantaneously delivers I0 amps of current. As usual,
we will model this instantaneous delivery of current using a family of pulses. For
convenience of exposition, we will consider the point of application of the pulses to
be z 0 = 0.

4.1.1 Modeling the Current Pulses

Consider the two parameter family of pulses defined on the rectangle [− τnm , τnm ] ×
[− λmc , λmc ] by
2 2λ2
I0 nm − n2 t2τ2 −τ
m − 2 2c 2
Pnm (t, z) = e 2
m e m z −λc
τm λ c γ 2

and defined to be 0 off the rectangle. This family is well-known to be a C ∞ family


of functions with compact support (the support of each function is the rectangle).
amp
At this point, γ is a constant to be determined. Note that Pnm is measured in cm−s .
amp
We know the currents ke have units of cm ; hence, we model the family of current
impulses kenm by

kenm (t, z) = τm Pnm (t, z)

This gives us the proper units for the current impulses. This is then a specific example
of the type of pulses we have used in the past. There are several differences:
• Here we give a specific functional form for our pulses which we did not do before.
It is straightforward to show that these pulses are zero off the (t, z) rectangle and
are infinitely differentiable for all time and space points. Most importantly, this
means the pulses are very smooth at the boundary points t = ± τnm and z = ± λmc .
Note we are indeed allowing these pulses to be active for a small interval of time
before zero. When we solve the actual cable problem, we will only be using the
positive time portion.
• The current delivered by this pulse is obtained by the following integration:
 ∞  ∞
J = Pnm (t, z)dtdz
−∞ −∞
 λc  τm
m n
= Pnm (t, z)dtdz
− λmc − τnm
4.1 The Solution for a Current Impulse 47

Since we will be interested only in positive time, we will want to evaluate


 λc  τm
J m n
= Pnm (t, z)dtdz
2 − λmc 0

The constant γ will be chosen so that these integrals give the constant value of
J = 2I0 for all n and m. If we integrate over the positive time half of the pulse,
we will then get the constant I0 instead.
amps
• The units of our pulse are cm−s . The time integration then gives us amp
cm
and the
following spatial integration gives us amperes.

Let’s see how we should choose γ: consider


 λc  τm
m n
J= Pnm (t, z)dtdz
− λmc − τnm

Make the substitutions β1 = nt


τm
and β2 = mz
λc
. We then obtain with a bit of algebra
  1
τm λ c 1 I0 nm − β22−1 − β22−1
J = e 1 e 2 dβ1 dβ2
nm −1 −1 τm λc γ 2
 1  1 
I0 − 22 − 22
= 2 e β1 −1
dβ1 e β1 −1
dβ2
γ −1 −1
 1 2
I0 − 2
= 2 e x 2 −1 d x
γ −1

Clearly, if we choose the constant γ to be


 1
1 − x 22−1
γ= √ e dx
2 −1

then all full pulses Pnm deliver 2I0 amperes of current when integrated over space
and time and all pulses with only nonnegative time deliver I0 amperes.

4.1.2 Scaling the Cable Equation

We can convert the full cable equation

∂ 2 vm ∂vm
λ2c = vm + τm − ro λ2c ke ,
∂z 2 ∂t
48 4 The Time Dependent Cable Solution

where ke is current per unit length into a diffusion equation with a change of vari-
ables. First, we introduce a dimensionless scaling to make it easier via the change of
variables: y = λzc and s = τtm . With these changes, space will be measured in units of
space constants and time in units of time constants. We then define the a new voltage
variable w by

w(s, y) = vm (τm t, λc z)

It is then easy to show using the chain rule that

∂2w 2 ∂ vm
2
= λ
∂ y2 c
∂z 2
∂w ∂vm
= τm
∂s ∂t
giving us the scaled cable equation

∂2w ∂w
=w+ − ro λ2c ke (τm s, λc y)
∂ y2 ∂s

Now to further simplify our work, let’s make the additional change of variables

(s, y) = w(s, y) es

Then
 
∂ ∂w
= + w es
∂s ∂s
∂2 ∂2w s
= e
∂ y2 ∂ y2

leading to

∂ 2  −s ∂ −s
e = e − r0 λ2c ke (τm s, λc y)
∂ y2 ∂s

After rearranging, we have the version of the transformed cable equation we need
to solve:

∂2 ∂
= − r0 λ2c τm ke (τm s, λc y) es (4.1)
∂y 2 ∂s

Recall that ke is current per unit length. We are going to show you a physical way to
find a solution to the above diffusion equation when there is an instantaneous current
impulse of strength I0 applied at some nonnegative time t0 and nonnegative spatial
4.1 The Solution for a Current Impulse 49

location z 0 . Essentially, we will think of this instantaneous impulse as a Dirac delta


function input as we have discussed before: i.e., we will need to solve

∂2 ∂
= − ro λ2c I0 δ(s − s0 , y − y0 ) es
∂y 2 ∂s

In particular, for our family of pulses, Pnm , we have

I0 nm n2 τ 22τsm22 −τ 2 m2 λ22λy2c2 −λ2


Pnm (τm s, λc y) = e m m e c c
τm λ c γ 2
I0 nm 2 2
= e n2 s2 −1 e m2 y2 −1
τm λ c γ 2

Then, we find

∂2 ∂
= − r0 λ2c τm Pnm (τm s, λc y) es . (4.2)
∂ y2 ∂s

4.1.3 Applying the Laplace Transform in Time

We will apply some rather sophisticated mathematical tools to solve the Eq. (4.2).
These include the Laplace and Fourier Transforms which were introduced in Chap. 3.
Hence, applying the Laplace transform to both sides of Eq. 4.2, we have
   
∂2 ∂
L = L − r0 λ2c L (Pnm (τm s, λc y) es )
∂ y2 ∂s
∂ 2 L ()
= β L () − (0, y) − r0 λ2c τm L (Pnm (τm s, λc y) es )
∂ y2

This is just the transform of the time portion of the equation. The space portion
has been left alone. We will now further assume that

(0, y) = 0, y = 0

This is the same as assuming

vm (0, z) = 0, z = 0

which is a reasonable physical initial condition. This gives us

∂ 2 L ()
= β L () − r0 λ2c τm L (Pnm (τm s, λc y) es )
∂ y2
50 4 The Time Dependent Cable Solution

4.1.4 Applying the Fourier Transform in Space

We now apply the Fourier Transform in space to the equation we obtained after
applying the Laplace Transform in time. We have
 
∂ 2 L ()
F = β F (L ()) − r0 λ2c τm F (L (Pnm (τm s, λc y) es ))
∂ y2
or

−ξ 2 F (L ()) = β F (L ()) − r0 λ2c τm F (L (Pnm (τm s, λc y) es ))

For convenience, let

T () = F (L ())

Then, we see we have

−ξ 2 T () = β T () − r0 λ2c τm T (Pnm (τm s, λc y) es )

4.1.5 The T Transform of the Pulse

We must now compute the T transform of the pulse term Pnm (τm s, λc y) es . Since
the pulse is zero off the (t, z) rectangle [− τnm , τnm ] × [− λmc , λmc ], we see the (s, y)
integration rectangle reduces to [− n1 , n1 ] × [− m1 , m1 ]. Hence,

 1  1
1 m n
T (Pnm (τm s, λc y) es ) = √ Pnm (τm s, λc y) es e−βs e− jξ y dsdy
2π − m1 0
 1  1
1 m n I0 nm − 2 22 − 2
= √ e n s −1 e m 2 y 2 −1 es e−βs e− jξ y dsdy
2π − m1 0 τ m λc γ 2

Now use the change of variables ζ = ns and u = my to obtain


 1  1
1 I0 nm − ζ 22−1 − 22 ζ βζ jξu dζ du
T (Pnm (τm s, λc y) es ) = √ e e u −1 e n e− n e− m
2π −1 0 τm λc γ 2 n m
 1  1
1 1 I0 − 2
− 2 ζ βζ jξu
= √ e ζ 2 −1 e u 2 −1 e n e− n e− m dζ du
2π 2 −1 −1 τm λc γ 2
4.1 The Solution for a Current Impulse 51

Note that this implies that


 1  1  
1 I0 2 2 ζ βζ jξu
lim T (Pnm (τm s, λc y) es ) = √ e ζ 2 −1 e u 2 −1 dζ du lim e n e− n e− m
n,m→∞ 8π −1 −1 τ m λc γ 2 n,m→∞
 1  1
1 I0 2 2
= √ e ζ 2 −1 e u 2 −1 dζ du
8π −1 −1 τ m λc γ 2
I0
= √ 2γ 2
8πτm λc γ 2
I0
= √
2πτm λc

4.1.6 The Idealized Impulse T Transform Solution

For a given impulse Pnm , we have the T transform solution satisfies

−ξ 2 T () = β T () − r0 λ2c τm T (Pnm (τm s, λc y) es )

and so the idealized solution we seek is obtained by letting n and m go to infinity to


obtain
I0
−ξ 2 T () = β T () − r0 λ2c τm √
2πτm λc
r 0 λc I 0
= β T () − r0 λc I0 =⇒ (ξ 2 + β)T () = √

Thus, we find

r 0 λc I 0
∗ = T () = √
2π(β + ξ 2 )

where for convenience, we denote the T transform of  by ∗ .

4.1.7 Inverting the T Transform Solution

To move back from the transform (β, ξ) space to our original (s, y) space, we apply
the inverse of our T transform. For a function h(β, ξ), this is defined by
 ∞  ∞  
−1 1 βs −1 −1
T (h) = √ h(β, ξ) e e jξ y
dβdξ = F L (h)
2π −∞ 0
52 4 The Time Dependent Cable Solution

To find the solution to our cable equation, we will apply this inverse transform to .
First, we can compute the inner inverse Laplace transform to obtain
 
−1 ∗ r0 λc I0 −1 1
= r0 λc I0 e−ξ s .
2
L ( ) = √ L
2π β + ξ 2

Recall the inverse Fourier Transform of ĝ(ξ) is given by


 ∞
1
F −1 (ĝ) = √ ĝ(ξ) e jξ y dξ.
2π −∞

Hence, we need to calculate


 ∞  ∞
r 0 λc I 0 −ξ 2 s jξ y r 0 λc I 0
e−ξ s+ jξ y dξ.
2
e e dξ =
2π −∞ 2π −∞

This inversion has been done in Chap. 3. Hence, we find


 ∞  ∞
r 0 λc I 0 r 0 λc I 0 jy 2 y2
e−ξ e−s(ξ − 2s ) e− 4s dξ
2
s+ jξ y
dξ =
2π −∞ 2π −∞

r 0 λc I 0 π − y 2 1 y2
= e 4s = r0 λc I0 √ e− 4s
2π s 4πs

Hence, our idealized solution is

1 y2
(s, y) = r0 λc I0 √ e− 4s
4πs

λ2c
Note, if we think of the diffusion constant as D0 = τm
, we can rewrite (s, y)
as follows:
2
1 − x
(s, y) = r0 λc I0  e 4(λ2c /τm )t
4π(λ2c /τm )(t/λ2c )
1 − x
2
1 − x
2
= r 0 λc I 0  e 4D0 t = r0 λ2c I0 √ e 4D0 t
4π D0 (t/λ2c ) 4π D0 t

It is exciting to see the term

1 − x
2
P0 (x, t) = √ e 4D0 t (4.3)
4π D0 t

which
√ is the usual probability density function for a random walk with space constant
λc / 2 and time constant τm ! Thus,
4.1 The Solution for a Current Impulse 53
 
(s, y) = (r0 λc )I0 λc P0 (x, t)

The term λc P0 (x, t) is the probability we are in an interval of width λc /2 around


x and so is a scalar without units. The term r0 λc I0 has units ohms amps or volts. We
can then find the full solution w since

w(s, y) = (s, y) e−s


1 y2
= r 0 λc I 0 √ e− 4s e−s
4πs

We can write this in the unscaled form at pulse center (t0 , z 0 ) as

1 ((z−z 0 )/λc )2

vm (t, z) = r0 λc I0 √ e 4((t−t0 )/τm ) e−(t−t0 )/τm ) (4.4)
4π ((t − t0 )/τm )

4.1.8 A Few Computed Results

We can use the scaled solutions to generate a few surface plots. In Fig. 4.1 we see
a pulse applied at time zero and spatial location 1.0 of magnitude 4. We can use
the linear superposition principle to sum two applied pulses: in Fig. 4.2, we see the

Fig. 4.1 One time dependent pulse


54 4 The Time Dependent Cable Solution

Fig. 4.2 Two time dependent pulses

effects of a pulse applied at space position 1.0 of magnitude 4.0 add to a pulse of
strength 10.0 applied at position 7.0. Finally, we can take the results shown in Fig. 4.2
and simply plot the voltage at position 10.0 for the first three seconds. This is shown
in Fig. 4.3.

4.1.9 Reinterpretation in Terms of Charge

Note that our family of impulses are


 2 2λ2

I0 nm n2 t2τ2 −τ
m c
kenm (t, z) = τm Pnm (t, z) = τm e 2
m e m 2 z 2 −λ2
c
τm λ c γ 2
(τm I0 )nm n2 t2τ2 −τ
2
m 2λ2c
(Q 0 )nm n2 t2τ2 −τ 2
m 2λ2
c
= e 2
m e m2 z2 −λc2 = e 2 2 2 2
m e m z −λc
τm λ c γ 2 τm λ c γ 2

where Q 0 is the amount of charge deposited in one time constant. The rest of the
analysis is quite similar. Thus, we can also write our solutions as

r 0 λc Q 0 1 −
((z−z 0 )/λc )2
vm (t, z) = √ e 4((t−t0 )/τm ) e−(t−t0 )/τm (4.5)
τm 4π ((t − t0 )/τm )
4.2 The Solution to a Constant Current 55

Fig. 4.3 Summed voltage at 3 time and 10 space constants

4.2 The Solution to a Constant Current

Now we need to attack a harder problem: we will apply a constant external current
i e which is defined by

Ie , t >0
i e (t) =
0, t ≤0

Recall that we could rewrite this as i e = Ie u(t) for the standard unit impulse function
u. Now in a time interval h, charge Q e = Ie h is delivered to the cable through the
external membrane. Fix the positive time t. Now divide the time interval [0, t] into
K equal parts using h equals Kt . This gives us a set of K + 1 points {ti }

t
ti = i , 0≤i ≤K
K
where we note that t0 is 0 and t K is t. This is called a partition of the interval [0, t]
which we denote by the symbol P K . Here h is the fraction Kt . Now let’s think of
the charge deposited into the outer membrane at ti as being the full amount Ie h
deposited between ti and ti+1 . Then the time dependent solution due to the injection
of this charge at ti is given by
56 4 The Time Dependent Cable Solution

r 0 λc I e h 1 − (z/λc )
2

vmi (t, z) = √ e 4((t−ti )/τm ) e−((t−ti )/τm )


τm 4π ((t − ti )/τm )

and since our problem is linear, by the superposition principle, we find the solution
due to charge Ie h injected at each point ti is given by

r 0 λc I e h 
K
1 − (z/λc )
2

vm (t, z, P K ) = √ e 4((t−ti )/τm ) e−((t−ti )/τm )


τm j=0
4π ((t − ti )/τm )

r 0 λc I e 
K
1 − (z/λc )
2

= √ e 4((t−ti )/τm ) e−((t−ti )/τm ) (ti+1 − ti )


τm j=0
4π ((t − ti )/τm )

Now we can reorder the partition P K using u K −i = t − ti which gives u K = t and


u 0 = 0; hence, we are just moving backwards through the partition. Note that

ti+1 − ti = u K −i − u K −i−1 = h

This relabeling allows us to rewrite the solution as

r 0 λc I e 
K
1 − (z/λc )
2

vm (t, z, P K ) = √ e 4(ui /τm ) e−(u i /τm ) (u i+1 − u i )


τm j=0
4π (u i /τm )

We can do this for any choice of partition P K . Since all of the functions involved
here are continuous, we see that as K → ∞, we obtain the Riemann Integral of the
idealized solution for an impulse of size Ie applied at the point u

r 0 λc I e 1 (z/λc )2
vmI (u, z) = √ e− 4(u/τm ) e−(u/τm )
τm 4π (u/τm )

leading to
 
t t
r 0 λc I e 1 (z/λc )2
vm (t, z) = vmI (u, z) du = √ e− 4(u/τm ) e−(u/τm ) du
0 0 τm 4π (u/τm )

We can rewrite this in terms of the probability density function associated with
this model. Using Eq. 4.3, we have
 
t t
r0 λ2c Ie
vm (t, z) = vmI (u, z) du = P0 (z, u) e−(u/τm ) du
0 0 τm

which is a much more convenient form. Much more can be said about this way of
looking at the solutions, but that leads us into much more advanced mathematics!
4.3 Time Dependent Solutions 57

4.3 Time Dependent Solutions

We now know the solution to the time dependent infinite cable model for an idealized
applied charge pulse has the form

ro λC Q e 1 ( λzC )2 − τt
vm (z, t) = exp − e M . (4.6)
τM 4π( τtM ) 4( τtM )

If we apply a current Ie for all t greater than zero, it is also possible to apply the
superposition principle for linear partial differential equations and write the solu-
tion in terms of a standard mathematical function, the error function, er f (x) =
x −y 2
√2 e dy. We will not go through these details as we feel you have been abused
π 0
enough at this point. However, the work to verify these comments is not that different
than what we have already done, so if you have followed us so far, you can go read
up on this and be confident you can follow the arguments. Note that this situation is
the equivalent of the steady state solution for the infinite cable with a current impulse
applied at z = 0 which can be shown to be
⎡ ⎛ ⎞ ⎤

ro λC Ie ⎣ t | λzC |
er f ⎝ ⎠ − 1⎦ e| λC |
z
vm (z, t) = +
4 τM 2 τM t

⎡ ⎛ ⎞ ⎤

ro λC Ie ⎣ t | λzC |
er f ⎝ ⎠ + 1⎦ e−| λC |
z
+ − (4.7)
4 τM 2 t τM

Although this solution is much more complicated in appearance, note that as t → ∞,


we obtain the usual steady state solution given by equation given in Peterson (2015).
We can use this solution to see how quickly the voltage due to current applied to the
fiber decays. We want to know when the voltage vm decays to one half of its starting
value. This is difficult to do analytically, but if you plot the voltage vm vs. the t for
various cable lengths , you can read off the time at which the voltage crosses the
one half value. This leads to the important empirical result that gives a relationship
that tells us the position on the fiber z at which the voltage vm has dropped to one
half of its starting value. This relationship is linear and satisfies
 
λC λC
z=2 t1 − . (4.8)
τM 2 2

The slope of this line can then be interpreted as a velocity, the rate at which the fiber
position for half-life changes; this is a conduction velocity and is given by
 
λC
v 21 = 2 , (4.9)
τM
58 4 The Time Dependent Cable Solution

having units of (cm/s). Using our standard assumption that ri ro and our equations
for λC and gm in terms of membrane parameters, we find

2G M 1
v 21 = a2, (4.10)
ρi C M
2

indicating that the induced voltage attenuates proportional to the square root of the
fiber radius a. Hence, the double the ratio of the fundamental space to time constant
is an important heuristic measure of the propagation speed of the voltage pulse. Good
to know.

Reference

J. Peterson, Calculus for Cognitive Scientists: Partial Differential Equation Models, Springer Series
on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte Ltd,
Singapore, 2015 in press)
Part III
Neural Systems
Chapter 5
Mammalian Neural Structure

In this chapter, we will discuss the basic principles of the neural organization that
subserves cognitive function. A great way to get a handle on this material is in the
neuroanatomy books (Pinel and Edwards 2008) and (Diamond et al. 1985) which
we have spent many hours working with using colored pencils and many sheets of
paper. We encourage you to do this as our simple introduction below is just to get
you started.

5.1 The Basic Model

Our basic neural model is based on abstractions from neurobiology. The two halves
or hemispheres of the brain are connected by the corpus callosum which is like a
cap of tissue that sits on top of the brain stem. The structures in this area are very
old in an evolutionary sense. The outer surface of the brain is the cortex which is a
thin layer organized into columns. There is too much cortex to fit comfortably inside
the human skull, so as the human species evolved and the amount of cortical tissue
expanded, the cortex began to develop folds. Imagine a deep canyon in the earth’s
surface. The walls of the canyon are called a gyrus and are the cortical tissue. The
canyon itself called a sulcus or fissure. There are many such sulci with corresponding
gyri. Some of the gyri are deep enough to touch the corpus callosum. One such gyrus
is the longitudinal cerebral gyrus fissure which contains the cingulate gyri at the very
bottom of the fissure touching the corpus callosum.
Consider the simplified model of information processing in the brain that is pre-
sented in Fig. 5.3. This has been abstracted out of much more detail (Nolte 2002).
In Brodal (1992) and Diamond et al. (1985), we can trace out the details of the con-
nections between cortical areas and deeper brain structures near the brain stem and
construct a very simplified version which will be suitable for our modeling purposes.
Raw visual input is sent to area 17 of the occipital cortex where is is further processed
by the occipital association areas 18 and 19. Raw auditory input is sent to area 41
of the Parietal cortex. Processing continues in areas 5, 7 (the parietal association
© Springer Science+Business Media Singapore 2016 61
J.K. Peterson, BioInformation Processing, Cognitive Science and Technology,
DOI 10.1007/978-981-287-871-7_5
62 5 Mammalian Neural Structure

areas) and 42. There are also long association nerve fiber bundles starting in the cin-
gulate gyrus which connect the temporal, parietal and occipital cortex together. The
Temporal—Occipital connections are labeled C2, the superior longitudinal fascicu-
lus; and B2, the inferior occipitofrontal nerve bundles, respectively. The C2 pathway
also connects to the cerebellum for motor output. The Frontal—Temporal connec-
tions labeled A, B and C are the cingular, uncinate fasciculus and arcuate fasciculus
bundles and connect the Frontal association areas 20, 21, 22 and 37.
Area 37 is an area where inputs from multiple sensor modalities are fused into
higher level constructs. The top boundary of area 17 in the occipital cortex is marked
by a fold in the surface of the brain called the lunate sulcus. This sulcus occurs
much higher in a primate such as a chimpanzee. Effectively, human like brains have
been reorganized so that the percentage of cortex allotted to vision has been reduced.
Comparative studies show that the human area 17 is 121 % smaller than it should be
if its size was proportionate to other primates. The lost portion of area 17 has been
reallocated to area 7 of the parietal cortex. There are special areas in each cortex
that are devoted to secondary processing of primary sensory information and which
are not connected directly to output pathways. These areas are called associative
cortex and they are primarily defined by function, not by a special cell structure. In the
parietal cortex, the association areas are 5 and 7; in the temporal cortex, areas 20,
21, 22 and 37; and in the frontal, areas 6 and 8. Hence, human brains have evolved
to increase the amount of associative cortex available for what can be considered
symbolic processing needs. Finally, the same nerve bundle A connects the Parietal
and Temporal cortex. In addition to these general pathways, specific connections
between the cortex association areas are shown as bidirectional arrows. The box
labeled cingular gyrus is essentially a simplified model of the processing that is
done in the limbic system. Note the bidirectional arrows connecting the cingulate
gyrus to the septal nuclei inside the subcallosal gyrus, the anterior nuclei inside the
thalamus and the amygdala. There is also a two way connection between the anterior
nuclei and the mamillary body inside the hypothalamus. Finally, the cingulate gyrus
connects to the cerebellum for motor output. We show the main cortical areas of
the brain in Fig. 5.1. Our ability to process symbolic information is probably due to
changes in the human brain that have occurred over evolutionary time. A plausible
outline of the reorganizations in the brain that have occurred in human evolution has
been presented in Holloway (1999, p. 96), which we have paraphrased somewhat in
Table 5.1.
The table uses some possibly unfamiliar terms: the abbreviation mya refers to
millions of years ago; petalias are asymmetrical projections of the occipital and
frontal cortex; and an endocast is a cast of the inside of a hominid fossil’s skull. An
endocast must be prepared and examined carefully, of course, as there are many
opportunities for wrong interpretations.
Holloway (1999, p. 77), believes that there are no new evolutionarily derived struc-
tures in the human brain as compared to other animals—nuclear masses and the fiber
systems interconnecting them are the same. The differences are in the quantitative
relationships between and among these nuclei and fiber tracts, and the organization
of cerebral cortex structurally, functionally and in integration.

Fig. 5.1 The main cortical subdivisions of the brain

Table 5.1 Brain reorganizations

Brain changes (reorganization) | Hominid fossil | Time (mya) | Evidence
Reduction of area 17 | A. afarensis | 3.5–3.0 | Brain endocast
Increase in posterior parietal cortex; reorganization of frontal cortex (third inferior frontal convolution, Broca’s area) | Homo habilis | 2.0–1.8 | Brain endocast
Cerebral asymmetries: left occipital, right frontal petalias | Homo habilis | 2.0–1.8 | Brain endocast
Refinements in cortical organization to a modern human pattern | Homo erectus | 1.5–0.0 | Brain endocast

We make these prefatory remarks to motivate why we think a reasonable model
of cognition (and hence, cognitive dysfunction) needs a working model of cortical
processing. Our special symbolic processing abilities appear to be closely linked to
the reorganizations of our cortex that expanded our use of associative cortex. Indeed,
in Holloway (1999, p. 97), we find a statement of one of the guiding principles in
our model building efforts.
our humanness resides mostly in our brain, endowed with symbolic abilities, which have
permitted the human animal extraordinary degrees of control over its adaptive powers in
both the social and material realms.

As might be expected, clear evidence of our use of symbolic processing does not
occur until humans began to leave archaeological artifacts which were well enough
preserved. Currently, such evidence first appears in painted shells used for adornment
found in African sites from approximately seventy thousand years ago. Previous to
those discoveries, the strongest evidence for abstract and symbolic abilities came

in the numerous examples of Paleolithic art from Europe and Russia in the
form of cave paintings and carved bone about twenty-five thousand years ago (25
kya). Paleolithic art comprises thousands of images made on bone, antler, ivory and
limestone cave walls using a variety of techniques, styles and artistic conventions.
In Conkey (1999, p. 291), it is asserted that
If,...,we are to understand the ‘manifestations and evolution of complex human behavior’,
there is no doubt that the study of paleolithic ‘art’ has much to offer.

In the past, researchers assumed that the Paleolithic art samples that appeared so
suddenly 25 kya were evidence of the emergence of a newly reorganized human brain
that was now capable of symbolic processing of the kind needed to create art. Conkey
argues persuasively that this is not likely. Indeed, in the table of brain reorganization
presented earlier, we noted that the increase in associative parietal cortex in area 7
occurred approximately 3 mya. Hence, the capability for symbolic reasoning probably
evolved steadily even though the concrete evidence of cave art and the like does not
appear until quite recently. However, our point is that the creation of ‘art’ is
intimately tied up with the symbolic processing capabilities that must underlie any
model of cognition. Further, we assert that the abstractions inherent in mathematics
and optimization are additional examples of symbolic processing.
In addition, we indicate, in simplified form, three major neurotransmitter pathways.
In the brain stem, we focus on the groups of neurons called the raphe and the locus
coeruleus as well as dopamine producing cells which are labeled by their location,
the substantia nigra in the brain stem. These cell groups produce neurotransmitters
of specific types and send their outputs to a collection of neural tissues that surround
the thalamus called the basal ganglia. The basal ganglia are not shown in Fig. 5.3
but you should imagine them as another box surrounding the thalamus. The basal
ganglia then sends outputs to portions of the cerebral cortex; the cerebral cortex in
turn sends connections back to the basal ganglia. These connections are not shown
explicitly; instead, for simplicity of presentation, we use the thalamus to cingulate
gyrus to associative cortex connections that are shown. The raphe nuclei produce
serotonin, the locus coeruleus produces norepinephrine and the substantia nigra (and other cells
in the brain stem) produce dopamine. There are many other neurotransmitters, but
the model we are presenting here is a deliberate choice to focus on a few basic
neurotransmitter pathways.
The limbic processing presented in Fig. 5.3 is shown in more detail in Fig. 5.2.
Cross sections of the brain in a plane perpendicular to a line through the eyes and the
back of the head show that the cingulate gyrus sits on top of the corpus callosum. Underneath
the corpus callosum is a sheath of connecting nerve tissue known as the fornix which is
instrumental in communication between these layers and the structures that lie below.
The arrows in Fig. 5.2 indicate structural connections only and should not be used
to infer information transfer. A typical model of biological information processing
that can be abstracted from what we know about brain function is thus shown in
the simplified brain model of Fig. 5.3. This shows a chain of neural modules which
subserve cognition. It is primarily meant to illustrate the hierarchical complexity of

Fig. 5.2 The major limbic system structures with arrows indicating structural connections

the brain structures we need to become familiar with. There is always cross-talk and
feedback between modules which is not shown.
Some of the principal components of the information processing system of the
brain are given in Table 5.2. The central nervous system (CNS) is divided roughly into
two parts: the brain and the spinal cord. The brain can be subdivided into a number of
discernible modules. For our purposes, we will consider the brain model to consist
of the cerebrum, the cerebellum and the brain stem. Finer subdivisions are then
shown in Table 5.2 where some structures are labeled with a corresponding number
for later reference in other figures. The numbers we use here bear no relation to the
numbering scheme that is used for the cortical subdivisions shown in Figs. 5.1 and
5.3. The numbering scheme there has been set historically. The numbering scheme
shown in Table 5.2 will help us locate brain structures deep in the brain that can

Fig. 5.3 A simplified path of information processing in the brain. Arrows indicate information
processing pathways

Table 5.2 Information processing components

Brain → Cerebrum, Cerebellum (1), Brain Stem
Cerebrum → Cerebral Hemisphere, Diencephalon
Brain Stem → Medulla (4), Pons (3), Midbrain (2)
Cerebral Hemisphere → Amygdala (6), Hippocampus (5), Cerebral Cortex, Basal Ganglia
Diencephalon → Hypothalamus (8), Thalamus (7)
Cerebral Cortex → Limbic (13), Temporal (12), Occipital (11), Parietal (10), Frontal (9)
Basal Ganglia → Lenticular Nucleus (15), Caudate Nucleus (14)
Lenticular Nucleus → Globus Pallidus (16), Putamen (15)

Fig. 5.4 Brain structures: the numeric labels correspond to structures listed in Table 5.2

only be seen by taking slices. These numbers thus correspond to the brain structures
shown in Fig. 5.4 (modules that can be seen on the surface) and the brain slices of
Figs. 5.5a, b and 5.6a, b. A useful model of the processing necessary to combine
disparate sensory information into higher level concepts is clearly built on models
of cortical processing.
There is much evidence that cortex is initially uniform prior to exposure to envi-
ronmental signal and hence a good model of such generic cortical tissue, isocortex,
is needed. A model of isocortex is motivated by recent models of cortical process-
ing outlined in Raizada and Grossberg (2003). This article uses clues from visual
processing to gain insight into how virgin cortical tissue (isocortex) is wired to allow
for its shaping via environmental input. Clues and theoretical models for auditory
cortex can then be found in the survey paper of Merzenich (2001). For our purposes,
we will use the terms auditory cortex for area 41 of the Parietal cortex and visual
cortex for area 17 of the occipital cortex. These are the areas which receive primary
sensory input with further processing occurring first in cortex where the input is
received and then in the temporal cortex in areas 20, 21, 22 and 37 as shown in
Fig. 5.3. The first layer of auditory cortex is bathed in an environment where sound

Fig. 5.5 Brain slice 1 details. a Slice 1 orientation. b Neural slice 1 cartoon

Fig. 5.6 Brain slice 2 details. a Slice 2 orientation. b Neural slice 2 cartoon

is chunked or batched into pieces of 200 ms length which is the approximate size of
the phonemes of a person’s native language. Hence, the first layer of cortex develops
circuitry specialized to this time constant.
The second layer of cortex then naturally develops a chunk size focus that is
substantially larger, perhaps on the order of 1000–10,000 ms. Merzenich details how
errors in the imprinting of these cortical layers can lead to cognitive impairments such
as dyslexia. As processing is further removed from the auditory cortex via myelinated
pathways, additional meta level concepts (tied to even longer time constants) are
developed. One way to do this is through what is called Hebbian learning. This
training method increases the strength of a connection between a presynaptic and a
postsynaptic neuron on the basis of coincident activity. It has many variations, of
course, but that is the gist of it, and it is based on the neurobiology of long term learning.
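To make the flavor of this rule concrete, the following Python fragment is a minimal sketch of a Hebbian update: the weight between a presynaptic and a postsynaptic unit grows when their activities coincide. The learning rate, the decay term and the randomly generated activities are illustrative choices of ours and are not taken from the neurobiology discussed above.

    import random

    def hebbian_update(w, pre, post, rate=0.01, decay=0.001):
        # Strengthen the weight when pre- and postsynaptic activity coincide.
        # The decay term is an illustrative way to keep the weight bounded; it is
        # not part of the basic Hebbian rule described in the text.
        return w + rate * pre * post - decay * w

    # Toy demonstration: correlated activity drives the weight upward.
    w = 0.1
    for step in range(1000):
        pre = random.random()                        # presynaptic activity in [0, 1]
        post = 0.8 * pre + 0.2 * random.random()     # correlated postsynaptic activity
        w = hebbian_update(w, pre, post)
    print("final weight:", round(w, 3))

Note that under this simple rule the weight only ever grows when both activities are positive, which is one reason the next paragraph discusses the pitfalls of strict Hebbian learning.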
While it is clear that a form of such Hebbian learning is used to set up these circuits,
the pitfalls of such learning are discussed clearly in the literature (McClelland 2001).
For example, the inability of adult speakers of Japanese to distinguish the sound of

an ell and an r is indicative of a bias of Hebbian learning that makes it difficult to
learn new strategies by unlearning previous paradigms. For this reason, we will not
use strict Hebbian learning protocols; instead, we will model auditory and visual
cortex with three layers using techniques based on information flow through graphs
in conjunction with versions of Hebbian learning. Our third layer of cortex is then
an abstraction of the additional anatomical layers of cortex as well as appropriate
myelinated pathways which conduct upper layer processing results to other cognitive
modules.

5.2 Brain Structure

The connections from the limbic core to the cortex are not visible from the outside.
The outer layer of the cortex is deeply folded and wraps around an inner core that
consists of the limbic lobe and corpus callosum. Neural pathways always connect
these structures. In Fig. 5.5a, we can see the inner structures known as the amygdala
and putamen in a brain cross-section. Figure 5.5a shows the orientation of the brain
cross-section while Fig. 5.5b displays a cartoon of the slice itself indicating various
structures.
The thalamus is located in a portion of the brain which can be seen using the
cross-section indicated by Fig. 5.6a. The structures the slice contains are shown in
Fig. 5.6b.

5.3 The Brain Stem

Cortical columns interact with other parts of the brain in sophisticated ways. The
patterns of cortical activity (modeled by the Folded Feedback Pathways (FFP) and
On-Center Off-surround (OCOS) circuits as discussed in Sect. 5.4.2) are modulated
by neurotransmitter inputs that originate in the reticular formation or RF of the
brain. The outer cortex is wrapped around an inner core which contains, among
other structures, the midbrain. The midbrain is one of the most important information
processing modules. Consider Fig. 5.8 which shows the midbrain in cross section. A
number of integrative functions are organized at the brainstem level. These include
complex motor patterns, respiratory and cardiovascular activity and some aspects
of consciousness. The brain stem has historically been subdivided into functional
groups starting at the spinal cord and moving upward toward the middle of the brain.
The groups are, in order, the caudal and rostral medulla, the caudal, mid and
rostral pons and the caudal and rostral midbrain. We will refer to these as the
brainstem layers 1–7. In this context, the terms caudal and rostral refer to whether
a slice is closest to the spinal cord or not. Hence, the pons is divided into the rostral
pons (farthest from the spinal cord) and the caudal pons (closest to the spinal cord).

Fig. 5.7 The brainstem layers

Fig. 5.8 The brainstem structure

The seven brain stem layers can be seen by taking cross-sections through the
brain as indicated by the numbers 1–7 shown in Fig. 5.8. Each slice has specific
functionality and shows interesting structure which is not visible other than in cross
section. The layers are shown in Fig. 5.7. Slice 1 is the caudal
medulla and is shown in Fig. 5.9a. The Medial Longitudinal Fasciculus (MLF) controls
head and eye movement. The Medial Lemniscus is not shown but is right below the
MLF. Slice 2 is the rostral medulla. The Fourth Ventricle is not shown in Fig. 5.9b
but would be right at the top of the figure. The Medial Lemniscus is still not shown
but is located below the MLF like before. In slice 2, a lot of the inferior olivary
nucleus can be seen. The caudal pons is shown in Slice 3, Fig. 5.10a, and the mid
pons in Slice 4, Fig. 5.10b.
The rostral pons and the caudal and rostral midbrain are shown in Figs. 5.11a, b
and 5.12, respectively. Slice 4 and 5 contain pontine nuclei and Slice 7 contains the

Fig. 5.9 The medulla cross-sections. a The caudal medulla: slice 1. b The rostral medulla: slice 2

Fig. 5.10 The caudal and mid pons slices. a The caudal pons: slice 3. b The mid pons: slice 4

Fig. 5.11 The rostral pons and caudal midbrain cross-sections. a The rostral pons: slice 5. b The
caudal midbrain: slice 6

Fig. 5.12 The rostral midbrain: slice 7

Fig. 5.13 Typical neuron in the reticular formation

cerebral peduncle, a massive nerve bundle containing cortico-pontine and cortico-spinal
fibers. Fibers originating in the frontal, parietal, and temporal cortex descend
to the pontine nuclei. The pontine nuclei also connect to the cerebral peduncles.
The Reticular Formation or RF is the central core of the brain stem which contains
neurons whose connectivity is characterized by huge fan-in and fan-out. Reticular
neurons therefore have extensive and complex axonal projections. Note we use the
abbreviation PAG to denote the cells known as periaqueductal gray and NG for the
Nucleus Gracilis neurons. The extensive nature of the connections of an RF
neuron is shown in Fig. 5.13. The indicated neuron in Fig. 5.13 sends its axon to
many CNS areas. If one cell has projections this extensive, we can imagine the com-
plexity of the reticular formation as a whole. Its neurons have ascending projections that
terminate in the thalamus, subthalamus, hypothalamus, cerebral cortex and the basal
ganglia (caudate nucleus, putamen, globus pallidus, substantia nigra). The midbrain

and rostral pons RF neurons thus collect sensory modalities and project this
information to the intralaminar nuclei of the thalamus. The intralaminar nuclei project
to widespread areas of cortex, and this projection causes heightened arousal in response
to sensory stimuli; e.g. attention. Our cortical column models must therefore be able
to accept modulatory inputs from the RF. It is also known that some RF neurons release
monoamines which are essential for the maintenance of consciousness. For example,
bilateral damage to the midbrain RF and the fibers through it causes prolonged coma.
Hence, even a normal intact cerebrum can not maintain consciousness. It is clear that
input from brainstem RF is needed. The monoamine releasing RF neurons release
norepinephrine, dopamine and serotonin. The noradrenergic neurons are located in
the pons and medulla (the locus coeruleus); this is slice 5 (the rostral pons) and these
neurons connect to the cerebral cortex. This system is inactive in sleep and most active
in startling situations, or those calling for watchfulness. Dopaminergic neurons in slice 7
(the rostral midbrain) are located in the Substantia Nigra (SG). Figure 5.14 shows the
brain areas that are influenced by dopamine. Figure 5.15a also shows the SC (superior
colliculus), PAG (periaqueductal gray) and RN (red nucleus) structures.
These neurons project in overlapping fiber tracts to other parts of the brain. The
nigrostriatal tract sends information from the substantia nigra in the midbrain to the
caudate and putamen. The medial forebrain bundle projects from the substantia nigra
to the frontal and limbic lobes. The indicated projections to the motor cortex are
consistent with initiation of movement. We know that there is disruption of cortex
function due to the loss of dopamine neurons in Parkinson’s disease. The projections
to other frontal cortical areas and limbic structures imply there is a motivation and
cognition role. Hence, imbalances in these pathways will play a role in mental dys-
function. Furthermore, certain drugs cause dopamine release in limbic structures
which implies a pleasure connection. Serotonergic neurons occur in most levels
of the brain stem, but concentrate in the raphe nuclei in slice 5—also near the
locus coeruleus (Fig. 5.15b). Their axons innervate many areas of the brain as shown

Fig. 5.14 The dopamine innervation pathway

Fig. 5.15 The dopamine and serotonin releasing neurons sites. a Location of dopamine sites.
b Location of serotonin sites

Fig. 5.16 The serotonin innervation pathway

in Fig. 5.16. It is known that serotonin levels determine the set point of arousal and
influence the pain control system.

5.4 Cortical Structure

The cortex is folded as shown in Fig. 5.17 and consists of a number of regions which
have historically been classified as illustrated in Fig. 5.18a. The cortical layer is thus
subdivided into several functional areas known as the Frontal, Occipital, Parietal,
Temporal and Limbic lobes. Most of what we know about the function of these lobes
comes from the analysis of the diminished abilities of people who have unfortunately

Fig. 5.17 Cortical folding

Fig. 5.18 Cortical lobes. a Cortical lobes. b The limbic lobe inside the cortex

had strokes or injuries. By studying these brave people very carefully we can discern
what functions they have lost and correlate these losses with the areas of their brain
that have been damaged. This correlation is generally obtained using a variety of brain
imaging techniques such as Computer Assisted Tomography (CAT) and functional
Magnetic Resonance Imaging (fMRI) scans.
The Frontal lobe has a number of functionally distinct areas. It contains the
primary motor cortex which is involved in initiation of voluntary movement. Also, it
has a specialized area known as Broca’s area which is important in both written and
spoken language ability. Finally, it has the prefrontal cortex which is instrumental in
the maintenance of our personality and is involved in the critical abilities of insight
and foresight. The Occipital lobe is concerned with visual processing and visual
association. In the Parietal lobe, primary somato-sensory information is processed
in the area known as the primary somato-sensory cortex. There is also initial cortical
processing of tactile and proprioceptive input. In addition, there are areas devoted
to language comprehension and complex spatial orientation and perception. The
Temporal lobe contains auditory processing circuitry and develops higher order
auditory associations as well. For example, the temporal lobe contains Wernicke’s
area which is involved with language comprehension. It also handles higher order
visual processing and learning and memory. Finally, the Limbic system lies beneath
the cortex as shown in Fig. 5.18b and is involved in emotional modulation of
cortical processing.

Fig. 5.19 Generic overview

5.4.1 Cortical Processing

In order to build a useful model of cognition, we must be able to model interactions
of cortical modules with the limbic system. Further, these interactions must be
modulated by a number of monoamine neurotransmitters. It also follows that we need
a reasonable model of cortical processing. There is much evidence that cortex is
initially uniform prior to exposure to environmental signal. Hence, a good model of
generic cortical tissue, called isocortex, is needed. A model of isocortex is motivated
by recent models of cortical processing outlined in Raizada and Grossberg (2003).
There are other approaches, of course, but we will focus on this model here. A good
discussion of auditory processing is given in Nelken (2004) and how information
from multiple cortical areas can be combined into a useful signal is discussed in
Beauchamp (2005). But for the moment, let’s stick to one cortex at a time. The
Raizada article uses clues from visual processing to gain insight into how virgin
cortical tissue (isocortex) is wired to allow for its shaping via environmental input.

Clues and theoretical models for auditory cortex can then be found in the survey
paper of Merzenich (2001). We begin with a general view of a typical cortical col-
umn taken from the standard references of Brodal (1992), Diamond et al. (1985) and
Nolte (2002) as shown in Fig. 5.19. This column is oriented vertically with layer one
closest to the skull and layer six furthest in. We show layer four having a connection
to primary sensory data. The details of some of the connections between the layers
are shown in Fig. 5.20. The six layers of the cortical column consist of specific cell
types and mixtures described in Table 5.3.
We can make some general observations about the cortical architecture. First,
layers three and five contain pyramidal cells which collect information from layers
above themselves and send their processed output for higher level processing. Layer
three outputs to other parts of the cerebral cortex and layer five, to motor areas.
Layer six contains cells whose output is sent to the thalamus or other brain areas.
Layer four is a collection layer which collates input from primary sensory modalities
or from other cortical and brain areas. We see illustrations of the general cortical
column structure in Figs. 5.19 and 5.20. The cortical columns are organized into
larger vertical structures following a simple stacked protocol: sensory data → cortical
column 1 → cortical column 2 → cortical column 3 and so forth. For convenience,
our models will be shown with three stacked columns. The output from the last
column is then sent to other cortex, thalamus and the brain stem.
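As a simple illustration of this stacked protocol, the sketch below chains three placeholder column models in Python. The particular transforms inside each column are arbitrary stand-ins, since the real column processing is developed in the next section.

    def column_one(x):
        # Placeholder transform standing in for the first column's processing.
        return [2.0 * v for v in x]

    def column_two(x):
        return [v + 1.0 for v in x]

    def column_three(x):
        return [max(v, 0.0) for v in x]

    def stacked_columns(sensory_data, columns):
        # Pass the signal through a chain of cortical column models in order.
        signal = sensory_data
        for col in columns:
            signal = col(signal)
        return signal  # output destined for other cortex, thalamus and the brain stem

    print(stacked_columns([0.5, -0.2, 1.0], [column_one, column_two, column_three]))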

5.4.2 Isocortex Modeling

A useful model of generic cortex, isocortex, is that given in Grossberg (2003), Gross-
berg and Seitz (2003) and Raizada and Grossberg (2003). Two fundamental corti-
cal circuits are introduced in these works: the on-center, off-surround (OCOS) and
the folded feedback pathway (FFP) seen in Fig. 5.21a, b. In Fig. 5.21a, we see the
On-Center, Off-Surround control structure that is part of the cortical column control
circuitry. Outputs from the thalamus (perhaps from the nuclei of the Lateral Genic-
ulate Body) filter upward into the column at the bottom of the picture. At the top of
the figure, the three circles that are not filled in represent neurons in layer four whose
outputs will be sent to other parts of the column.
There are two thalamic output lines: the first is a direct connection to the input layer
four, while the second is an indirect connection to layer six itself. This connection
then connects to a layer of inhibitory neurons which are shown as circles filled in
with black. The middle layer four output neuron is thus innervated by both inhibitory
and excitatory inputs while the left and right layer four output neurons only receive
inhibitory impulses. Hence, the center is excited and the part of the circuit that is off
the center, is inhibited. We could say the surround is off. It is common to call this type
of activation the off surround. Next consider a stacked cortical column consisting of
two columns, column one and column two. There are cortico-cortical feedback axons
originating in layer six of column two which input into layer one of column one. From
layer one, the input connects to the dendrites of layer five pyramidal neurons which

Fig. 5.20 The structure of a cortical column

Table 5.3 Cortical column cell types

Layer | Description | Use
One | Molecular |
Two | External granule layer |
Three | External pyramidal layer | Output to other cortex areas
Four | Internal granule layer | Collects primary sensory input or input from other brain areas
Five | Internal pyramidal layer | Output to motor cortex
Six | Multiform layer | Output to thalamus or other brain areas

Fig. 5.21 OCOS and FFP circuits. a The on-center, off-surround control structure. b The folded
feedback pathway control structure

connects to the thalamic neuron in layer six. Hence, the higher level cortical input is
fed back into the previous column layer six and then can excite column one’s fourth
layer via the on-center, off-surround circuit discussed previously. This description is
summarized in Fig. 5.21b. We call this type of feedback a folded feedback pathway.
For convenience, we use the abbreviations OCOS and FFP to indicate the on-center,
off-surround and folded feedback pathway, respectively. The layer six–four OCOS
is connected to the layer two–three circuit as shown in Fig. 5.22a. Note that the layer
four output is forwarded to layer two–three and then sent back to layer six so as to
contribute to the standard OCOS layer six–layer four circuit. Hence, we can describe
this as another FFP circuit. Finally, the output from layer six is forwarded into the
thalamic pathways using a standard OCOS circuit. This provides a way for layer six
neurons to modulate the thalamic outputs which influence the cortex. This is shown
in Fig. 5.22b.
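The following Python sketch captures only the on-center, off-surround idea in numerical form: each layer four unit is excited by the thalamic input aligned with it and inhibited by its neighbours through the layer six/inhibitory interneuron route. The weights, the neighbourhood size and the rectification are our own illustrative choices; this is not the laminar model of Raizada and Grossberg, only a toy version of the OCOS stage.

    def ocos_layer(thalamic_input, w_center=1.0, w_surround=0.4):
        # On-center, off-surround: excite the aligned layer four unit and inhibit
        # it in proportion to the activity of its immediate neighbours.
        n = len(thalamic_input)
        output = []
        for i in range(n):
            center = w_center * thalamic_input[i]
            surround = sum(thalamic_input[j] for j in (i - 1, i + 1) if 0 <= j < n)
            activity = center - w_surround * surround
            output.append(max(activity, 0.0))  # simple rectification
        return output

    # A small bump of thalamic activity is sharpened by the OCOS stage.
    print(ocos_layer([0.1, 0.2, 0.9, 0.2, 0.1]))

The example input is a small bump of thalamic activity; the OCOS stage suppresses the flanks and passes the center, which is the qualitative behavior described above.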
The FFP and OCOS cortical circuits can also be combined into a multi-column
model (Raizada and Grossberg 2003) as seen in Fig. 5.23. It is known that cortical
outputs dynamically assemble into spatially and temporally localized phase locked
structures. A review of such functional connectivity appears in Fingelkurts et al.
(2005). We have abstracted a typical snapshot of the synchronous activity of par-
ticipating neurons from Fingelkurts’ discussions which is shown in Fig. 5.24. Each
burst of this type of synchronous activity in the cortex is measured via skull-cap EEG
equipment and Fingelkurts presents a reasoned discussion why such coordinated
ensembles are high level representations of activity. A cognitive model is thus inher-
ently multi-scale in nature. The cortex uses clusters of synchronous activity as shown
in Fig. 5.24 acting on a sub second to second time frame to successively transform raw
data representations to higher level representations. Also, cortical transformations
from different sensory modalities are combined to fuse representations into new

Fig. 5.22 Layer six connections. a The layer six–four connections to layer two–three are another
FFP. b The layer six to thalamic OCOS

Fig. 5.23 The OCOS/FFP cortical model

Fig. 5.24 Synchronous cortical activity

Fig. 5.25 The norepinephrine innervation pathway

representations at the level of cortical modules. These higher level representations
both arise and interact on much longer time scales. Further, monoamine inputs from
the brain stem modulate the cortical modules as shown in Figs. 5.25, 5.14 and 5.16.
Additionally, reticular formation neurons modulate limbic and cortical modules. It
is therefore clear that cognitive models require abstract neurons whose output can be
shaped by many modulatory inputs. To do this, we need a theoretical model of the
input/output (I/O) process of a single excitable cell that is as simple as possible yet
still computes the salient characteristics of our proposed system.

References

M. Beauchamp, See me, hear me, touch me: multisensory integration in lateral occipital—temporal
cortex. Curr. Opin. Neurobiol. 15, 145–153 (2005)
P. Brodal, The Central Nervous System: Structure and Function (Oxford University Press, New
York, 1992)
M. Conkey, A history of the interpretation of European ‘paleolithic art’: magic, mythogram, and
metaphors for modernity, in Handbook of Human Symbolic Evolution, ed. by A. Lock, C. Peters
(Blackwell Publishers, Massachusetts, 1999)
M. Diamond, A. Scheibel, L. Elson, The Human Brain Coloring Book (Barnes and Noble Books,
New York, 1985)
A. Fingelkurts, A. Fingelkurts, S. Kähkönen, Functional connectivity in the brain—is it an elusive
concept? Neurosci. Biobehav. Rev. 28, 827–836 (2005)
S. Grossberg, How does the cerebral cortex work? development, learning, attention and 3D vision
by laminar circuits of visual cortex, Technical Report TR-2003-005, Boston University, CAS/CS,
2003
S. Grossberg, A. Seitz, Laminar development of receptive fields, maps, and columns in visual cortex:
the coordinating role of the subplate, Technical Report 02-006, Boston University, CAS/CS, 2003
R. Holloway, Evolution of the human brain, in Handbook of Human Symbolic Evolution, ed. by
A. Lock, C. Peters (Blackwell Publishers, Massachusetts, 1999), pp. 74–116

J. McClelland, Failures to learn and their remediation, in Mechanisms of Cognitive Development:
Behavioral and Neural Perspectives, ed. by J. McClelland, R. Siegler (Lawrence Erlbaum
Associates Publishers, USA, 2001), pp. 97–122
M. Merzenich, Cortical plasticity contributing to child development, in Mechanisms of Cognitive
Development: Behavioral and Neural Perspectives, ed. by J. McClelland, R. Siegler (Lawrence
Erlbaum Associates Publishers, USA, 2001), pp. 67–96
I. Nelken, Processing of complex stimuli and natural scenes in the auditory cortex. Curr. Opin.
Neurobiol 14, 474–480 (2004)
J. Nolte, The Human Brain: An Introduction to Its Functional Anatomy (Mosby, A Division of
Elsevier Science, 2002)
J. Pinel, M. Edwards, A Colorful Introduction to the Anatomy of the Human Brain: A Brain and
Psychology Coloring Book (Pearson, New York, 2008)
R. Raizada, S. Grossberg, Towards a theory of the laminar architecture of cerebral cortex: compu-
tational clues from the visual system. Cereb. Cortex 13, 100–113 (2003)
Chapter 6
Abstracting Principles of Computation

Classical computing via traditional hardware starts with computing primitives such
as AND and OR gates and then assembles large numbers of such gates into circuits
using microfabrication techniques to implement boolean functions. More compli-
cated functions and data other than boolean and/or integer can then be modeled
using this paradigm. The crucial feature here is that simple building blocks are real-
ized in hardware primitives which are then assembled into large structures. Quantum
computing is taking a similar approach; the primitive unit of information is no longer
a boolean bit but instead is a more complicated entity called a qubit. The quantum
nature of the qubit then allows the primitive functional AND/OR gates to have
unusual properties, but the intellectual pathway is still clear. Hardware primitives are
used to assemble logic gates; logic gate ensembles are used to implement computable
numbers via functions. Is this a viable model of biological computation?
In the thoughts that follow, we will argue that this is the incorrect approach and
biological computation is instead most profitably built from a blurring of the hard-
ware/ software distinction. Following Gerhart and Kirschner (1997), we present a
generic biological primitive based on signal transduction into a second messenger
pathway. We will look at specific cellular mechanisms to give the flavor of this
approach—all based on the presentation in Gerhart and Kirschner (1997) but placed
in very abstract terms for our purposes. We will show how they imply an abstract
computational framework for basic triggering events. The trigger model is essentially
a good approximation to a general second messenger input. We can use these ideas
in future development of a useful approximation to a biological neuron for use in
cognitive models.

6.1 Cellular Triggers

A good example of a cellular trigger is the transcriptional control of the factor NFκB
which plays a role in immune system response. This mechanism is discussed in


a semi-abstract way in Gerhart and Kirschner (1997); we will discuss it even more abstractly.
Consider a trigger T0 which activates a cell surface receptor. Inside the cell, there
are always protein kinases that can be activated in a variety of ways. Here we denote
a protein kinase by the symbol PK. A common mechanism for such an activation is
to add to PK another protein subunit U to form the complex PK/U . This chain of
events looks like this:

T0 → Cell Surface Receptor → PK/U

PK/U then acts to phosphorylate another protein. The cell is filled with large
amounts of a transcription factor we will denote by T1 and an inhibitory protein
for T1 we label as T1∼ . This symbol, T1∼ , denotes the complement or anti version of
T1 . In the cell, T1 and T1∼ are generally joined together in a complex denoted by
T1 /T1∼ . The addition of T1∼ to T1 prevents T1 from being able to access the genome
in the nucleus to transcribe its target protein.
The trigger T0 activates our protein kinase PK to PK/U . The activated PK/U
is used to add a phosphate to T1∼ . This is called phosphorylation. Hence,

PK/U + T1∼ → T1∼ P

where T1∼ P denotes the phosphorylated version of T1∼ . Since T1 is bound into the
complex T1 /T1∼ , we actually have

PK/U + T1 /T1∼ → T1 /T1∼ P

In the cell, there is always present a collection of proteins which tend to bond with
the phosphorylated form T1∼ P. Such a system is called a tagging system. The protein
used by the tagging system is denoted by V and usually a chain of n such V proteins
is glued together to form a polymer Vn . The tagging system creates the new complex

T1 /T1∼ PVn

This gives the following event tree at this point:

T0 → Cell Surface Receptor → PK/U


PK/U + T1 /T1∼ → T1 /T1∼ P

T1 /T1∼ P + tagging system → T1 /T1∼ PVn

Also, inside the cell, the tagging system coexists with a complementary system whose
function is to destroy or remove the tagged complexes. Hence, the combined system

Tagging ←→ Removal → T1 /T1∼ P

is a regulatory mechanism which allows the transcription factor T1 to be freed from


its bound state T1 /T1∼ so that it can perform its function of protein transcription

in the genome. The removal system is specific to Vn molecules; hence although it


functions on T1∼ PVn , it would work just as well on QVn where Q is any other tagged
protein. We will denote the removal system which destroys Vn tagged proteins Q
from a substrate S by the symbol

f SQVn

This symbol means the system acts on SQVn units and outputs S via mechanism f .
Note the details of the mechanism f are largely irrelevant here. Thus, we have the
reaction

T1 /T1∼ PVn + f SQVn → T1

which releases T1 into the cytoplasm. The full event chain for a cellular trigger is
thus

T0 → Cell Surface Receptor → PK/U


PK/U + T1 /T1∼ → T1 /T1∼ P
T1 /T1∼ P + tagging system → T1 /T1∼ PVn
T1 /T1∼ PVn + f SQVn → T1
T1 → nucleus → tagged protein transcription P(T1 )

where P(T1 ) indicates the protein whose construction is initiated by the trigger T0 .
Without the trigger, we see there are a variety of ways transcription can be stopped:
• T1 does not exist in a free state; instead, it is always bound into the complex T1 /T1∼
and hence can’t be activated until the T1∼ is removed.
• Any of the steps required to remove T1∼ can be blocked effectively killing tran-
scription:
– phosphorylation of T1∼ into T1∼ P is needed so that tagging can occur. So anything
that blocks the phosphorylation step will also block transcription.
– Anything that blocks the tagging of the phosphorylated T1∼ P will thus block
transcription.
– Anything that stops the removal mechanism f SQVn will also block transcription.
The steps above can be used therefore to further regulate the transcription of T1 into
the protein P(T1 ). Let T0′ , T0′′ and T0′′′ be inhibitors of steps i, ii and iii, respectively. These inhibitory
proteins can themselves be regulated via triggers through mechanisms just like the
ones we are discussing. In fact, P(T1 ) could itself serve as an inhibitory trigger—i.e.
as any one of the inhibitors T0′ , T0′′ and T0′′′ . Our theoretical pathway is now:

T0 → Cell Surface Receptor → PK/U


PK/U + T1 /T1∼ → step i T1 /T1∼ P
T1 /T1∼ P + tagging system → step ii T1 /T1∼ PVn
T1 /T1∼ PVn + fSQVn → step iii T1
T1 → nucleus → tagged protein transcription P(T1 )

where the step i, step ii and step iii can be inhibited as shown below:

T0 → Cell Surface Receptor → PK/U


PK/U + T1 /T1∼ → step i T1 /T1∼ P
↑ T0 kill

T1 /T1∼ P + tagging system → step ii T1 /T1∼ PVn
↑ T0 kill
T1 /T1∼ PVn + fSQVn → step iii T1
↑ T0 kill
T1 → nucleus → tagged protein transcription P(T1 )

Note we have expanded to a system of four triggers which affect the outcome of
P(T1 ). Also, note that step i is a phosphorylation step. Now, let’s refine our analysis
a bit more. Usually, reactions are paired: we typically have the competing reactions

PK/U + T1 /T1∼ → T1 /T1∼ P


T1 /T1∼ P → T1 /T1∼ + PK/U

Hence, we can imagine that step i is a system which is in dynamic equilibrium. The
amount of T1 /T1∼ P formed and destroyed forms a stable loop with no net T1 /T1∼ P
formed. The trigger T0 introduces additional PK/U into this stable loop and thereby
effects the net production of T1 /T1∼ P. Thus, a new trigger T0 could profoundly effect
phosphorylation of T1∼ and hence production of P(T1 ). We can see from the above
comments that very fine control of P(T1 ) production can be achieved if we think of
each step as a dynamical system in flux equilibrium.
Note our discussion above is a first step towards thinking of this mechanism in
terms of interacting objects.
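As a first, very rough pass at that object view, the sketch below represents each step of the trigger chain as a Python object that can be enabled or blocked by an inhibitory trigger. The class and function names are hypothetical and the Boolean logic only mirrors the qualitative chain above; none of the kinetics developed in this section is represented.

    class Step:
        # One stage of the trigger chain; an inhibitory trigger can kill it.
        def __init__(self, name):
            self.name = name
            self.blocked = False

        def inhibit(self):
            self.blocked = True

        def run(self, signal_present):
            return signal_present and not self.blocked

    def trigger_pathway(t0_present, steps):
        # Returns True if T1 reaches the nucleus so that P(T1) is transcribed.
        signal = t0_present                  # T0 -> cell surface receptor -> PK/U
        for step in steps:
            signal = step.run(signal)        # phosphorylation, tagging, removal
        return signal

    steps = [Step("phosphorylate T1~"), Step("tag with Vn"), Step("remove T1~ P Vn")]
    print(trigger_pathway(True, steps))      # True: P(T1) is made
    steps[1].inhibit()                       # an inhibitory trigger kills the tagging step
    print(trigger_pathway(True, steps))      # False: transcription is blocked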

6.2 Dynamical Loop Details

Our dynamical loop consists of the coupled reactions

PK/U + T1 /T1∼ → T1 /T1∼ P       (forward, rate constant k1)
T1 /T1∼ P → T1 /T1∼ + PK/U       (backward, rate constant k−1)

where k1 and k−1 are the forward and backward reaction rate constants and we assume
the amount of T1 /T1∼ inside the cell is constant and maintained at the equilibrium
concentration [T1 /T1∼ ]e . Since one unit of PK/U combines with one unit of T1 /T1∼
to phosphorylate T1 /T1∼ to T1 /T1∼ P, we see

[T1 /T1∼ ](t) = [T1 /T1∼ ]e − [PK/U ](t). (6.1)



Hence,

d[PK/U]/dt = −k1 [PK/U][T1 /T1∼ ] + k−1 [T1 /T1∼ P]

For this reaction, we have [PK/U ] = [T1 /T1∼ ]; thus, we find

d[PK/U]/dt = −k1 [PK/U]² + k−1 [T1 /T1∼ P]    (6.2)
From Eq. 6.1, we see

d[T1 /T1∼ ]/dt = −d[PK/U]/dt
Hence,

d[T1 /T1∼ ]/dt = k1 [PK/U]² − k−1 [T1 /T1∼ P]    (6.3)
However, kinetics also tells us that

d[T1 /T1∼ ]/dt = −k1 [PK/U]² + k−1 [T1 /T1∼ P]    (6.4)
and so, equating these two expressions, we find

2 k1 [PK/U]² = 2 k−1 [T1 /T1∼ P]

which implies

(k1 /k−1 ) [PK/U]² = [T1 /T1∼ P]    (6.5)

Using Eq. 6.5 in Eq. 6.3, we find that

d[T1 /T1∼ ]/dt = k1 [PK/U]² − k−1 [T1 /T1∼ P] = 0    (6.6)

Now let [PK/U ]e denote the equilibrium concentration established by Eq. 6.5. Then
if the trigger T0 increases [PK/U ] by δPK/U , from Eq. 6.5, we see

[T1 /T1∼ P]new = (k1 /k−1 ) ([PK/U]e + δPK/U )²    (6.7)

which implies the percentage increase from the equilibrium level is


100 (1 + δPK/U /[PK/U]e )²

This back-of-the-envelope calculation can be done not only at step i but at the other
steps as well. Letting δT1 /T1∼ P and δT1 /T1∼ PVn denote changes in critical molecular
concentrations, let’s examine the stage ii equilibrium loop. We have

T1 /T1∼ P → T1 /T1∼ PVn       (forward, rate constant k2)    (6.8)
T1 /T1∼ PVn → T1 /T1∼ P       (backward, rate constant k−2)    (6.9)

The kinetic equations are then

d[T1 /T1∼ P]/dt = −k2 [T1 /T1∼ P] + k−2 [T1 /T1∼ PVn ]    (6.10)
d[T1 /T1∼ PVn ]/dt = k2 [T1 /T1∼ P] − k−2 [T1 /T1∼ PVn ]    (6.11)
Dynamic equilibrium then implies that

d[T1 /T1∼ P]/dt = d[T1 /T1∼ PVn ]/dt    (6.12)
and hence

2k2 [T1 /T1∼ P] = 2k−2 [T1 /T1∼ PVn ]

or
(k2 /k−2 ) [T1 /T1∼ P] = [T1 /T1∼ PVn ]    (6.13)

Equation 6.13 defines the equilibrium concentrations of [T1 /T1∼ P]e and
[T1 /T1∼ PVn ]e . Now if [T1 /T1∼ P] increased to [T1 /T1∼ P] + δT1 /T1∼ P , the percentage
increase would be
100 (1 + δT1 /T1∼ P /[T1 /T1∼ P]e )²

If the increase in [T1 /T1∼ P] is due to step i, we know



δT1 /T1∼ P = [T1 /T1∼ P]new − [T1 /T1∼ P]e
         = (k1 /k−1 ) ([PK/U]e + δPK/U )² − (k1 /k−1 ) [PK/U]e²
         = (k1 /k−1 ) (2 [PK/U]e δPK/U + δPK/U ²)

We also know from Eq. 6.5 that

[T1 /T1∼ P]e = (k1 /k−1 ) [PK/U]e²

and hence,
 
δT1 /T1∼ P /[T1 /T1∼ P]e = (k1 /k−1 ) (2 [PK/U]e δPK/U + δPK/U ²) / ( (k1 /k−1 ) [PK/U]e² )
                        = 2 (δPK/U /[PK/U]e ) + (δPK/U /[PK/U]e )²

For convenience, let’s define the relative change in a variable x as rx = δx /x. Thus, we
can write
rT1 /T1∼ P = δT1 /T1∼ P /[T1 /T1∼ P]e
rPK/U = δPK/U /[PK/U]e

which allows us to recast the change in [T1 /T1∼ P] equation as

rT1 /T1∼ P = 2 rPK/U + rPK/U ²

Hence, it follows that

δT1 /T1∼ PVn = [T1 /T1∼ PVn ]new − [T1 /T1∼ PVn ]e
           = (k2 /k−2 ) ([T1 /T1∼ P]e + δT1 /T1∼ P ) − (k2 /k−2 ) [T1 /T1∼ P]e
           = (k2 /k−2 ) δT1 /T1∼ P

and so

rT1 /T1∼ PVn = rT1 /T1∼ P = 2 rPK/U + rPK/U ²

From this, we see that trigger events which cause 2 rPK/U + rPK/U ² to exceed one
(that is, rPK/U > √2 − 1) create an explosive increase in [T1 /T1∼ PVn ]. Finally, in the third
step, we have
T1 /T1∼ PVn → T1       (forward, rate constant k3)    (6.14)
T1 → T1 /T1∼ PVn       (backward, rate constant k−3)    (6.15)

This dynamical loop can be analyzed just as we did in step ii. We see

(k3 /k−3 ) [T1 /T1∼ PVn ]e = [T1 ]e

and the triggered increase in [PK/U]e by δPK/U induces the change

δT1 = (k3 /k−3 ) δT1 /T1∼ PVn
    = (k3 /k−3 )(k2 /k−2 ) δT1 /T1∼ P
    = (k3 /k−3 )(k2 /k−2 )(k1 /k−1 ) (2 δPK/U [PK/U]e + δPK/U ²)
    = (k3 /k−3 )(k2 /k−2 )(k1 /k−1 ) (2 rPK/U + rPK/U ²) [PK/U]e²

We can therefore clearly see the multiplier effect of the trigger T0 on the production
of T1 which, of course, also determines changes in the production of P(T1 ).
The mechanism by which the trigger T0 creates activated kinase PK/U can be
complex; in general, each unit of T0 creates λ units of PK/U where λ is quite large—
perhaps 10,000 or more times the base level of [PK/U ]e . Hence, if rPK/U = β and
K = (k1 /k−1 )(k2 /k−2 )(k3 /k−3 ), we have

δT1 = (2β + β²) K [PK/U]e² >> [PK/U]e

for β >> 1. From this quick analysis, we can clearly see the potentially explosive
effect that changes in T0, acting through PK/U, can have on the production of T1!
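The multiplier effect is easy to check numerically. In the sketch below we pick illustrative rate constant ratios and an equilibrium kinase level (none of these numbers come from a measured system) and evaluate δT1 = (2β + β²) K [PK/U]e² for several values of β = rPK/U.

    # Illustrative values only: the rate constant ratios and the equilibrium kinase
    # level below are not taken from any measured system.
    k1_ratio = 2.0      # k1 / k-1
    k2_ratio = 1.5      # k2 / k-2
    k3_ratio = 3.0      # k3 / k-3
    pk_u_eq = 0.05      # equilibrium [PK/U]e in arbitrary concentration units

    K = k1_ratio * k2_ratio * k3_ratio

    for beta in (0.1, 1.0, 10.0, 100.0):
        delta_T1 = (2.0 * beta + beta ** 2) * K * pk_u_eq ** 2
        print(f"beta = {beta:7.1f}   delta_T1 = {delta_T1:10.4f}"
              f"   ratio to [PK/U]e = {delta_T1 / pk_u_eq:9.2f}")

Even moderately large values of β push δT1 far above [PK/U]e, which is the explosive behavior noted above.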

6.3 An Implication for Biological Computation

Our first biological primitive is thus the trigger pathway we have seen before:

Fig. 6.1 Protein transcription tree

T0 → Cell Surface Receptor → PK/U


PK/U + T1 /T1∼ → step i T1 /T1∼ P
↑ T0 kill
T1 /T1∼ P + tagging system → step ii T1 /T1∼ PVn
↑ T0 kill

T1 /T1∼ PVn + fSQVn → step iii T1
↑ T0 kill
T1 → nucleus → tagged protein transcription P(T1 )

We know from our discussions that δT1 is (2β + β²) K [PK/U]e² and we can interpret
the output protein P(T1 ) as any of the molecules Q needed to kill a pathway. If P(T1 )
were a T0 receptor, then the expression of the target protein P(T1 ) would itself generate
a further δPK/U !
Let us note two very important points now: there is richness to this pathway,
and the target P(T1 ) can alter hardware or software easily. If P(T1 ) were a K+
voltage activated gate, then we would see an increase of δT1 (assuming one-to-one
conversion of T1 to P(T1 )) in the concentration of K+ gates. This corresponds to a
change in the characteristics of the axonal pulse. Similarly, P(T1 ) could create Na+
gates, thereby changing axonal pulse characteristics. P(T1 ) could also create other
proteins whose impact on the axonal pulse is through indirect means such as the kill
pathways discussed above. There is also the positive feedback pathway via δPK/U through the T0
receptor creation. In effect, we have a forking or splitting of possibilities as shown
in Fig. 6.1. Note that all of these pathways are essentially modeled by this primitive.

6.4 Transport Mechanisms and Switches

Another common method of cellular information processing is based on substances
which are transported into a cell and then, depending on their concentration level,
are either stored in binding sites or used to initiate the construction of a protein. The
abstract model we discuss here is based on the cellular mechanism for dealing with
free iron Fe in the cellular cytosol as detailed in Gerhart and Kirschner (1997). A
substance M is to be carried into the cell. M can be a single atom of some type or
a molecular unit. M is transported by a transport protein, tM. The transport protein

binds to a specialized receptor rM to create rtM. The resulting complex initiates
the transport through the membrane wall culminating in the release of M into the
cytoplasm. Hence, letting M/tM denote the transported complex, we see

M + tM → M/tM (transport) + rM (receptor)
= M/rtM (receptor complex)
→ M (freed M)

There are many other details, of course, but the diagram above captures the basic
information flow. In skeletal outline

M + tM → M/tM → M/rtM → M

Once inside the cell, M can be bound into a storage protein sP or M can be used
in other processes. The concentration of M, [M], usually determines which path the
substance will take. Let’s assume there are two states, low and high for [M] which
are determined by a threshold, th. A simple threshold calculation can be performed
by a sigmoidal unit as shown in Fig. 6.2.
The standard mathematical form of such a sigmoid which provides a transfer
between the low value of 0 and high value of 1 is given by
  
h(x) = (1/2) ( 1 + tanh((x − o)/g) )    (6.16)

Fig. 6.2 A simple sigmoidal threshold curve



where x is our independent variable (here [M]), o is an offset (the center point of the
sigmoid), and g is called a gain parameter as it controls the slope of the transfer from
low signal to the high signal via the value h′(o). Note,

h′(x) = (1/(2g)) sech²((x − o)/g)    (6.17)

which gives h′(o) = 1/(2g). In general, the sigmoid which transfers us from a low value
of L to a high value of H is given by
  
h(x) = (1/2) ( (H + L) + (H − L) tanh((x − o)/g) )    (6.18)

This is the same sigmoidal transformation that will be used when we discuss neural
computation in nervous systems in Chap. 14 and subsequent chapters. If we used a
transition between low and high that was instantaneous, such an abrupt transition
would not be differentiable which would cause some technical problems. Hence, we
use a smooth transfer function as listed above. Since the slope at the offset is h′(o) = 1/(2g),
we interpret a large gain g as giving us a fuzzy or slow transition versus the sharp transition we obtain from a small gain g.
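A direct transcription of Eq. 6.18 into Python is given below; the particular low, high, offset and gain values used in the demonstration are arbitrary.

    import math

    def sigmoid(x, low=0.0, high=1.0, offset=0.0, gain=1.0):
        # Smooth transfer from low to high, centered at offset (Eq. 6.18).
        return 0.5 * ((high + low) + (high - low) * math.tanh((x - offset) / gain))

    # Small gain g gives a sharp switch; large gain g gives a fuzzy, gradual switch.
    for g in (0.1, 1.0):
        values = [round(sigmoid(x, offset=0.5, gain=g), 3) for x in (0.0, 0.4, 0.5, 0.6, 1.0)]
        print(f"g = {g}: {values}")

The printed values show the transfer sharpening as g shrinks, consistent with the slope h′(o) = 1/(2g).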
Let’s assume M binds to a switching complex denoted swM when its concentration
is in the high state. The bound complex swM/M is then inactivated. In the low state,
M does not bind to swM and swM is then able to bind to two sites on the messenger
RNA, mRNA, for the two proteins rM and sPM.
The machinery by which mRNA is transformed into a protein can be thought of as
a factory: hence, we let frM and fspM denote the rM and sPM factories respectively.
The mRNAs for these proteins will have the generic appearance as shown in Fig. 6.3.
If you look at Fig. 6.3 closely, you’ll see that there are special structures attached
to the mRNA amino acid sequence. These look like tennis rackets and are called
stem loops. The mRNA for both rM and sPM have these structures. The unbound
switching protein swM can dock with any stem loop if the conditions are right.
We know from basic molecular biology that binding near the 3′ end stabilizes
mRNA against degradation and thus effectively increases protein creation. On the
other hand, binding near the 5′ end inhibits translation of mRNA into protein and

Fig. 6.3 The mRNA object



Fig. 6.4 rM and sPM factory controls

effectively serves as a stop button. Thus, for rM and sPM, we can imagine their
factories possessing two control buttons—stop and stabilize. The switch binds to
the 3′ end on the mRNA for rM and the 5′ end for the sPM mRNA. As you can see
in Fig. 6.4 (where SM denotes swM), increased rM implies more M enters the cell;
hence, [M] increases. Also, since decreased sPM implies there are fewer binding sites
available, less M is bound, implying [M] increases also. The switch swM thus affects
two primitive components of the cell; the number of M receptors in the cell wall and
the number of storage sites for M in the cytosol. This mechanism could work for
any substance M which enters a cell through a receptor via a transport protein whose
concentration [M] is controlled by a storage/free cytosol interaction. The generic
mechanism is illustrated in Fig. 6.5. This mechanism is based on how iron is handled
in a cell. In human cells, there is no mechanism for removing excess iron; hence, the
high pathway can only show inactivation of the switch and additional binding into
the binding protein. From our discussion, it follows we effectively control [M] by a
switch which can directly activate the genome!
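The qualitative logic of the swM switch can be written out as a short Python sketch. The threshold value, the function name and the returned labels are hypothetical; the point is simply that a low [M] reading turns on the stabilize button for the rM factory and the stop button for the sPM factory, while a high reading inactivates the switch.

    def swM_switch(M_concentration, threshold=1.0):
        # Qualitative stop/stabilize control of the rM and sPM mRNA factories.
        if M_concentration < threshold:
            # swM is free: it binds the 3' stem loop of the rM mRNA (stabilize)
            # and the 5' stem loop of the sPM mRNA (stop).
            return {"rM_factory": "stabilize", "sPM_factory": "stop",
                    "net_effect": "[M] rises"}
        # swM is bound into swM/M and inactivated: both factories run normally.
        return {"rM_factory": "normal", "sPM_factory": "normal",
                "net_effect": "[M] buffered by the storage protein"}

    print(swM_switch(0.2))   # low [M]
    print(swM_switch(2.5))   # high [M]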

6.5 Control of a Substance via Creation/Destruction Patterns

Another way we can control the concentration of a substance N is indirectly through
a creation/destruction cycle. Assume N is bound into a complex B1 N and there are
two competing reactions:

B1 N → B1 + N       (breakdown of B1 N)
B1 + N → B1 N       (creation of B1 N)

These reactions are made possible by enzymes. Hence letting EB be the breakdown
enzyme, we have

Fig. 6.5 Generic substance M control mechanism

B1 N + EB → B1 + NR

where NR is the N complex that is temporarily formed. The enzyme which breaks
down NR is labeled EBNR and we have the reaction

NR + EBNR → N + R

The enzyme controlling the creation of B1 N is called EC with reaction

B1 + NR + EC → B1 N

Finally, to create NR, we need the enzyme ECNR ; the reaction is

N + R + ECNR → NR

Combining, we have four reactions:

Creates N:   B1 N + EB → B1 + NR   and   NR + EBNR → N + R
Binds N:     B1 + NR + EC → B1 N   and   N + R + ECNR → NR

Note that the binding of N effectively removes it from the cytosol. Now, the above
four equations tell us that the creation or destruction of the complex B1 N is tantamount
to the binding or creating of N itself.

We can control which of these occurs as follows. Let’s assume there is a pro-
tein T1 whose job is to phosphorylate a protein S1 . The protein S1 is bound with
another protein S2 and the complex, denoted S1 /S2 will function as a switch. Let the
phosphorylated protein S1 be denoted by S1p ; then we have the reaction

T1 + S1/S2 → S1p /S2

Now imagine our four enzymatically controlled reactions as being toggled on and
off by an activating substance. We illustrate this situation in Fig. 6.6a where the
action of S1 /S2 and S1p /S2 are as indicated. The S1p /S2 switch increases [N] and the
S1 /S2 switch decreases [N]. In detail, S1p /S2 increases the breakdown of NR,
effectively decreasing [B1 N]; S1 /S2 increases [NR], effectively increasing [B1 N].
Hence, [B1 N] and N work opposite one another. The control of the phosphorylation
of S1 to S1p is achieved via a T1 /T1∼ mechanism. Specifically, if rB1 N is the B1 N
receptor, then we have the control architecture shown in Fig. 6.6b. In this figure, the
protein P(T1 ) is actually T1 itself. We see we can combine mechanisms to generate
control of a substance’s concentration in the cytosol. Note, we are still seeing a
blurring of the hardware/software distinction in this example, just as in the others.
These paired reactions are often called futile cycles.
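A minimal numerical sketch of such a futile cycle is given below: two first-order rates interconvert B1 N and free N (lumping the NR intermediate into a single step), and the phosphorylation state of the S1 /S2 switch simply biases one rate over the other. The rate constants, initial concentrations and Euler time step are illustrative choices only.

    def futile_cycle(N0=0.2, B1N0=0.8, k_break=0.3, k_bind=0.3,
                     switch_phosphorylated=True, dt=0.01, steps=2000):
        # Euler integration of the paired breakdown/creation reactions, with the
        # NR intermediate lumped into a single step for simplicity.
        if switch_phosphorylated:
            k_break *= 3.0     # S1p/S2 favours breakdown of B1 N, raising [N]
        else:
            k_bind *= 3.0      # S1/S2 favours rebinding, lowering [N]
        N, B1N = N0, B1N0
        for _ in range(steps):
            flux = k_break * B1N - k_bind * N
            N += dt * flux
            B1N -= dt * flux
        return round(N, 3), round(B1N, 3)

    print("S1p/S2 active:", futile_cycle(switch_phosphorylated=True))
    print("S1/S2 active :", futile_cycle(switch_phosphorylated=False))

With the S1p /S2 switch active most of the material ends up as free N; with S1 /S2 active it ends up bound in B1 N, which is the qualitative behavior described above.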
The trigger systems we are modeling must include a mechanism for deactivation.
Indeed, much of the richness of biological computation comes from negative feed-
back. Consider the generic trigger system of Fig. 6.7. This system allows for protein
creation, but it is equally important to allow for deactivation of the product if its
concentration becomes too high. We can use the generic switch method discussed
in Sect. 6.4 in the case where the trigger T0 exists in both a free and bound state
upon entry into the cell. The trigger T0 is processed by the usual trigger mechanism
into the protein P(T1 ) which is sent into a sigmoid switch h. If the concentration of
P(T1 ) is low, the usual stop/stabilize control is invoked, which increases T0 . If the
concentration is high, we bind T0 to a buffer complex. This mechanism is applicable
to the triggers that control free Ca++ in the cell such as neurotransmitters.

6.6 Calcium Ion Signaling

Adding or removing a phosphate ion, PO3−, to a protein or other molecular complex
is important to the control of many reaction pathways. To add such a phosphate
ion PO3− to a protein, however, requires energy and a helper molecule so that the
reaction can proceed at a useful speed. The specialized proteins which help make
these reactions possible are called protein kinases. To understand this a little better,
we need to look at the ATP–ADP reaction. Adenosine Triphosphate (ATP) is the
nucleotide adenine plus a ribose sugar with three phosphates, terminated in a methyl
ion, linked to it as a linear chain, PO3− − PO3− − PO3− CH2 . Adenosine Diphosphate
(ADP) has a linear chain of just two phosphates PO3− − PO3− CH2− added. When we
remove another phosphate, we have only a single PO3− CH2− bound to the adenosine

Fig. 6.6 Control strategies. a Futile control. b Switch control

Fig. 6.7 Generic trigger pathway with feedback

core. This molecule is then called Adenosine Monophosphate or AMP. In these three
versions, the PO3− chains are bound to the ribose sugar directly and do not have an
attachment to the adenine nucleotide itself. There is a fourth version which binds a
PO3− both to the adenine and the ribose both. This bond looks cyclic in nature and so
this last version is called Cyclic Adenosine Monophosphate or CAMP. The formation
and breakdown of these compounds are controlled by enzymes. In reaction form we
have
adenylate kinase
2 ADP −−−−−−−−→ ATP + AMP
AMP → Adenosine

There are 4 phosphate ions on the 2 ADP molecules; adenylate kinase facilitates
the move one of them from one ADP to the other ADP to make a molecule ATP.
The remaining ADP then becomes an AMP molecule. The enzyme adenylate kinase
maintains the concentration of ATP, ADP and AMP at approximately 1 : 1 : 1 ratios.
Hence,

[AMP][ATP]
keq = ≈1
[ADP]2

and

[ADP]2
[AMP] ≈
[ATP]
6.6 Calcium Ion Signaling 99

Now in a fully energized cell, [ATP] is approximately 4 mM and [ADP] is on the


order of .4 mM. This is a 10 : 1 ratio. We also know the reaction ATP → ADP +PO3−
and its reverse maintains a 1 : 1 : 1 ratio. Thus, the ATP to ADP reaction implies that
the concentration of the two should be the same. However, the AMP reaction tells
us that in a fully energized cell, ATP and ADP concentrations differ by a factor of
10 from this equilibrium ratio. From the AMP equation, we see that the equilibrium
concentration of AMP must be (0.4)
2

4
or .04 mM. There is also another way to break
down ATP to ADP which involves muscle contraction under anaerobic conditions.
This gives [ATP] ≈ [ADP] ≈ 2 mM. In this case, when adenylate cyclase is at
2
equilibrium, we find [AMP] is approximately 22 or 2 mM. Hence, with this second
mechanism, the equilibrium concentration of ATP is half the concentration of ATP
from the first reaction pathway. The equilibrium concentration of AMP however
varies significantly more: it is 0.4 mM in the first and 2 mM in the secon—a fifty fold
increase. Hence, the concentration of AMP is a very sensitive indicator of which
reaction is controlling the energy status of the cell.
Now, inside the cytosol, free Ca++ exists in .1 µM to .2 µM concentrations.
Outside the cytosol, this ion’s concentration is much higher—1.8 mM. Thus, the
concentration gradient between inside and outside is on the order of 104 . This very
steep gradient is maintained by surface membrane proteins, a sodium/Ca++ exchange
protein and a membrane calcium pump. In addition, the endoplasmic reticulum (ER)
stores free calcium in pools within its membrane. The movement of calcium in and
out of these pools is controlled by a pump in the ER which uses the reaction between
adenosine triphosphate (ATP) and adenosine diphospate (ADP) to drive calcium in
or out of the pools. The ER pump is therefore called the ATPase pump.
Rapid increases in cystolic Ca++ are used as regulatory signals. Hormones, neuro-
transmitter and electrical activity control Ca++ movement into the cytosol. Inositol-
1,4,5 triphosphate or IP3 controls movement of the ions in and out of the storage
pools in the ER. There is probably 50–100 µM of Ca++ inside the cell but most of
this is bound into proteins such as calmodulin (CAM). Calmodulin relays calcium
signals to other protein targets. The other proteins serve usually as Ca++ buffers
which therefore limit the amount of Ca++ that is free in the cell. Different neurons
have different levels of Ca++ buffering proteins. This implies that there are large
differences in physiological behavior between these classes of neurons.
Ca++ gets into the cytosol from the extracellular fluid via Ca++ gates or channels.
These ports can be voltage dependent like a classical sodium or potassium voltage
gated channel or they can be ligand gated. This means that some protein must bind
to the port in some fashion to initiate further mechanisms that control the amount of
free Ca++ in the cell. These mechanisms could be as direct as the ligand opens the
port and allows Ca++ to come in or could be indirect as we have already discussed
for generic T0 triggers and switches. The important thing to remember is that the
amount of free calcium ion in the cytosol is controlled by four things:
• The ATP based pump in the ER: one calcium ion goes out for each ATP to ADP
reaction. The rate of this pump is controlled by calmodulin,
• calcium ion in through a voltage gated channel,
100 6 Abstracting Principles of Computation

• calcium ion in through a ligand gated channel,


• the calcium/ sodium exchange pump: one calcium ion out for each three sodium
ions coming in.
Calmodulin is a Ca++ receptor protein. One CAM molecule has four Ca++ binding
domains. At resting Ca++ concentrations, little Ca++ is bound to CAM. As the
concentration of Ca++ increases, the binding sites of CAM are occupied and CAM
becomes a multifunctional activator. Roughly speaking, here is what happens.
• A ligand such as a neurotransmitter or hormone binds to its receptor or gate in the
cell membrane. Such a ligand is often called an agonist.
• In physical proximity with the receptor is a protein called the G-protein. It generally
consists of three subunits, G α , G β and G γ . The G protein is called a coupling
protein as it transduces the extracellular signal into chemical pathway action inside
the cytosol.
• At rest, the coupling protein is bound into a complex with Guanine Diphosphate
or GDP. They exist diffusively within the cytosol. If the bound complex G/GDP
encounters a receptor with its bound ligand or agonist, the GDP pops off the
complex. Inside the cytosol, there is free GTP. Once the GDP is off, GTP from
the cytosol takes its place. The coupling protein G then splits into two pieces:
the complex G α /GTP and the complex G β G γ . These two complexes are called
activated forms of the coupling protein G. The bound GDP can be released and
replaced by Guanine Triphosphate or GTP. Hence, there is a mechanism to convert
the complex, G/GDP to G/GTP.
• The enzyme adenyl cyclase or AC then combines with ATP to create CAMP as
we have discussed. Phosphodiesterase or PDE can then deactivate CAMP back to
AMP. However, CAMP dependent protein kinase or PKA then adds a phosphate ion
to a target protein. Adding a phosphate ion to a protein is called phosphorylation.
Any protein that puts a phosphate ion on another protein is called a kinase. Kinases
works by breaking a phosphate group off of ATP (which releases energy) and taking
that phosphate and adding it to the protein. Once this is done, the protein is said to
be activated. An activated protein can then create other molecules and complexes
via reaction pathways of its own. The amount of protein that is phosphorylated at
any time is determined by the balance between the rates of phosphorylation by its
kinase and the rate of dephosphorylation by what is called a phosphotase. This is
a classic example of a futile cycle we have discussed.
• In general, the activated protein from the step above will eventually activate protein
transcription from the genome.
It is clear then that the coupling protein mechanism above allows an extracellular
signal to increase [CAMP] which in turn alters the activity level of the cell via the
PKA pathway. There are very important information processing consequences here:
a small number of occupied receptors can have a major influence on the cell. First, the
nature of the G protein mediated signals is that there is amplification of the message.
First, the nature of the G protein mediated signals is that there is amplification
of the message. An agonist can activate many different G protein—CAMP path-
ways. Note the specificity of the extracellular signal via the agonist docking into the
6.6 Calcium Ion Signaling 101

receptor is multiplied many fold by the fact that the G/GDP complexes exist diffu-
sively in the cell. The extracellular signal binds to multiple receptors and each bound
receptor can interact with a different G/GDP complex. Hence, out of this soup of
different G/GDP complexes, multiple G/GDP complexes could interact with the
same signal. Hence, the signal can be enormously amplified. Roughly speaking, this
amplification is on the order of (2β +β 2 )K [G] for suitable β and K as discussed in
Sect. 6.2. Also, one activated G α /GTP can interact with many CAMP. Each CAMP
can create activated proteins via PKA which could all be different due to concentra-
tion differences in the created [CAMP], and the time and spatial activity envelope of
the [CAMP] pulse. Further, PKA can activate any protein that has a certain pattern
of amino acids around serine or threonine residues. Thus, any such protein will be
called by PKA to participate in the extracellular signals effects.
Second, more than one agonist can increase [CAMP] concentration via these
mechanisms. Thus, a general feature of this kind of agonist stimulation is that the
effect is felt away from the site of the initial docking of the extracellular signal. Also,
the full effect of the signal is the creation of a new protein which can completely
rewrite the hardware and software of the excitable nerve cell. We conclude that there
is both amplification and divergence of action. Further, an essential feature of this sort
of activation is that the signaling pathway can be turned off. The G protein system is
self-timing. The G α /GTP is converted back to G/GDP automatically, so if there is
not a stimulus to create new G/GTP, the system reverts to it rest state. Also, CAMP
is always cleaved to AMP by PDE so without the creation of new CAMP, the phos-
phorylation of the target proteins could not continue. Finally, the phosphate ions on
the activated target proteins are always being removed by appropriate phosphatases.
It follows that if a cell has the three mechanism mentioned above operating a a high
state, then extracellular signals will not last long, approximately less than 2 s.

6.7 Modulation Pathways

The monoamine neurotransmitters norepinephrine, dopamine and serotonin modu-


late the use of Ca++ within a cell in a variety of ways. In Fig. 6.8, we have used
the discussions in Hille (1992), to indicate some of the complexity of these path-
ways. In the figure, we use the following legends: AC, adenyl cyclase; CAMP, cyclic
adenosine monophosphate; G proteins G i , G s , G o and G p ; PDE, phosphodiesterase;
PKA, CAMP dependent kinase; CamK, calmodulin dependent kinase; PLC, phos-
pholipase C; PKC, C kinase; IP3, inositol-1,4,5 triphosphate; DAG, diacylglycerol;
IN1, PKA inhibitor; PP, generic protein phosphotase; S1 , S2 , S3 , S4 , S6 and S7 are
various surface proteins. Hence, creation of these proteins alters the hardware of
the cell. These proteins Si could be anything used in the functioning of the cell.
We illustrate the pathways used by the three major monoamine neurotransmitters as
well, DP, dopamine; S, serotonin and NE, norepinephrine. NE has two gates shown
subscripted by 1 and 2 which initiate different G pathways. The examples we have
discussed give us a way to unify our understanding of cell signaling. Following Bray
102 6 Abstracting Principles of Computation

Fig. 6.8 Monoamine and


calcium pathways

(1998), we can infer protein kinases, protein phosphatases, transmembrane receptors


etc. that carry and process messages inside the cell are associated with compact clus-
ters of molecules attached to the cell membrane which operate as functional units
and signaling complexes provide an intermediate level of organization like the inte-
grated circuits used in VLSI. However, this interpretation is clearly flawed as these
circuits are self-modifying. The general organization of the action of the extracellular
agonists here is thus:

Agonist → Receptor → G
→ Enzyme → Second Messenger
→ Kinase → Effector Protein

6.7.1 Ligand—Receptor Response Strategies

The TAR complex is a cluster of proteins associated with the chemotaxic receptor
of the coliform bacteria. It is a large protein which is inserted through the bacteria’s
membrane. It has three portions: the extracellular part where the external signals
6.7 Modulation Pathways 103

bind, the transmembrane part which is inside the membrane and the intracellular
part which is in the cellular cytosol. When the bacteria is moving in an aspartate
concentration gradient, signals are generated to the motor assemblies that drive its
flagella movement. This allows the bacteria to move toward the food source. The
extracellular signal is the aspartate current. The way this works is as follows:
• Aspartate binds to the extracellular domain of the receptor.
• This binding causes a change in the three dimensional appearance of the intracel-
lular portion of TAR. This is called a conformational change.
• The intracellular part of TAR is made up of 4 proteins. The two closest to the
membrane are labeled CheZ which can either by methylated or de methylated by
the enzymes CheR or CheB, respectively. Below them is the protein CheW and
below that is the protein CheA. CheA is an enzyme which phosphorylates both
CheY and CheB. CheY and CheB exist free in the cytosol.
The TAR control action is based on several states. First, if no ligand is bound to TAR
and CheZ is highly methylated, CheA is active and it phosphorylates both CheB and
CheY . The high level of phosphorylated CheYp increases the frequency of the motor’s
switch to clockwise flagella rotation which causes tumbling—essentially, a random
search pattern for food. The phosphorylated CheBp slowly removes methyl from
CheZ which resets the signaling state to that of no ligand and no CheZ methylation.
If TAR does not have a ligand bound and CheZ is not highly methylated, CheA is
inactive and the level of CheYp is reduced. This increases the rate of counterclockwise
flagella rotation as there are fewer switches to clockwise. This results in the bacteria
swimming in a specific direction and hence, this is not a random search pattern.
Finally, if the aspartate signal binds TAR whether or not CheZ is methylated, CheA
becomes inactive which leads to directional swimming as outlined above.
The phosphorylation of CheY occurs at a rate which is proportional to the rate of
external signal activation of TAR. The TAR complex thus operates as a self-contained
solid state processing unit which “integrates” extracellular inputs to produce a certain
rate of phosphorylation of CheA. This influences the rate of change of CheYp and
thus produces alterations in the way the flagella handles rotations. This switches the
bacteria from random search to directed swimming. Control of the CheYp creation
rate effectively controls the ratio of counterclockwise to clockwise rotations. This
sets the direction of the swimming or if the switching is too frequent, sets the motion
to be that of a random walk. The current state of the motor can thus be inferred
from the methylation patterns on the cytosol side of the TAR complex. The general
aspartate pathway thus consists of a single freely diffusing molecule CheYp, a protein
complex which is only concerned with the external stimulus and a second protein
complex which handles the behavioral response.
Next, consider how a cell uses a signal for growth called the platelet derived
growth factor or PDGF. This signal must be noticed by a cell before it can be used.
A single PDGF receptor diffuses freely in the cellular membrane. When it encounters
a PDGF extracellularly, it combines with another receptor to form what is called a
dimer or double PDGF receptor. The paired receptors have tails that extend through
the membrane into the cytosol. Once paired, multiple sites on these tails phosphory-
104 6 Abstracting Principles of Computation

late. This phosphorylation triggers the assembly of six signaling complexes on the
tails. Some fraction of the six signaling complexes are then activated. The activated
complexes then initiate broadcast signals using multiple routes to many destinations
inside the cell. These signals activate and coordinate the many biochemical changes
needed in cell growth. As usual, the response is removed by phosphatases which take
away phosphates from activated complexes. This is, therefore, a cellular strategy to
make a large number of receptors catch as many of the PDGF molecules as possible.
Let’s look at the differences in the handling of the extracellular signal in the TAR
and PDGF cases. PDGF requires a sequence of reactions: receptor clustering into
dimers, phosphorylation of critical sites and the binding of multiple signaling mole-
cule complexes. Hence, the response here is slower than the conformational change
that occurs within a preformed TAR complex due to the aspartate signal. Notice
though that the TAR response needs to be quick as it alters swimming patterns for
finding food. There is not a need for such speed in the PDGF response. Also, not
every PDGF receptor carries a signaling complex as only those that encounter a
PDGF molecule undergo the changes that result in the addition of signaling com-
plexes. Thus, when the cell makes a large number of receptors and casts a wide net
to catch PGDF signals, there is not a large cost. This is because there is no need
to make expensive signaling complexes on all receptors—only the ones which need
them.
TAR complex however must react quickly. In fact, there is even a real possibility
that all TAR receptors could be occupied at the same time. Since the bacteria must
respond to an aspartate concentration gradient on the order of 105 , the response to
90 and 95 % TAR receptor occupancy must be different. Hence, all receptors must
carry the signaling complexes necessary to alter the flagella motor.
The PGDF receptor complex produces multiple outputs which implies the initial
extracellular signal is divergent with effects spreading throughout the cell. The TAR
complex has only two outputs: CheYp and CheBp which both effect motor response.
However, there are also other pathways that effect CheYp and CheBp. Hence, there
are multiple controls here that converge on a single output pair.
In general, the concentration of proteins in the cytosol is high enough that there
is competition among macromolecules for water. This effects the diffusion rate of
the macromolecules and changes the tendency of proteins to associate with other
proteins. Hence, the formation of a signaling complex requires a sequence of binding
steps in which proteins come together in a diffusion limited interaction governed by
mass action. This implies that association among macromolecules is strongly favored
in these crowded conditions. Now, the TAR complex serves a focused purpose and
drives changes in swimming action directly and quickly. Signaling complexes as
seen in the PDGF receptor are favored for different reasons. Clusters of molecules
in permanent association with each other are well suited to perform reactions that
integrate, codify and transduce extracellular signals. These solid state computations
are more rapid, efficient and noise free than a system of diffusing molecules in which
encounters are subject to random fluctuations of thermal energy. Also, complexes
are made only once. So even though the construction process uses diffusion, once
completed, the complex can repeatedly and accurately perform the same task.
6.7 Modulation Pathways 105

A macromolecule can switch its conformational state 104 –106 times a second at
the cost of 1 ATP per switch. So we might infer from this that biological information
has a cost of 1 ATP per bit. However, in a real system the major expense is not
in registering conformational change. The true cost is the amount of energy used to
distribute, integrate and process the information of the extracellular signal within and
between systems. Hence, if a switch in conformational state gives a rough indication
of information storage rate, we might think the biological information storage rate is
about 104 –106 bits per second. However, based on the distribution, integration and
processing costs, it is more like 103 –105 per second at a cost of 106 ATP per bit.
Hence, a guiding principle is that the second messenger triggers will store infor-
mation at a rate of about 103 per second at a cost of 106 energy units per bit.
Although signaling complexes are indeed powerful tools, there is also a strong
need for diffusion based mechanisms. Most extracellular stimuli are not passed in
a linear chain of cause and effect from one receptor in the membrane to one target
molecule in the nucleus or cytosol. If such a linear chain could be used, the signal
could be passed very efficiently as a chain of conformational changes along a protein
filament. In fact, signals such as we see in the axon of an excitable neuron are essen-
tially passed along in this manner. However, most extracellular stimuli are spread as
a rapidly diverging influence from the receptor to multiple chemical targets in the
cell at often many different spatial locations. This divergence allows the possibility
of signal amplification and cross-talk between different signal processes chains. The
machinery of the signal spread in this way always requires at least one molecule
that diffuses as a free element. Note that always associating one receptor with one
effector (as in the CheYp to flagella motor effector) is faster and more efficient than
the general diffusion mechanism. However, it is a much smaller and more localized
effect!
The TAR receptor approach does use a freely diffusing molecule, CHeYp, but the
action of this pathway is restricted to the flagella motor. Hence there is no divergence
of action. In the PDGF receptor approach, there is a large cost associated with
the construction of the signaling complex, but once built, diffusion is used as the
mechanism to send the PDGF signal in a divergent manner across the cell. Hence,
one extracellular event triggers one change or one extracellular event triggers 103 –
105 changes! This is a large increase in the sensitivity of the cell to the extracellular
input. We see a cell pays a price in time and consumption of ATP molecules for a
huge reward in sensitivity.

References

D. Bray, Signalling complexes: biophysical constraints on intracellular communication. Annu. Rev.


Biophys. Biomol. Struct. 27, 59–75 (1998)
J. Gerhart, M. Kirschner, Cells, Embryos and Evolution: Towards a Cellular and Developmental
Understanding of Phenotypic Variation and Evolutionary Adaptability (Blackwell Science, New
Jersey, 1997)
B. Hille, Ionic Channels of Excitable Membranes (Sinauer Associates Inc, Sunderland, 1992)
Chapter 7
Second Messenger Diffusion Pathways

Now that we have discussed some of the extracellular trigger mechanisms used in
bioinformation processing, it is time to look at the fundamental process of diffusion
within a cell. Our second messenger systems often involve Ca ++ ion movement
in and out of the cell. The amount of free Ca ++ ion in the cell is controlled by
complicated mechanisms, but some is stored in buffer complexes. The release of
calcium ion from these buffers plays a big role in cellular regulatory processes and
which protein P(T1 ) is actually created from a trigger T0 . Following discussions in
Berridge (1997), Wagner and Keizer (1994) and Höffer et al. (2001), we will now
describe a general model of calcium ion movement and buffering. These ideas will
give us additional insight into how to model second messenger triggering events.

7.1 Calcium Diffusion in the Cytosol

Calcium is bound to many proteins and other molecules in the cytosol. This binding is
fast compared to calcium release and diffusion. From Wagner and Keizer (1994), we
can develop a rapid-equilibrium approximation to calcium buffering. Let’s assume
there are M different calcium binding sites; in effect, there are different calcium
species. We label these sites with the index j. Each has a binding rate constant
k +j and a disassociation rate constant k −j . We assume each of these binding sites
is homogeneously distributed throughout the cytosol with concentration B j . We let
the concentration of free binding sites for species j be denoted by b j (t, x) and
the concentration of occupied binding sites is c j (t, x); where t is our time variable
and x is the spatial variable. The units of b j are (mM B j )/liter and c j has units
(mM B j )/liter. We let u(t, x) be the concentration of free calcium ion in the cytosol
at (t, x). Hence, we have the reactions

© Springer Science+Business Media Singapore 2016 107


J.K. Peterson, BioInformation Processing, Cognitive Science and Technology,
DOI 10.1007/978-981-287-871-7_7
108 7 Second Messenger Diffusion Pathways

Ca + B j →k +j Ca/B j
Ca/B j →k −j Ca + B j

The corresponding dynamics are

d[Ca]
= −k +j [Ca][B j ] + k −j [Ca/B j ]
dt
d[Ca/B j ]
= k +j [Ca][B j ] − k −j [Ca/B j ]
dt
From this, we have the unit analysis

mM Ca mM Ca mM B j mM Ca/B j
= units of k +j × + units of k −j ×
liter − sec liter liter liter

This tells us the units of k +j are liter/(mM B j -sec) and k −j has units (mM of Ca)/
( (mM Ca/B j )-sec). Hence, k −j c j has units of (mM Ca)/(liter-sec), k +j b j , 1/s and
k +j b j u(t, x), (mM Ca)/(liter-sec). Let the diffusion constant for each site be D j . We
assume D j is independent of the bound calcium molecules; that is, it is independent
of c j . We also assume a homogeneous distribution of the total concentration B j .
This implies a local conservation law

b j (t, x) + c j (t, x) = B j

For simplicity, we will use a 1D model here; thus the spatial variable is just the
coordinate x rather than the vector (x, y, z). For the one dimensional domain of
length L, we have 0 ≤ x ≤ L. In a cell, some of the calcium ion is bound in storage
pools in the endoplasmic reticulum or ER. The amount of release/uptake could be a
nonlinear function of the concentration of calcium ion. Hence, we model this effect
with the nonlinear mapping f 0 (u). The diffusion dynamics are then

∂u
= rate of ER release and uptake
∂t
+ sum over all calcium species(amount freed from binding site − amount bound)
∂2u
+ D0
∂x 2

Thus, we find

M  
∂u − + ∂2u
= f 0 (u) + k j c j − k j (B j − c j )u + D0 (7.1)
∂t j=1
∂x 2
7.1 Calcium Diffusion in the Cytosol 109

Also, we see the dynamics for c j are given by

∂c j ∂2c j
= −k −j c j + k +j (B j − c j )u + D j (7.2)
∂t ∂x 2
where D0 is diffusion coefficient for free calcium ion. Our domain is a one dimen-
sional line segment, so there should be diffusion across the boundaries at x = 0 and
x = L for calcium but not for binding molecules. Let J0 and JL denote the flux of
calcium ion across the boundary at 0 and L. We will usually just say J0,L for short.
Our boundary conditions then become

∂u 
− D0 = J0 (7.3)
∂x 0

∂u 
−D0 = JL (7.4)
∂x  L

∂c j 
Dj =0 (7.5)
∂x 0

∂c j 
Dj =0 (7.6)
∂x  L

The concentration of total calcium is the sum of the free and the bound. We denote
this by w(t, x) and note that


M
w=u+ cj
j=1
  
∂2w 
M M
∂w ∂2c j
= f0 w − c j + D0 2 + (D j − D0 ) 2
∂t j=1
∂x j=1
∂x

Note that

∂u  ∂c j
M
∂w
= +
∂t ∂t j=1
∂t

M  
− + ∂2u
= f 0 (u) + k j c j − k j (B j − c j )u + D0 2
j=1
∂x

M    M
− + ∂2c j
+ −k j c j + k j (B j − c j )u + Dj
j=1 j=1
∂x 2

∂2u 
M
∂2c j
= f 0 (u) + D0 + D j
∂x 2 j=1
∂x 2
110 7 Second Messenger Diffusion Pathways

     ∂2c j 
M M M M
∂2u ∂2c j ∂2c j
= f0 w − c j + D0 2 + D0 − D 0 + Dj
j=1
∂x j=1
∂x 2
j=1
∂x 2
j=1
∂x 2
    2  
∂ u  ∂2c j
M M M
∂2c j
= f0 w − c j + D0 + − (D 0 − D j )
j=1
∂x 2 j=1
∂x 2 j=1
∂x 2
  
∂2w 
M M
∂2c j
= f0 w − c j + D0 2 + (D j − D0 ) 2
j=1
∂x j=1
∂x

The boundary conditions are 0 and L can then be computed using Eqs. 7.3–7.6.
     
∂w  ∂u  ∂c j  ∂c j 
M M
−D0 = −D0  − Dj − (D0 − D j ) = J0,L
∂x 0,L ∂x 0,L j=1
∂x 0,L j=1
∂x 0,L

Thus, the total calcium equation is

  
∂2w 
M M
∂w ∂2c j
= f0 w − c j + D0 2 + (D j − D0 ) 2 (7.7)
∂t j=1
∂x j=1
∂x
  
∂w  ∂c j 
M
J0,L = −D0 − (D − D ) (7.8)
∂x 0,L ∂x 0,L
0 j
j=1

7.1.1 Assumption One: Calcium Binding Is Fast

It seems reasonable to assume that calcium binding, determined by k −j and k +j B j is


fast compared to the release and uptake rate function f 0 . This is a key assumption
of Wagner and Keizer (1994). Hence, we will assume that Eq. 7.2 is at equilibrium
(that is all c j terms are no longer changing and hence all partials are zero) giving

k −j c j − k +j (B j − c j )u = 0

Solving, we find

Bju Bju
cj = = (7.9)
k −j Kj + u
k +j
+u
7.1 Calcium Diffusion in the Cytosol 111

k− 
where K j is the ratio k +j . Now, we also know u = w − M j=1 c j . Substituting this in
j
for u, we have
⎡ ⎛ ⎞⎤
M  M 
c j ⎣ K j + ⎝w − c j ⎠⎦ = B j w − cj (7.10)
j=1 j=1

This simplifies to

B j w = −c2j + (K j + w + B j )c j + (B j − 1) ck
k= j

This shows that the concentration of occupied binding sites for species j is a function
of the total calcium concentration w. Let c j (w) denote this functional dependence.
∂c ∂c
From the chain rule, we have ∂xj = ∂wj ∂w∂x
. Then, the calcium dynamics become

    M 
∂ 
M
∂w ∂2w ∂c j ∂w
= f0 w − c j (w) + D0 2 + (D j − D0 )
∂t j=1
∂x ∂x j=1 ∂w ∂x
 M   M  
∂ ∂c j ∂w
= f0 w − c j (w) + D0 + (D j − D0 )
j=1
∂x j=1
∂w ∂x

The boundary conditions then become


    
∂c j  ∂w 
M
−D0 − (D j − D0 ) = J0,L
j=1
∂w 0,L ∂x 0,L

To summarize, the dynamics for calcium, with the assumption that calcium binding
into the buffers is fast compared to release and uptake from the calcium pools in the
ER, are
 M   M  
∂w ∂ ∂c j ∂w
= f0 w − c j (w) + D0 + (D j − D0 ) (7.11)
∂t j=1
∂x j=1
∂w ∂x
   
M
∂c j  ∂w 
J0,L = −D0 − (D0 − D j ) (7.12)
j=1
∂w 0,L ∂x 0,L

Notice that if we define a new diffusion coefficient, D for the diffusion process that
governs w by


M
∂c j
D = D0 + (D j − D0 ) (7.13)
j=1
∂w
112 7 Second Messenger Diffusion Pathways

we can rewrite Eqs. 7.11 and 7.12 as


 M   
∂w ∂ ∂w
= f0 w − c j (w) + D (7.14)
∂t j=1
∂x ∂x
 
 ∂w 
−D  = J0,L (7.15)
0,L ∂x 0,L

7.1.2 Assumption Two: Binding Rate Is Much Less Than


Disassociation Rate

Many calcium binding molecules have low calcium affinity in µM concentration


range and above. Recall that k +j has units of liter/(mM B j -sec) and k −j has units of
(mM of Ca)/((mM Ca/B j )-sec). Hence, the ratio K j has units of (mM B j /(mM Ca/B j )
(mM Ca)/liter. The ratio of concentration of B j to Ca/B j can thus also be interpreted
as the (number of free binding sites/liter)/(number of bound sites/liter). Consider the
equation u < K j . This would imply k +j u < k −j . From the units discussion above,
this occurs if the number of bound sites is less than the number of free sites. The
data in Wagner and Keizer (1994) suggests that typically K j ≈ 0.2 µM per liter
calcium concentration. From Chap. 6, we know that inside the cell, the free calcium
concentration is 0.1 − 0.2 µ moles. Thus, it is reasonable to assume that the rate of
binding is much less than the rate of disassociation. This is the second key assumption
of Wagner and Keizer (1994). Hence, we further assume in our model that u  K j .
It then follows that we can simplify Eq. 7.10 to

Bj
cj = u (7.16)
Kj
 
 Bj
Thus, w = 1 + Mj=1 Kj
u and solving for c j , we find

Bj Bj
Kj
w ∂c j Kj
cj = M =⇒ = M
1+ Bk
k=1 K k
∂w 1 + k=1 Bk
Kk

Bj
Then, letting γ j = Kj
,

 M    M 
∂w γjw ∂ γj ∂w
= f0 w − M + D0 + (D j − D0 ) M
∂t j=1 1 + k=1 γk
∂x j=1 1 + k=1 γk ∂x
   
M 
γjw γj ∂2w
= f0 w − M + D0 + (D j − D0 ) M
1+ k=1 γk j=1 1+ k=1 γk ∂x 2
7.1 Calcium Diffusion in the Cytosol 113

M ∂w
Letting  denote the term 1 + k=1 γk , then w = u and so we have ∂t
=  ∂u
∂t
.
Thus,
 M   M 
∂u γ j u γj ∂2u
 = f 0 u − + D0 + (D j − D0 )  2
∂t j=1
 j=1
 ∂x
   M  2
∂ u
= f0 u − ( − 1)u + D0 + (D j − D0 )γ j
j=1
∂x 2

This can then be written as


   2
∂u f 0 (u) D0 + Mj=1 D j γ j ∂ u
= +
∂t   ∂x 2

Next, define the new release uptake function f by

f 0 (u)
f (u) = . (7.17)

We note that, in this approximation of binding is much less than disassociation, the
diffusion constant D has a new form
M M

M
γj D0 + j=1 D j γ j − ( − 1)D0 D0 + j=1 D j γ j
D → D0 + (D j − D0 ) = =
  
j=1

This suggests we define the new diffusion constant D̂ by


M
D0 + j=1 Djγj
D̂ = (7.18)

The free calcium dynamics are thus

∂u ∂2u
= f (u) + D̂ 2 (7.19)
∂t ∂x

We call the function f , the effective calcium release/uptake rate and D̂, the effective
diffusion coefficient. What about the new boundary conditions? This is just another
somewhat unpleasant calculation:
   
∂w 
M
∂c j
− D0 + (D j − D0 ) |0,L = J0,L
j=1
∂w ∂x 0,L
114 7 Second Messenger Diffusion Pathways

  
M
(D j − D0 )γ j ∂u 
− D0 +   = J0,L
j=1
 ∂x 0,L
   
∂u 
M
− D0 + (D j − D0 )γ j = J0,L
j=1
∂x 0,L
   
∂u 
M
− D0 + Djγj  = J0,L
j=1
∂x 0,L

∂u 
−D̂  = J0,L
∂x 0,L

We conclude the appropriate dynamics are then

∂u f 0 (u) ∂2u
= + D̂ 2 (7.20)
∂t  ∂x
∂u J0,L
−D̂ |0,L = (7.21)
∂x 

7.2 Transcriptional Control of Free Calcium

The critical review on the control of free calcium in cellular processing in Carafoli
et al. (2001) notes the concentration of Ca ++ in the cell is controlled by the reversible
binding of calcium ion to the buffer complexes we have been discussing. These buffer
molecules therefore act as calcium ion sensors that, in a sense, decode the information
contained in the calcium ion current injection and then pass on a decision to a target.
Many of these targets can be proteins transcribed by accessing the genome. Hence,
the P(T1 ) we have discussed in Chap. 6 could be a buffer molecule B j . The boundary
condition J0,L plays the role of our entry calcium current. Such a calcium ion input
current through the membrane could be due to membrane depolarization causing an
influx of calcium ions through the port or via ligand binding to a receptor which in
turn indirectly increases free calcium ion in the cytosol. Such mechanisms involve
the interplay between the release/uptake ER function and the storage buffers as the
previous sections have shown. This boundary current determines the u(t, x) solution
through the diffusion equations Eqs. 7.20 and 7.21. The exact nature of this solution
is determined by the receptor types, buffers and storage sites in the ER. Differences
in the period and magnitude of the calcium current u(t, x) resulting from the input
J0,L trigger different second messenger pathways. Hence, there are many possible
outcomes due to a given input current J0,L .
Let’s do a back of the envelope calculation to see what might happen if a trigger
event T0 initiated an increase in a buffer B j which then initiates a complex trigger
mechanism culminating in a protein transcription. Let’s assume that the buffer B j0 is
increased to B j0 + . It is reasonable to assume that both ki+ and ki− are independent
7.2 Transcriptional Control of Free Calcium 115

of the amount of Bi that is present. Hence, we can assume all the γi ’s are unaffected
by the change in B j0 . If binding activity goes up, we would also expect that the
release/uptake activity would also change. There are then changes in the terms that
define the diffusion constant

 M
Bk Bj + 
 → 1+ + 0
k= j
Kk K j0
0

f (u) → f (u) + η
B + 
D0 + D j0 Kj0 j + kM= j0 Dk γk
D̂ → 
0
B +
1 + kM= j0 KBkk + Kj0 j

M
for some nonzero η and positive . Letting  = 1 + k=1 Dk γk , ξ j = 1
Kj
, we find

M D j0 M D j0
D0 + k=1 Dk γk +  K j D0 + k=1 Dk γk +  K j (D0 − 1) +  + D j0 ξ j0
D̂ new = M
0
= 0
=
1+ 1  +  K1  + ξ j0
k=1 γ j +  K j j0
0

D0 +−1
Thus, noting D̂ = 
, we find

D̂ = D̂ new − D̂
(D0 − 1) +  + D j0 ξ j0 (D0 − 1) + 
= −
 + ξ j0 
[(D0 − 1) +  + D j0 ξ j0 ] − [(D0 − 1) + ][ + ξ j0 ]
=
[ + ξ j0 ][]
 
D j0  − (D0 − 1 + ) D j0  − D 1 D
= ξ j0 = ξ j0 = ξ j0 D j0 1−
[ + ξ j0 ] [ + ξ j0 ]  + ξ j0 D j0

We see that
 
 D 
D̂ = ξ j0 D j0 1 − ∝
 + ξ j0 D j0  + ξ j0

Hence, to first order



D̂ ∝

and we see that the new diffusion dynamics are on the order of
 
∂u f (u) + η  ∂2u
= + D +C
∂t   ∂x 2
116 7 Second Messenger Diffusion Pathways

for some constant C. The alteration of the release/uptake function and the diffusion
constant imply a change in the solution. This change in the solution u(t, x) then
can initiate further second messenger changes culminating in altered P(T1 ) protein
production.

References

M. Berridge, Elementary and global aspects of calcium signalling. J. Physiol. 499, 291–306 (1997)
E. Carafoli, L. Santella, D. Branca, M. Brini, Generation, control and processing of cellular calcium
signals. Crit. Rev. Biochem. Mol. Biol. 36(2), 107–260 (2001)
T. Höffer, A. Politi, R. Heinrich, Intracellular ca+2 wave propagation through gap - junctional ca+2
diffusion: A theoretical study. Biophys. J. 80(1), 75–87 (2001)
J. Wagner, J. Keizer, Effects of rapid buffers on ca+2 diffusion and ca+2 oscillations. Biophys. J.
67, 447–456 (1994)
Chapter 8
Second Messenger Models

In Sect. 6.7, we discussed some of the basic features of the pathways used by
extracellular triggers. We now look at these again but very abstractly as we want
to design principles by which we can model second messenger effects. Let T0 denote
a second messenger trigger which moves though a port P to create a new trigger T1
some of which binds to B1 . A schematic of this is shown in Fig. 8.1. In the figure, r
is a number between 0 and 1 which represents the fraction of the trigger T1 which is
free in the cytosol. Hence, 100r % of T1 is free and 100(1 − r) is bound to B1 cre-
ating a storage complex B1 /T1 . For our simple model, we assume rT1 is transported
to the nuclear membrane where some of it binds to the enzyme E1 . Let s in (0, 1)
denote the fraction of rT1 that binds to E1 . We illustrate this in Fig. 8.2. We denote
the complex formed by the binding of E1 and T1 by E1 /T1 . From Fig. 8.2, we see that
the proportion of T1 that binds to the genome (DNA) and initiates protein creation
P(T1 ) is thus srT1 .

8.1 Generic Second Messenger Triggers

The protein created, P(T1 ), could be many things. Here, let us assume that P(T1 ) is
a sodium, Na+ , gate. Thus, our high level model is

sE1 /rT1 + DNA → Na+ gate

We therefore increase the concentration of Na+ gates, [Na+ ] thereby creating an


increases in the sodium conductance, gNa . The standard Hodgkin–Huxley conduc-
tance model (details are in Peterson (2015)) is given by
p q
gNa (t, V ) = gNa
max
MNa (t, v)HNa (t, V )

where t is time and V is membrane voltage. The variables MNa and HNa are the
activation and inactivation functions for the sodium gate with p and q appropriate
© Springer Science+Business Media Singapore 2016 117
J.K. Peterson, BioInformation Processing, Cognitive Science and Technology,
DOI 10.1007/978-981-287-871-7_8
118 8 Second Messenger Models

Fig. 8.1 Second messenger trigger

Fig. 8.2 Some T1 binds to the genome

positive powers. Finally, G max


Na is the maximum conductance possible. These models
generate MNa and HNa values in the range (0, 1) and hence,

0 ≤ gNa (t, V ) ≤ gNa


max

We can model increases in sodium conductances as increases in gNa max


with efficiency e,
where e is a number between 0 and 1. We will not assume all of the sE1 /rT1 + DNA to
sodium gate reaction is completed. It follows that e is similar to a Michaelson–Mentin
kinetics constant. We could also alter activation, MNa , and/or inactivation, HNa , as
functions of voltage, V in addition to the change in the maximum conductance.
However, we are interested in a simple model at present. Our full schematic is then
given in Fig. 8.3. We can model the choice process, rT1 or (1 − r)B1 /T1 via a simple
sigmoid,
  
x − x0
f (x) = 0.5 1 + tanh
g

where the transition rate at x0 is f  (x0 ) = 2g


1
. Hence, the “gain” of the transition
can be adjusted by changing the value of g. We assume g is positive. This func-
tion can be interpreted as switching from of “low” state 0 to a high state 1 at
1
speed 2g . Now the function h = rf provides an output in (r, ∞). If x is larger than the
8.1 Generic Second Messenger Triggers 119

Fig. 8.3 Maximum sodium conductance control pathway

threshold x0 , h rapidly transitions to a high state r. On the other hand, if x is below


threshold, the output remains near the low state 0.
We assume the trigger T0 does not activate the port P unless its concentrations
is past some threshold [T0 ]b where [T0 ]b denotes the base concentration. Hence, we
can model the port activity by
  
r [T0 ] − [T0 ]b
hp ([T0 ]) = 1 + tanh
2 gp

where the two shaping parameters gp (transition rate) and [T0 ]b (threshold) must be
chosen. We can thus model the schematic of Fig. 8.1 as hp ([T0 ]) [T1 ]n where [T1 ]n is
the nominal concentration of the induced trigger T1 . In a similar way, we let
  
s x − x0
he (x) = 1 + tanh
2 ge

Thus, for x = hp ([T0 ]) [T1 ]n , we have he is a switch from 0 to s. Note that 0 ≤ x ≤


r[T1 ]n and so if hp ([T0 ]) [T1 ]n is close to r[T1 ]n , he is approximately s. Further, if
hp ([T0 ]) [T1 ]n is small, we will have he is close to 0. This suggests a threshold value
r[T1 ]n
for he of 2
. We conclude
120 8 Second Messenger Models
  
r[T1 ]n
  s hp ([T0 ])[T1 ]n −
he hp ([T0 ] [T1 ]n ) = 1 + tanh 2
2 ge

which lies in [0, s). This is the amount of activated T1 which reaches the genome to
create the target protein P(T1 ). It follows then that
 
[P(T1 )] = he hp ([T0 ]) [T1 ]n [T1 ]n

The protein is created with efficiency e and so we model the conversion of [P(T1 )]
into a change in gNa
max
as follows. Let
  
e x − x0
hNa (x) = 1 + tanh
2 gNa

which has output in [0, e). Here, we want to limit how large a change we can achieve in
gNa
max
. Hence, we assume there is an upper limit which is given by  gNa max
= δNa gNamax
.
Thus, we limit the change in the maximum sodium conductance to some percentage
of its baseline value. It follows that hNa (x) is about δNa if x is sufficiently larges and
small otherwise. This suggests that x should be [P(T1 )] and since translation to P(T1 )
occurs no matter how low [T1 ] is, we can use a switch point value of x0 = 0. We
conclude
  
e [P(T1 )]
hNa ([P(T1 ]) = δNa gNa 1 + tanh
max
(8.1)
2 gNa

Our model of the change in maximum sodium conductance is therefore  gNa max
=
hNa ([P(T1 )]). We can thus alter the action potential via a second messenger trigger
by allowing
p q
gNa (t, V ) = ( gNa
max
+ hNa ([P(T1 )]))MNa (t, V )HNa (t, V )

for appropriate values of p and q within a standard Hodgkin–Huxley model.


Next, if we assume a modulatory agent acts as a trigger T0 as described above,
we can generate action potential pulses using the standard Hodgkin–Huxley model
for a large variety of critical sodium trigger shaping parameters. We label these with
a Na to indicate their dependence on the sodium second messenger trigger.
 
r Na , [T0 ]b Na , gpNa , sNa , geNa , eNa , gNa , δNa

We can follow the procedure outlined in this section for a variety of triggers. We
therefore can add a potassium gate trigger with shaping parameters
 K 
r , [T0 ]K b , gpK , sK , geK , eK , gK , δK
8.1 Generic Second Messenger Triggers 121

8.1.1 Concatenated Sigmoid Transitions

In the previous section, we found how to handle alterations in gNa


max
due to a trigger
T0 . We have
p q
gNa (t, V ) = ( gNa
max
+ hNa ([P(T1 )]))MNa (t, V )HNa (t, V )

where
  
e [P(T1 )]
hNa ([P(T1 )]) = δNa gNa 1 + tanh
max
2 gNa

with e and δNa in (0, 1). Using the usual transition function σ(x, x0 , g0 ), we can then
write the sodium conductance modification equation more compactly as

hNa ([P(T1 )]) = eδNa gNa


max
σ([P(T1 )], 0, gNa ).

Using this same notation, we see

hp ([T − 0]) = rσ([T0 ], [T0 ]b , gp )


 
r[T1 ]n
he (hp ([T0 ])[T1 ]n ) = sσ hp ([T0 ])[T1 ]n , , ge
2
 
r[T1 ]n
= sσ rσ([T0 ], [T0 ]b , gp )[T1 ]n , , ge
2

Note the concatenation of the sigmoidal processing. Now [P(T1 )] = he hp ([T0 ])
[T1 ]n ) [T1 ]n Thus,
 
r[T1 ]n
[P(T1 )] = sσ rσ([T0 ], [T0 ]b , gp )[T1 ]n , , ge [T1 ]n .
2

Finally,
p q
gNa (t, V ) = gNa
max
( 1 + eδNa σ([P(T1 )], 0, gNa ) )MNa (t, V )HNa (t, V )

Implicit is this formula is the cascade “σ(σ(σ” as σ([P(T1 )], 0, gNa ) uses two con-
catenated sigmoid calculations itself. We label this as a Sigma Three Transition, σ3 ,
and use the notation
122 8 Second Messenger Models

σ3 ([T0 ], [T0 ]b , gp ; inner most sigmoid r; scale innermost calculation by r


[T1 ]n ; scale again by [T1 ]n this is input to next sigmoid
r[T1 ]n
2 , ge ; offset and gain of next sigmoid s; scale results by s
[T1 ]n ; scale again by [T1 ]n this is [P(T1 )]
this is input into last sigmoid
0, gNa ; offset and gain of last sigmoid)

Thus, the gNa computation can be written as


  
r[T1 ]n
gNa (t, V ) = gNa
max
1 + eδNa h3 [T0 ], [T0 ]b , gp ; r;[T1 ]n ; , ge ; s;[T1 ]n ; 0, gNa
2
p q
MNa (t, V )HNa (t, V )

This implies a trigger T0 has associated with it a data vector



r[T1 ]n
WT0 = [T0 ], [T0 ]b , gp , r, [T1 ]n , , ge , s, [T1 ]n , 0, gT
2

where gT denotes the final gain associated with the third level sigmoidal transition
to create the final gate product. We can then rewrite our modulation equation as
p q
gNa (t, V ) = gNa
max
(1 + eδNa h3 (WNa )) MNa (t, V )HNa (t, V )

8.2 A Graphic Model Computation Model

Although third order sigmoidal transformations certainly occur in our models,


hiding all the details obscures what is really going on. We will now recast the model
into computational graph structure. This makes it easier to see how the calculations
will be performed as asynchronous agents. Consider the typical standard sigmoid
transformation
  
r [T0 ] − [T0 ]b
hp ([T0 ]) = 1 + tanh
2 gp

We can draw this as a graph as is shown in Fig. 8.4 where h denotes the standard
sigmoidal state transition function. We also have
  
r[T1 ]n
s hp ([T0 ])[T1 ]n −
he (hp ([T0 ]) [T1 ]n ) = 1 + tanh 2
2 ge
8.2 A Graphic Model Computation Model 123

Fig. 8.4 The first level sigmoid graph computation

Fig. 8.5 Sigmoid graph computations. a Second level. b Third level

which for a given output Y becomes


  
s Y − r[T21 ]n
he (Y ) = 1 + tanh
2 ge

and the computation can be represented graphically by Fig. 8.5a. Finally, we have
used
[P(T1 )] = he (hp ([T0 ])[T1 ]n )[T1 ]n

hNa ([P(T1 )]) = eδNa gNa


max
h([P(T1 )], 0, gNa )

which can be shown diagrammatically as in Fig. 8.5b. These graphs are, of course,
becoming increasingly complicated. However, they are quite useful in depicting how
feedback pathways can be added to our computations. Let’s add feedback pathways
124 8 Second Messenger Models

Fig. 8.6 Adding feedback to the maximum sodium conductance control pathway

to the original maximum conductance control pathway we illustrated in Fig. 8.3.


Figure 8.6 is equivalent to that of the computational graph of Fig. 8.5b but shows
more of the underlying cellular mechanisms. The feedback pathways are indicated
by the variables ξ0 and ξ1 . The feedback ξ0 is meant to suggest that some of the
protein kinase T1 which is bound onto (1 − r)B1 /T1 is recycled (probably due to
other governing cycles) back to free T1 . We show this with a line drawn back to the
port P. Similarly, some of the protein kinase not used for binding to the enzyme E1 ,
to start the protein creation process culminating in P(T1 ), is allowed to be freed for
reuse. This is shown with the line labeled with the legend 1 − ξ1 leading back to
the nuclear membrane. It is clear there are many other feedback possibilities. Also
the identity of the protein P(T1 ) is fluid. We have already discussed the cases where
P(T1 ) can be voltage dependent gates for sodium and potassium and calcium second
messenger proteins. However, there are other possibilities:
• B1 : thereby increasing T1 binding
• T1 : thereby increasing rT1 and P(T1 )
• E1 : thereby increasing P(T1 ).
and so forth.
8.3 Ca++ Triggers 125

8.3 Ca++ Triggers

We now specialize to the case of the Ca++ triggers. Then [T0 ] represents [Ca++ ]
and [T1 ] will denote some sort of Ca++ binding complex. We will use the generic
name calmodulin for this complex and use the symbol C ∗ as its symbol. Hence,
[T1 ] = [C ∗ ] with [C ∗ ]n the nominal calcineuron concentration. Thus,


r ++ [C ∗ ]n Ca++ Ca++
WCa++ = [Ca++ ], [Ca++ ]b , gCa++ , rCa++ , [C ∗ ]n , Ca , ge ,s , [C ∗ ]n , 0, gCa++
2

Further, the sodium Ca++ conductance model is then


++ p q
gNa (t, V ) = gNa
max
(1 + eCa++ δNa
Ca
h3 (WCa++ )) MNa (t, V )HNa (t, V )
++
where the term δNa
Ca
is the fraction of the sodium maximum conductance that the
++
Ca second messenger pathway can effect. We know the situation is actually quite
complicated.
• [Ca++ ] is an injected current—i.e. a current pulse—whose temporal and spatial
shape are very important to the protein production P(C ∗ ).
• C ∗ is a Ca++ binding complex which serves as an intermediary; i.e. as our T1
protein. It will eventually be translated into the protein P(C ∗ ) via mechanisms
similar to what we have described. There is a lot going on here and we must
remember that we are trying for a simple model which will capture some of the
core ideas behind second messenger Ca++ signaling systems.
• We know [Ca++ ] is determined internally by a complicated diffusion model which
has a damped time and spatial solution.
To enable our model to handle more of the special characteristics of the Ca++ trig-
gers, we must let the time independent Ca++ data vector from WCa++ become time
dependent:
⎡ ⎤
[Ca++ ](t)
⎢ [Ca++ ]b (t) ⎥
⎢ ⎥
⎢ gCa++ ⎥
⎢ ⎥
⎢ rCa++ ⎥
⎢ ⎥
⎢ [C ∗ ]n exp( −t ) ⎥
⎢ τ ⎥
⎢ rCa++ [C ∗ ]n
c
−t ⎥
WCa++ (t) = ⎢ exp( )
τc ⎥
⎢ Ca++ 2 ⎥
⎢ ge ⎥
⎢ Ca++ ⎥
⎢s ⎥
⎢ ∗ ⎥
⎢ [C ]n ⎥
⎢ ⎥
⎣0 ⎦
gCa++
126 8 Second Messenger Models

This gives the new model


++ p q
gNa (t, V ) = gNa
max
(1 + eCa++ δNa
Ca
h3 (WCa++ (t) )) MNa (t, V )HNa (t, V )

8.4 Spatially Dependent Calcium Triggers

Using the standard Ball and Stick model, the calcium injection current can be inserted
at any electrotonic distance x0 with 0 ≤ x0 ≤ 4. For convenience, we usually will
only allow x0 to be integer valued. The previous trigger equations without spatial
dependence are given below:
  
++ r [Ca++ (t)] − [Ca++ ]b
hp ([Ca (t)]) = 1 + tanh
2 gp
  ∗

++ ∗ s hp ([Ca++ (t)])[C ∗ ]n − r[C2 ]n
he (hp ([Ca (t)]) [C ]n ) = 1 + tanh
2 ge
 
[P([C ∗ ](t)] = he hp ([Ca++ (t)]) [C ∗ ]n [C ∗ ]n
  
e [P(C ∗ (t))]
hNa ([P([C ∗ ](t))]) = δNa gNa max
1 + tanh
2 gNa

where for convenience of discussion, we have refrained from labeling all of the rel-
evant variables with Ca++ tags. Now given the injection current [Ca++ (t)], only a
portion of it is used to form the C ∗ complex. The initial fraction is hp ([Ca++ (t)])[C ∗ ]n
and it begins the diffusion across the cellular cytosol. Hence, following the discus-
sions in Chap. 7, if the injection is at site x0 , the solution to the diffusion equation
for this initial pulse is given by
 
hp ([Ca++ (t)])[C ∗ ]n −(x − x0 )2
I(t, x) = √ exp
t 4Dt

Then the amount gNa max


changes depends on the amount of the above response that
gets to the nuclear membrane. This is given by
  
r[C ∗ ]n
s I(t, x) −
he (I(t, x)) = 1 + tanh 2
2 ge

Note that I(t, x) explicitly contains time and spatial attenuation. Combining, we see
the full spatial dependent equations are:
8.4 Spatially Dependent Calcium Triggers 127
  
r [Ca++ (t)] − [Ca++ ]b
hp ([Ca++ (t)]) = 1 + tanh
2 gp
++ ∗  
hp ([Ca (t)])[C ]n −(x − x0 )2
I(t, x)) = √ exp
t 4Dt
  
r[C ∗ ]n
s I(t, x) − 2
he (I(t, x)) = 1 + tanh
2 ge
[P([C ∗ ](t)) = he (hp ([Ca++ ](t)) [C ∗ ]n )[C ∗ ]n
  
∗ e [P(C ∗ (t))]
hNa ([P(C (t))]) = δNa gNa 1 + tanh
max
2 gNa
+ hNa ([P(C ∗ (t)0] )MNa (t, V )HNa (t, V )
p q
gNa (t, V ) = (gNa
max

Now consider the actual feedback paths shown in Fig. 8.6 for a Ca++ trigger. In a
calcium second messenger event, the calcium current enters the cell through the port
and some remains free and some is bound into the calcineuron complex. In this
context, we interpret the feedback as follows: 1 − ξ0 of the bound [Ca∗ ](t) is fed
back to the port as free Ca++ . This happens because the bound complex disassociates
back into Ca++ plus other components. This amount is fed back into the input of hp
and the feedback response J(t) is given by


J(t) = hp ((1 − ξ0 )j [Ca++ (t)])
j=0

∞   
r (1 − ξ0 )j ([Ca++ (t)] − [Ca++ ]b )
= 1 + tanh
j=0
2 gp

To estimate this value, note for u = 1 − ξ0 , u is in (0, 1). It follows from the mean
value theorem that for any positive integer j, we have for some point c between x
and x0
  j   j 
r u (x − x0 ) r uj u (cx − x0 )
1 + tanh = sech2 (x − x0 )
2 g 2g g

Thus, since cx is bounded by x, we have


  j   j 
r u (x − x0 ) r uj 2 u (x − x0 )
1 + tanh ≤ sech (x − x0 )
2 g 2g g

Similarly, to first order, there is a constant dx between x0 and x so that


    
r x − x0 r1 2 (dx − x0 )
1 + tanh = sech (x − x0 )
2 g 2g g
128 8 Second Messenger Models

It follows then that


  j   j 
r
2
1 + tanh u (x−x
g
0)
uj sech2 u (cxg−x0 )
   ≈  
r
2
1 + tanh x−x
g
0
sech2 (dx −x
g
0)

 
uj sech2 (cx −x
g
0)

≈  
sech2 x g 0 )
(d −x

 
(c −x )
uj sech2 x g 0
Thus, to first order, letting γx = 
(d −x )
 , we have
sech2 x g 0

  j    
r u (x − x0 ) j r x − x0
1 + tanh ≈ γx u 1 + tanh
2 g 2 g

Applying this to our sum J we find It then follows, using γ(t) for the term γCa++ (t),


J(t) ≈ γx hp ([Ca++ (t)]) (1 − ξ0 )j
j=0
1
= γ(t) hp ([Ca++ (t)])
ξ0

Thus, the multiplier applied to the gNa computation due to feedback is of order ξ10 ;
i.e. J ≤ γx ξ10 The constant γx is difficult to assess even though we have first order
bounds for it. It is approximately given by
 
r (1 − ξ0 )([Ca++ (0)] − [Ca++ ]b )
γ(t) ≈ sech2 ([Ca++ (t)] − [Ca++ ]b ).
2g g

8.5 Calcium Second Messenger Pathways

From the discussions so far, we can infer that the calcium second messenger pathways
can influence the shape of the action potential in many ways. For our purposes, we
will concentrate on just a fraction of these possibilities. If a calcium current is injected
into the dendrite at time and spatial position (t0 , x0 ), it creates a protein response from
the nucleus following the equations below for various ions. We use the label a to
denote the fact that these equations are for protein P(C ∗ ) = a. The calcium injection
current is also labeled with the protein a as it is known that calcium currents are
specifically targeted towards various protein outputs.
8.5 Calcium Second Messenger Pathways 129
  
ra [Caa,++ (t)] − [Caa,++ ]b
hpa ([Caa,++ (t)]) = 1 + tanh
2 gpa
 
hpa ([Caa,++ (t)])[C a,∗ ]n −(x − x0 )2
I a (t, x)) = √ exp
t − t0 4Da (t − t0 )
  
I a (t, x) − r [C2 ]n
a a,∗
sa
he (I(t, x)) =
a
1 + tanh
2 gea
[Pa ([C a,∗ ](t))] = hea (hpa ([Ca++ (t)]) [C a,∗ ]n )[C a,∗ ]n
  a a,∗ 
ea [P (C (t))]
h ([P ([C ](t)]) =
a a a,∗
δa ga max
1 + tanh
2 ga
ga (t, V ) = (ga + ha ([P ( [C ](t)))Mpa (t, V )Haq (t, V )
max a a,∗

Thus, we have

+ hNa ([PNa ([C ∗ ](t)]))αNa (t − t0 , V )βNa (t − t0 , V )


p q
gNa (t, V ) = (gNa
max

gK (t, V ) = (gKmax + hK ([PK ([C ∗ ](t))]))Mpa (t − t0 , V )Haq (t − t0 , V )

implying that the calcium second messenger activity alters the maximum ion con-
ductances as follows:
   
 −(t − t0 )
δgNa (t0 , x0 ) = hNa PNa ( C Na,∗ n exp
τcNa
   
 −(t − t0 )
δgK (t0 , x0 ) = hK PK ( C K,∗ n exp
τcK

We see that [Ca++ ] currents that enter the dendrite initiate cellular changes that effect
potassium conductance and hence the hyperpolarization portion of the action poten-
tial curve. Also, these currents can effect the sodium conductance which modifies the
depolarization portion of the action potential. Further, these changes in the maximum
sodium and potassium conductances also effect the generation of the action potential
itself. If feedback is introduced, we know there is a multiplier effect for each ion a:

1
J a (t) = γ a (t) hpa ([Ca++ (t)])
ξ0a

which alters the influence equations by adding a multiplier at the first step:
  
1 ra [Caa,++ (t)] − [Caa,++ ]b
hpa ([Caa,++ (t)]) = γ a (t) a 1 + tanh
ξ0 2 gpa
 
hpa ([Caa,++ (t)])[C a,∗ ]n −(x − x0 )2
I (t, x)) =
a
√ exp
t − t0 4Da (t − t0 )
130 8 Second Messenger Models
  
r a [C a,∗ ]n
sa I a (t, x) −
hea (I(t, x)) = 1 + tanh 2
2 gea
[Pa ([C a,∗ ](t)] = hea (hpa ([Ca++ (t)]) [C a,∗ ]n )[C a,∗ ]n
  a a,∗ 
ea [P (C (t))]
h ([P ([C ](t)]) =
a a a,∗
δa ga max
1 + tanh
2 ga
ga (t, V ) = (ga + ha ([P ([C ](t))]) )Mpa (t, V )Haq (t, V )
max a a,∗

The full effect of the multiplier is nonlinear as it is fed forward through a series
of sigmoid transformations. There are only five integer values for the electrotonic
distance which can be used for the injection sites. These are x0 = 0 (right at the soma)
to x0 = 4 (farthest from the soma). At each of these sites, calcium injection currents
can be applied at various times. Hence given a specific time t0 , the total effect on the
sodium and potassium conductances is given by


4 
4
δgNa (t0 ) = δgNa (t0 , i), δgK (t0 , x0 ) = δgK (t0 , i)
i=0 i=0

 
−(x−i)2
where the attenuation due to the terms exp 4Da (t−t0 )
for the ions a is built into the
calculations.

8.6 General Pharmacological Inputs

When two cells interact via a synaptic interface, the electrical signal in the pre-
synaptic cell in some circumstances triggers a release of a neurotransmitter (NT)
from the pre-synapse which crosses the synaptic cleft and then by docking to a
port on the post cell, initiates a post-synaptic cellular response. The general pre-
synaptic mechanism consists of several key elements: one, NT synthesis machinery
so the NT can be made locally; two, receptors for NT uptake and regulation; three,
enzymes that package the NT into vesicles in the pre-synapse membrane for delivery
to the cleft. There are two general pre-synaptic types: monoamine and peptide. In the
monoamine case, all three elements for the pre-cell response are first manufactured in
the pre-cell using instructions contained in the pre-cell’s genome and shipped to the
pre-synapse. Hence, the monoamine pre-synapse does not require further instructions
from the pre-cell genome and its response is therefore fast. The peptide pre-synapse
cannot build its neurotransmitter locally; a peptide neurotransmitter must be manufactured
using the pre-cell genome, so if a peptide neurotransmitter is needed, there is a lag in
response time. Also, in the peptide case, there is no re-uptake pump, so peptide NT cannot be reused.
On the post-synaptic side, the fast response is triggered when the bound NT/
Receptor complex initiates an immediate change in ion flux through the gate thereby
altering the electrical response of the post cell membrane and hence, ultimately its

action potential and spike train pattern. Examples are glutamate (excitatory) and
GABA (inhibitory) neurotransmitters. The slow response occurs when the initiating
NT triggers a second messenger response in the interior of the cell. There are two
general families of receptors we are interested in: family one, with 7 transmembrane regions,
and family two, with 4 transmembrane regions. The responses mediated by these families
are critical to the design of a proper abstract cell capable of interesting biological
activity.

8.6.1 7 Transmembrane Regions

The 7 transmembrane regions are arranged in a circle in the membrane with a central
core. The first messenger, FM, docks with the receptor, R7 , creating a complex,
NT /R7 . The complex undergoes a conformational change allowing it to bind to
a class of proteins (G-proteins) creating a new complex, NT /R7 /G. This causes
a conformational change in the G protein allowing it to bind to an intracellular
enzyme E inducing another conformational change on E. At this point, we have a
final complex, NT /R7 /G/E, in which the E has been activated so that it can release
the substance we call the second messenger, SM. The substance SM can then initiate
many intracellular processes by triggering a cascade of reactions that culminate in
the transcription of a protein from the genome in the post-cell nucleus. A typical
pathway has SM activating an intracellular enzyme E2 which creates a transcription
factor, TF. The transcription factor TF crosses into the nucleus to transcribe a protein
P. The protein P may then bind to other genes in the post genome to initiate an entire
cascade of gene activity; the protein P may transport out of the post nucleus as the
voltage activated gate protein, which is inserted into the post cell membrane and
effectively increases the maximum ion conductance for some ion; or many other
possibilities may occur. These possibilities can be modeled with our abstract feature vector
approach discussed in Chap. 9 where we associate a ten dimensional vector with the
axonal output of the post neuron.
• The protein P transports out of the post nucleus as the voltage activated gate
protein which is inserted into the post cell membrane which effectively increases
the maximum ion conductance for some ion. We model this effect by alterations
in the ten feature vector parameters.
• The protein P operates in the synaptic cleft increasing or decreasing neurotransmit-
ter uptake and neurotransmitter creation. These effects are somewhat intertwined
with the first possibility sketched out above, but these have an intrinsic time delay
before they take effect.
The docking of the FM therefore initiates a complete repertoire of response that can
completely reshape the biological structure of the post-cell and even surrounding
cells. There are several time periods involved here: the protein P may be built and
transported to its new site for use in minutes ($6 \times 10^4$ ms), hours ($3.6 \times 10^6$ ms),
days ($8.6 \times 10^7$ ms) or weeks ($6 \times 10^8$ ms). Note the roughly order of magnitude
increases in time scale here.

8.6.2 4 Transmembrane Regions

In this family, 4 transmembrane regions are arranged in a circle and five copies of
this circle are organized to form a pore in the membrane. The central pore of these
channels can be opened to increase or closed to decrease substance flux by two
means: one, the gate is voltage activated (the standard Hodgkin–Huxley ion gate)
and two, the gate is ligand activated which means substances bind to specialized
regions of the gate to cause alterations in the flux of substances through the channel.
The substance going through the channel could be a sodium ion or a first messenger.
The ligand gated channels are very interesting as they allow very complicated control
possibilities to emerge. Each 4 transmembrane region circle can have its own specific
regulatory domains and hence can be denoted as a type $\alpha$ circle, $C_4^\alpha$. The full 5 circle
ion channel is thus a concatenation of 5 such circles and can be labeled $C_4^{\alpha(1)} \cdot C_4^{\alpha(2)} \cdot C_4^{\alpha(3)} \cdot C_4^{\alpha(4)} \cdot C_4^{\alpha(5)}$ or more simply as $C_4(\alpha_1\alpha_2\alpha_3\alpha_4\alpha_5)$. This is much more
conveniently written as $C_4(\vec{\alpha})$ where the arrow over the $\alpha$ indicates that it has multiple
independent components. There is much complexity here since a given $\alpha(i)$ module
can have multiple regulatory docking sites for multiple substances and hence, the
full R4 type receptor can have a very sophisticated structure. The amount by which a
channel is open or closed is thus capable of being controlled in an exquisitely precise
fashion.
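One way to capture this combinatorial structure in software is sketched below: a channel holds five circle modules, each with its own regulatory docking sites, and the open fraction is computed from whatever ligands are currently docked. The subunit labels, site names and the simple counting rule are invented for the illustration; the text only fixes the idea that a full channel is five C4 circles with independent regulatory domains.

    #include <array>
    #include <iostream>
    #include <string>
    #include <vector>

    // Illustrative data structure for a C4(alpha) receptor: five 4-transmembrane
    // circles, each carrying its own regulatory docking sites.
    struct Circle4 {                     // one C4^alpha(i) module
      std::string type;                  // the alpha(i) label
      std::vector<std::string> sites;    // sites, named here by the ligand they accept
    };

    struct C4Channel {
      std::array<Circle4, 5> circles;    // C4(alpha(1) ... alpha(5))

      // Toy control rule: every docked ligand that matches a site opens the
      // channel a bit more. A real control law would be far richer.
      double openFraction(const std::vector<std::string>& docked) const {
        double open = 0.0;
        for (const auto& c : circles)
          for (const auto& site : c.sites)
            for (const auto& ligand : docked)
              if (ligand == site) open += 0.2;
        return open > 1.0 ? 1.0 : open;
      }
    };

    int main() {
      C4Channel chan;
      chan.circles[0] = {"a1", {"GABA"}};
      chan.circles[1] = {"a2", {"GABA"}};
      chan.circles[2] = {"a3", {"benzodiazepine"}};
      chan.circles[3] = {"a1", {"GABA"}};
      chan.circles[4] = {"a2", {}};
      std::cout << "open fraction: " << chan.openFraction({"GABA"}) << "\n";
      return 0;
    }

In this toy example, docking GABA on the three matching circles opens the pore to 60 %; mixing the five circle types differently changes the reachable responses, which is the combinatorial point made above.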

8.6.3 Family Two: The Agonist Spectrum

There is a spectrum of response which is called the agonist spectrum in which the
channel response can be altered from full open to full closed. Call the original FM
for this gate NT. Then if NT is an excitatory substance, the docking of NT to a circle
opens the channel. In this case, we call NT an agonist and denote it by AG. On
the other hand, NT might be an inhibitor and close the channel; in that case, NT
is called an antagonist or AT. We can even have agonists and antagonists that open
or close a channel partially by r %, where r is something less than 100 %. These
work by replacing a docked agonist or antagonist and hence reducing the channel
action in either case: full open goes to partially open, full closed goes to partially
open. Hence, there is a spectrum of agonist activity ranging from full open to full
closed. Since each circle can be of a different type, the resulting behavior can be quite
complicated. Further, the C4 gates can control first messengers that initiate second

messenger cascades as well as immediate ion flux alterations. Thus, the production
of the second messenger and its possible protein target can also be controlled with
great precision.
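A toy numerical reading of this spectrum is the following sketch: a full agonist drives the open fraction to 1, a full antagonist to 0, and a partial agent that displaces either one pulls the response toward its own intrinsic activity r. The 50/50 blending rule is purely our own simplification for illustration.

    #include <algorithm>
    #include <cstdio>

    // Toy agonist spectrum: intrinsic activity 1.0 for a full agonist (AG),
    // 0.0 for a full antagonist (AT), and r in (0,1) for a partial agent.
    // If a partial agent displaces what is currently docked, the channel
    // response moves from its current level toward r.
    double displace(double currentOpen, double partialActivity) {
      double open = 0.5 * currentOpen + 0.5 * partialActivity;  // simple blend
      return std::clamp(open, 0.0, 1.0);
    }

    int main() {
      double r = 0.4;  // partial agent with 40 % intrinsic activity
      std::printf("full agonist, then partial agent:    %.2f\n", displace(1.0, r));
      std::printf("full antagonist, then partial agent: %.2f\n", displace(0.0, r));
      return 0;
    }

The same partial agent therefore lowers an open channel and raises a closed one, which is exactly the "full open goes to partially open, full closed goes to partially open" behavior described above.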

8.6.4 Allosteric Modulation of Output

There is also allosteric modification where a first messenger FM1 binding to receptor
C4 (α) can have its activity modulated by another first messenger FM2 binding to a
second receptor C4 (β) to either amplify or block the FM1 effect. In this case, FM2 has
no action of its own; only if FM1 is present and docked is there a modulation. This type
of modification can also occur on the same receptor if C4 (α) has a second binding site
on one of its C4 regions for FM2 . Finally, there are cotransmitter pairings where two
first messengers can both operate independently but also together to either enhance or
diminish response. The ligand gated channels can often use drugs or pharmacological
inputs to mimic NT/receptor binding and hence generate modulated response. To
model cognition, it is thus clear that we eventually need to model pharmacological
effects of the types discussed above.

8.7 Neurotransmitter Effects

We now consider how we might model the effects of neurotransmitter modulators


as discussed in the previous section. Consider the general model shown in Fig. 8.7.
The dendrite is modeled as a Rall cable of electrotonic length L1 and the soma is
a cylinder of length L2 . Both the Rall cable and the soma can receive excitatory
and/or inhibitory current pulses generally denoted by the letters ESP and ISP. When
the output of one neuron is sent into the input system of another, we typically call
the neuron providing the input the pre-neuron and the neuron generating the out-
put, the post-neuron. The axon of the pre-neuron interacts with the dendrite of the
post-neuron via a structure called the post synaptic density or PSD. The pre-neuron
generates an axonal pulse which is the input to the PSD structure. The PSD is really
a computational object which transduces the axonal voltage signal on the pre-axon
into a ESP or ISP on the post-dendrite cable. The pre-neuron’s action potential influ-
ences the release of the contents of synaptic vesicles into the fluid contained in the
region between the neurons. Remember, the brain is a 3D organ and all neurons are
enclosed by a liquid soup of water and many other chemicals. The vesicles con-
tain neurotransmitters. For convenience, we focus on one such neurotransmitter,
labeled ζ. The vesicle containing ζ is inside a structure called a spine on the surface
of the pre-axon. The vesicle migrates to the wall of the spine and then through the
wall itself so that it is exposed to the fluid between the pre-axon and post-dendrite
(the synaptic cleft). The vesicle ruptures and spreads the ζ neurotransmitter into the
synaptic cleft. The ζ neurotransmitter then acts like the trigger T0 we have already

Fig. 8.7 The Dendrite–Soma BFV Model

discussed. It binds in some fashion to a port or gate specialized for the ζ neurotrans-
mitter. The neurotransmitter ζ then initiates a cascade of reactions:
• It passes through the gate, entering the interior of the cable. It then forms
a complex, ζ̂.
• Inside the post-dendrite, ζ̂ influences the passage of ions through the cable wall.
For example, it may increase the passage of Na+ through the membrane of the
cable thereby initiating an ESP. It could also influence the formation of a calcium
current, an increase in K+ and so forth.
• The influence via ζ̂ can be that of a second messenger trigger.
Hence, each neuron creates a brew of neurotransmitters specific to its type. A trigger
of type T0 can thus influence the production of neurotransmitters with concomitant
changes in post-neuron activity. Thus, in addition to modeling Ca++ or other triggers
as mechanisms to alter the maximum sodium and potassium conductance, we can also
model triggers that provide ways to increase or decrease neurotransmitter. Since the
neurotransmitter ζ is a second messenger trigger, our full equations with multiplier
are
$$h_p^\zeta([\zeta(t)]) = \gamma^\zeta(t)\,\frac{1}{\xi_0^\zeta}\,\frac{r^\zeta}{2}\left(1 + \tanh\left(\frac{[\zeta(t)] - [\zeta]_b}{g_p^\zeta}\right)\right)$$
$$I^\zeta(t, x) = \frac{h_p^\zeta([\zeta(t)])\,[\hat{\zeta}]_n}{\sqrt{t - t_0}}\,\exp\left(\frac{-(x - x_0)^2}{4D^\zeta(t - t_0)}\right)$$
$$h_e^\zeta(I^\zeta(t, x)) = \frac{s^\zeta}{2}\left(1 + \tanh\left(\frac{I^\zeta(t, x) - \frac{r^\zeta[\hat{\zeta}]_n}{2}}{g_e^\zeta}\right)\right)$$
$$[P^\zeta([\hat{\zeta}](t))] = h_e^\zeta\big(h_p^\zeta([\zeta(t)])\,[\hat{\zeta}]_n\big)\,[\hat{\zeta}]_n$$
$$h^\zeta([P^\zeta([\hat{\zeta}](t))]) = \frac{\delta^\zeta g_\zeta^{max}}{2}\left(1 + \tanh\left(\frac{[P^\zeta(\hat{\zeta}(t))]}{g^\zeta}\right)\right)$$

where x is the spatial variable for the post-dendritic cable. The final step is to interpret
what function the protein Pζ has in the cellular system. If it is a sodium or potassium
gate, the modification of the conductance of those ions is as before:

$$g_a(t, V) = \big(g_a^{max} + h^\zeta([P^\zeta([\hat{\zeta}](t))])\big)\,M_a^p(t, V)\,H_a^q(t, V)$$

However, the protein could increase calcium current by adding more calcium gates.
We can model the calcium current alterations as

$$Ca^{++}(t, V) = (Ca^{++})_\zeta^{max} + h^\zeta([P^\zeta([\hat{\zeta}](t))])$$

This gives the sodium and potassium influence equations

$$\delta g_\zeta^{max}(t, x) = \sum_{\zeta_s}\frac{h^\zeta([P^\zeta([\hat{\zeta}](t))])}{\sqrt{t - t_{\zeta_s}}}\,\exp\left(\frac{-(x - x_{\zeta_s})^2}{4D^{\zeta_s}(t - t_{\zeta_s})}\right)$$

where the index ζs denotes the sites where the ζ gates are located. If the neurotrans-
mitter modifies the calcium currents, we have a different type of influence:

$$\delta(Ca^{++})_\zeta^{max}(t, x) = \sum_{\zeta_s} h^\zeta([P^\zeta([\hat{\zeta}])(t)])$$

We will be focusing on only a few neurotransmitters. We will use ζ0 as the designation


for the neurotransmitter serotonin; ζ1 , for dopamine and ζ2 , for norepinephrine.
In Fig. 8.7, we show a typical computational neuron assembly. The dendritic cable
is of electrotonic length L1 and we will use the variable w to denote distance along
the dendrite. The soma is also modeled as a cylinder and we assume it can receive
input at electrotonic distances up to L2 away from the axon hillock. The inputs to the
dendrite occur through ports PD and the soma entry ports are labeled by PS . We have
a variety of inputs that enter the dendrite. For each time t0 , we sum over the dendrite
distance, the effects of all the inputs. These include

Fig. 8.8 A high level view

• a trigger T0 which enters the dendrite port PD and is a simple modulator of the volt-
age activated gates for sodium and potassium. So we directly modify the maximum
ion conductance.
• a trigger T0 which is a second messenger whose effects on sodium and potassium
conductance utilize a chain of three sigmoid transitions.
• a Ca++ injection current which can alter the sodium and potassium conductance
functions via second messenger pathways.
• neurotransmitters ζ0 , ζ1 and ζ2 which can modify sodium and potassium conductance
via second messenger effects or directly. In addition, they can alter calcium currents
via second messenger pathways.
In addition, there are inputs at time t0 at various electrotonic distances along the
soma. We sum these effects over the soma electrotonic distance in the same way. These
values are then combined to give the voltage at the axon hillock that could be used
in a general Hodgkin–Huxley model to generate the resulting action potential. In
summary, we can model top level computations using the cellular graph given in
Fig. 8.8, with the details of the computational agents coming from our abstractions
of the relevant biology.
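The bookkeeping for that combination step is simple and a minimal C++ sketch is given below: at one clock tick, sum the dendritic contributions over the discrete electrotonic positions, do the same along the soma, and hand the total to the Hodgkin–Huxley stage. The exponential attenuation function and the sample input values are stand-ins, not the cable model of the text.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Sketch of the top-level combination step: dendrite and soma inputs at a
    // fixed clock tick t0 are attenuated by electrotonic distance and summed to
    // form the axon hillock voltage passed to the Hodgkin-Huxley stage.
    double attenuate(double value, double distance) {
      return value * std::exp(-distance);     // stand-in for the cable attenuation
    }

    double axonHillockVoltage(const std::vector<double>& dendriteInputs,   // at w = 0,1,2,...
                              const std::vector<double>& somaInputs) {     // at z = 0,1,2,...
      double v = 0.0;
      for (std::size_t w = 0; w < dendriteInputs.size(); ++w)
        v += attenuate(dendriteInputs[w], static_cast<double>(w));
      for (std::size_t z = 0; z < somaInputs.size(); ++z)
        v += attenuate(somaInputs[z], static_cast<double>(z));
      return v;
    }

    int main() {
      std::vector<double> dendrite{2.0, 1.5, 0.0, 0.5, 0.0};  // ports P_D (illustrative)
      std::vector<double> soma{1.0, 0.0, 0.25};               // ports P_S (illustrative)
      std::printf("axon hillock voltage: %g\n", axonHillockVoltage(dendrite, soma));
      return 0;
    }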

Reference

J. Peterson, Calculus for Cognitive Scientists: Partial Differential Equation Models, Springer Series
on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte Ltd,
Singapore, 2015 in press)
Chapter 9
The Abstract Neuron Model

Let’s look at neuronal inputs and outputs more abstractly. We have learned quite a bit
about first and second messenger systems and how action potentials are generated
for input families. Now is the time to put all this to good use!

9.1 Neuron Inputs

It is clear neuron classes can have different trigger characteristics. First, consider
the case of neurons which create the monoamine neurotransmitters. Neurons of this
type in the Reticular Formation of the midbrain produce a monoamine neurotrans-
mitter packet at the synaptic junction between the axon of the pre-neuron and the
dendrite of the post-neuron. The monoamine neurotransmitter is then released into
the synaptic cleft and it induces a second messenger response as shown in Fig. 6.8.
The strength of this response is dependent on the input from pre-neurons that form
synaptic connections with the post-neuron. In addition, the size of this input deter-
mines the strength of the monoamine trigger into the post-neuron dendrite. Let the
strength be given by the weighting term $c_{pre,post}^\zeta$ where as usual we use the values
ζ = 0 to denote the neurotransmitter serotonin, ζ = 1, for dopamine and ζ = 2, for
norepinephrine. In our model, recall that time is discretized into integer values as
time ticks 1, 2 and so forth via our global simulation clock. Also, spatial values are
discretized into multiples of various electrotonic scaling distances. With that said,
the trigger at time t and dendrite location w is therefore

$$T_0(t, w) = \frac{c_{pre,post}^\zeta}{\sqrt{t - t_0}}\,\exp\left(\frac{-(w - w_0)^2}{4D_0^\zeta(t - t_0)}\right).$$

where $D_0^\zeta$ is the diffusion constant associated with the trigger. Hence, $w = j\hat{L}_E$ for
some scaling $\hat{L}_E$. The trigger T0 has associated with it the protein T1. We let

$$T_1(t, w) = \frac{d_{pre,post}^\zeta}{\sqrt{t - t_0}}\,\exp\left(\frac{-(w - w_0)^2}{4D_1^\zeta(t - t_0)}\right).$$

where $d_{pre,post}^\zeta$ denotes the strength of the induced T1 response and $D_1^\zeta$ is the diffusion
constant of the T1 protein. This trigger will act through the usual pathway. Also, we
let T2 denote the protein P(T1 ). T2 transcribes a protein target from the genome with
efficiency e.

$$h_p([T_0(t, w)]) = \frac{r}{2}\left(1 + \tanh\left(\frac{[T_0(t, w)] - [T_0]_b}{g_p}\right)\right)$$
$$I(t, w) = \frac{h_p([T_0(t, w)])\,[T_1]_n}{\sqrt{t - t_0}}\,\exp\left(\frac{-(w - w_0)^2}{4D_1^\zeta(t - t_0)}\right)$$
$$h_e(I(t, w)) = \frac{s}{2}\left(1 + \tanh\left(\frac{I(t, w) - \frac{r[T_1]_n}{2}}{g_e}\right)\right)$$
$$[P(T_1)](t, w) = h_e(I(t, w))$$
$$h_{T_2}(t, w) = \frac{e}{2}\left(1 + \tanh\left(\frac{[T_2](t, w)}{g_{T_2}}\right)\right)$$
$$[T_2](t, w) = h_{T_2}(t, w)\,[T_2]_n$$

Note [T2 ](t, w) gives the value of the protein T2 concentration at some discrete time
t and spatial location jL̂E . This response can also be modulated by feedback. In this
case, let ξ denote the feedback level. Then, the final response is altered to $h_{T_2}^f$ where
the superscript f denotes the feedback response and the constant ω is the strength of
the feedback.

$$h_{T_2}^f(t, w) = \omega\,\frac{1}{\xi}\,h_{T_2}(t, w)$$
$$[T_2](t, w) = h_{T_2}^f(t, w)\,[T_2]_n$$
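The sketch below runs this T0 to T2 chain once, with and without the feedback multiplier. As before, the parameter values and names are placeholders chosen only to show the order of the transitions.

    #include <cmath>
    #include <cstdio>

    // Sketch of the monoamine trigger chain T0 -> T1 -> T2 with feedback.
    // Parameter values are placeholders for illustration only.
    struct TriggerChain {
      double r = 0.8, gp = 0.5, T0b = 1.0;   // port transition parameters
      double s = 0.9, ge = 0.4;              // genome transition parameters
      double T1n = 1.0, T2n = 1.0;           // thresholds [T1]_n, [T2]_n
      double e = 1.0, gT2 = 0.5;             // transcription efficiency and gain
      double D1 = 1.0;                       // diffusion constant for T1

      double run(double T0, double t, double w, double t0, double w0,
                 double omega = 1.0, double xi = 1.0) const {
        double hp = 0.5 * r * (1.0 + std::tanh((T0 - T0b) / gp));
        double I  = hp * T1n / std::sqrt(t - t0)
                  * std::exp(-(w - w0) * (w - w0) / (4.0 * D1 * (t - t0)));
        double he = 0.5 * s * (1.0 + std::tanh((I - 0.5 * r * T1n) / ge));
        double hT2 = 0.5 * e * (1.0 + std::tanh(he / gT2));
        double hT2f = omega * hT2 / xi;      // feedback-modified response
        return hT2f * T2n;                   // [T2](t, w)
      }
    };

    int main() {
      TriggerChain chain;
      double noFeedback   = chain.run(1.5, 2.0, 1.0, 1.0, 0.0);
      double withFeedback = chain.run(1.5, 2.0, 1.0, 1.0, 0.0, 0.7, 2.0);
      std::printf("[T2] = %g, with feedback = %g\n", noFeedback, withFeedback);
      return 0;
    }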

There are a large number of shaping parameters here. For example, for each neurotransmitter,
we could alter the parameters due to calcium trigger diffusion as discussed in Sect. 7.2.
These would include $D_0^\zeta$, the diffusion constant for the trigger, and $D_1^\zeta$, the
diffusion constant for the gate induced protein T1. In addition, transcribed proteins could
alter—we know their first order quantitative effects due to our earlier analysis—$d_{pre,post}^\zeta$,
the strength of the T1 response; r, the fraction of T1 free; $g_p$, the trigger gain; $[T_0]_b$,
the trigger threshold concentration; s, the fraction of active T1 reaching the genome; $g_e$,
the trigger gain for the active T1 transition; $[T_1]_n$, the threshold for T1; $[T_2]_n$, the
threshold for $P(T_1) = T_2$; $g_{T_2}$, the gain for T2; ω, the feedback strength; and ξ, the
feedback amount for T1, where the feedback is $1 - \xi$. Note $d_{pre,post}^\zeta$ could simply be $c_{pre,post}^\zeta$.
The neurotransmitter triggers can alter many parameters important to the creation
of the action potential. The maximum sodium and potassium conductances can be

altered via the equation for T2 . For sodium we have

$$[T_2](t, w) = h_{T_2}(t, w)\,[T_2]_n$$

becomes

$$[T_2]_n = \delta_{Na}\, g_{Na}^{max}$$
$$h_{T_2}^f(t, w) = \omega\,\frac{1}{\xi}\,h_{T_2}(t, w)$$
$$[T_2](t, w) = h_{T_2}^f(t, w)\,[T_2]_n$$
$$g_{Na}(t, w, V) = \big(g_{Na}^{max} + [T_2](t, w)\big)\,M_{Na}^p(t, V)\,H_{Na}^q(t, V)$$

For potassium, the change is

$$[T_2]_n = \delta_K\, g_K^{max}$$
$$h_{T_2}^f(t, w) = \omega\,\frac{1}{\xi}\,h_{T_2}(t, w)$$
$$[T_2](t, w) = h_{T_2}^f(t, w)\,[T_2]_n$$
$$g_K(t, w, V) = \big(g_K^{max} + [T_2](t, w)\big)\,M_K^p(t, V)\,H_K^q(t, V)$$

Finally, neurotransmitters and other second messenger triggers have delayed effects
in general. So if the trigger T0 binds with a port P at time t0, the changes in protein
levels P(T1) need to be delayed by a factor $\tau^\zeta$. The soma calculations are handled
exactly like this except we are working on the soma cable and so electrotonic distance
is measured by the variable z instead of w.

9.2 Neuron Outputs

We now understand in principle how to compute the axon hillock voltage that will be
passed to the Hodgkin–Huxley engine to calculate the resulting action potential. The
prime purpose of the incoming voltage is to provide the proper depolarization of the
excitable cell membrane. Hence, we know that we will generate an action potential
if the incoming signal exceeds the neuron's threshold. For our purposes, the shape
of the action potential will be primarily determined by the alteration of the $g_{Na}^{max}$ and
$g_K^{max}$ conductance parameters. In our model, these values are altered by the second
messenger triggers which create or destroy the potassium and sodium gates in the
membrane. Other triggers alter the essential hardware of the neuron and potentially
the entire neuron class N in other ways. The protein T2 due to a trigger u can

• directly alter the synaptic coupling weight $c_{pre,post}^u$ according to the strength of
T2(t, w) by making changes in the extracellular side of the membrane in a variety
of ways,
• can directly impact the maximum conductances for the potassium and sodium
ions,
• can alter second messenger channels in many ways as we have discussed in the
chapters on calcium and generic second messenger triggers. These alterations can
affect the coupling weight or maximum ion conductances as well.
However, they can also affect more global parameters. Consider what happens if the protein
alters the ratio ρ. This is a fundamental coupling parameter for the dendrite and soma
system we use in our modeling. Hence, an alteration in ρ is a global change to
the entire neuron class N. This mechanism is very useful as it allows us to use
monoamine modulation to alter every neuron in a particular class no matter what
neural module it is a part of. This is a useful tool in implementing true RF type
core modulation of cortical output. Such a global change could be as simple as an
alteration to $L_{DE}$ or $L_{SE}$ without changing ρ. However, if ρ is changed to ρ ± ε,
this changes the eigenvalue problem that the neuron class is associated with to

$$\tan(\alpha L) = -\frac{\tanh(L)}{(\rho \pm \epsilon)L}\,(\alpha L).$$
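Each eigenvalue is a root of $G(x) = \tan(x) + \frac{\tanh(L)}{(\rho \pm \epsilon)L}\,x$ with $x = \alpha L$ restricted to one branch of the tangent. A plain bisection sketch for the first branch, evaluated over a grid of ρ values, might look as follows; the bracketing interval, the value of L and the ρ grid are our own illustrative choices.

    #include <cmath>
    #include <cstdio>
    #include <map>

    const double PI = 3.141592653589793;

    // Bisection search for the first eigenvalue alpha solving
    //   tan(alpha L) = -(tanh(L) / (rho L)) (alpha L)
    // on the branch alpha L in (pi/2, pi). Bracketing choices are illustrative.
    double firstEigenvalue(double L, double rho) {
      auto G = [&](double x) {                    // x = alpha * L
        return std::tan(x) + std::tanh(L) / (rho * L) * x;
      };
      double lo = 0.5 * PI + 1e-6, hi = PI - 1e-6;
      for (int i = 0; i < 80; ++i) {              // plain bisection
        double mid = 0.5 * (lo + hi);
        if (G(lo) * G(mid) <= 0.0) hi = mid; else lo = mid;
      }
      return 0.5 * (lo + hi) / L;                 // return alpha itself
    }

    int main() {
      double L = 1.5;                             // electrotonic length, illustrative
      std::map<double, double> table;             // pre-computed rho -> first eigenvalue
      for (double rho = 0.5; rho <= 2.01; rho += 0.25)
        table[rho] = firstEigenvalue(L, rho);
      for (const auto& entry : table)
        std::printf("rho = %4.2f   alpha_1 = %8.5f\n", entry.first, entry.second);
      return 0;
    }

Storing the table up front is exactly the pre-computation strategy described next.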

Rather than performing such a numerical computation every time a global monoamine
modulation request is sent, it is easier to pre-compute the eigenvalue problem's solutions
for a spectrum of ρ values. We then keep them in storage and access them as
needed to update the neuron class when required. The determination of the action
potential for a given axon hillock input is handled as follows. Once we know the
incoming voltage is past threshold, we generate an action potential whose parame-
ters are shaped by both the size of the pulse and the changes in ion conductances.
If we want to replace the generation of the action potential as the solution of a
complicated partial differential equation with an approximation of some kind, we need to
look at how the Hodgkin–Huxley equations are solved in more detail even though
we already introduced these equations in Peterson (2015). Our replacement for the
action potential will be called the Biological Feature Vector or BFV.
The individual neural objects in our cognitive model (and in particular, in our cor-
tical columns) will be abstractions of neural ensembles, their behaviors and outputs
gleaned from both low level biological processing and high level psychopharmacol-
ogy data. The low level information must include enough detail of how inputs are
processed (spike train generation) to be useful and enough detail of second mes-
senger pathways to see clearly that the interactions between a pre and a post neural
ensemble are really communications between their respective genomes. Clearly, this
implies that an appropriate abstract second messenger and genome model is needed.
In this section, we modify the general ball and stick model to use a very simple low
dimensional representation of the action potential.

9.3 Abstract Neuron Design

The general structure of a typical action potential is illustrated in Fig. 9.1.
This wave form is idealized and we are interested in how much information can be
transferred from one abstract neuron to another using a low dimensional, biologically
based feature vector, the Biological Feature Vector or BFV. We can achieve such an
abstraction by noting that in a typical excitable neuron response, Fig. 9.1, the action
potential exhibits a combination of cap-like shapes. We can use the following points
on this generic action potential to construct a low dimensional feature vector of
Eq. 9.1.
⎧ ⎫

⎪ (t0 , V0 ) start point ⎪


⎪ ⎪


⎪ (t 1 , V 1 ) maximum point ⎪

⎨ ⎬
(t2 , V2 ) return to reference voltage
ζ= (9.1)

⎪ (t3 , V3 ) minimum point ⎪


⎪ ⎪


⎪ (g, t 4 , V 4 ) sigmoid model of tail ⎪

⎩ ⎭
V3 + (V4 − V3 ) tanh(g(t − t3 ))

where the model of the tail of the action potential is of the form Vm (t)=V3 +
(V4 − V3 ) tanh(g(t − t3 )). Note that Vm (t3 ) = (V4 − V3 ) g and so if we were using
real voltage data, we would approximate Vm (t3 ) by a standard finite difference. This
wave form is idealized and actual measured action potentials will be altered by noise
and the extraneous transients endemic to the measurement process in the laboratory.
To study the efficacy of this BFV for capturing useful information, we performed
a series of computational experiments on the recognition of toxins introduced into
the input side of a cell from their effects on the cells action potential. We know

Fig. 9.1 Prototypical action potential

biotoxins alter the shape of this action potential in many ways. The toxin guide of
Adams and Swanson (1996) shows quite clearly the variety of ways that a toxin can
alter ion gate function in the cell membrane, second messenger cascades and so forth.
Older material from Kaul and Daftari (1986), Wu and Narahashi (1988) and Schiavo
et al. (2000) focuses on the details of specific classes of toxins. Kaul and Wu focus on
pharmacologically active substances from the sea. Schiavo investigates the effects of
toxins that interfere with the release of neurotransmitters by altering the exocytosis
process. The effects of toxins on the presynaptic side are analyzed in Harvey (1990),
and Strichartz et al. (1987) presents a summary of how toxins act on sodium channels.
Let’s focus on a single action potential and examine how its characteristics change
when we introduce various alterations in the parameters that affect its shape. We know
that toxins cause many such effects and we can classify toxins into families based on
how they alter the parameters that can affect the action potential. A simulation is then
possible that collects information from a family of generated action potentials due
to the introduction of various toxin families. The BFV abstracts the characteristics
into a low dimensional vector and in Peterson and Khan (2006), we showed the
BFV was capable of determining which toxin family was used to alter the action
potential and our ability to discern the toxin family used compared nicely to other
methods of feature vector extraction such as the classic covariance technique, a total
variation method and an associated spline method that uses knots determined by the
total variation approach.
In order to demonstrate the efficacy of the BFV approach for capturing infor-
mation, we generated families of action potentials from a classic Hodgkin–Huxley
model. We also assume a toxin in a given family alters the action potential in a
specific way which we will call the toxin family signature. We studied two types
of signatures: first, families that alter the maximum sodium and potassium conduc-
tance parameters by a given percentage and second, families that perturb the standard
Hodgkin–Huxley α–β values.
Now, an input into an artificial neuron generates an abstract BFV response. Larger
ensembles of artificial neurons can then be created to serve as modules or neural
objects in graphs.
The particular values on the components of a single artificial neuron’s or a
module’s feature vector, BFV, are then amenable to alteration via neurotransmitter
action such as serotonin, norepinephrine and dopamine coming from the midbrain
module of Fig. 5.3. The feature vector output of a neural object is thus due to the
cumulative effect of second messenger signaling to the genome of this object which
influences the action potential and thus feature vector of the object by altering its
complicated mixture of ligand and voltage activated ion gates, enzymes and so forth.
For example, the G protein-linked receptor superfamily second messenger system
would consist of a receptor with 7 transmembrane regions linked to G proteins; it
uses a second messenger system (such as the cAMP or PI systems) activated by an
enzyme, and because a second messenger system is used, the response to a signal is
delayed. Hence, these are slow response systems. Another family uses receptors with 4 transmembrane
regions. In this family, the ion channel is surrounded by multiple copies of multiple
different receptors and ion flow is directly controlled by a given particular mixture of
neurotransmitters and receptors. Clearly, there are many control possibilities that

arise here due to the combinatorial nature of this family’s channels. It is difficult to
find appropriate high level descriptions of such events so that the algorithmic struc-
tures are evident and not obscured by the detail. Two important resources that have
been prime influences and have helped us develop generic models of neural objects
which have feature vector outputs which can be shaped by such pharmacological
inputs have been Stahl (2000, psychopharmacology) and Gerhart and Kirschner
(1997, evolution of cellular physiology). They have been out for a while now, but
they are still packed with useful information for approximation schemes. The action
potential is generated by an input on the dendritic side of an excitable nerve cell.
We wish to analyze this signal and from its properties, discern whether or not a
toxin/ligand has been introduced into the input stream. Further, we want to be able to
recognize the toxin as belonging to a certain family. Effectively, this means we can
label the output as due to a certain ligand input. For our purposes, we are interested in
a single action potential with general structure as illustrated in Fig. 9.1. We concen-
trate for the moment on single output voltage pulses produced by toxins introduced
into the input side of the nerve cell. The wave form in Fig. 9.1 is, of course, idealized
and the actual measured action potentials will be altered by noise and the extraneous
transients endemic to the measurement process in the laboratory.
In order to show that the abstract feature vector, BFV, is capable of capturing some
of the information carried by the action potential, we studied how to use this kind of
feature vector to determine whether or not an action potential has been influenced
by a toxin introduced into the dendritic system of an excitable neuron. The role of the
toxin is of course similar to the role of the monoamine neurotransmitters we wish to
use to modulate cortical output. Recall the full information processing here is quite
complex as seen in Fig. 9.2 and we are approximating portions of it. We explain
the toxin studies in some detail in the sections that follow. This provides additional
background on the reasons for our choice of low dimensional feature vector. The
basic Hodgkin–Huxley model depends on a large number of parameters and we will
be using perturbations of these as a way to model families of toxins. Of course,
more sophisticated action potential models can be used, but the standard two ion
gate Hodgkin–Huxley model is sufficient for our needs here. In Sect. 9.3.1,
we present the toxin recognition methodology for the first toxin family that modify
the maximal sodium and potassium conductances. We believe that toxins of this sort
include some second messenger effects. Our general classification methodology is
then discussed and applied to this collection of toxin families. We show that
we can design a reasonable recognizer engine using a low dimensional biologically
based feature vector. In Sect. 9.3.1.2, we introduce the second class of toxin families
whose effect on the action potential is more subtle. We show that the biological
feature vector performs well in this case also.

9.3.1 Toxin Recognition

Recall, when two cells interact via a synaptic interface, the electrical signal in the
pre-synaptic cell in some circumstances triggers a release of a neurotransmitter (NT)

Fig. 9.2 A simplified path of information processing in the brain. Arrows indicate information
processing pathways

from the pre-synapse which crosses the synaptic cleft and then by docking to a port
on the post cell, initiates a post-synaptic cellular response. The general pre-synaptic
mechanism consists of several key elements: one, NT synthesis machinery so the NT
can be made locally; two, receptors for NT uptake and regulation; three, enzymes
that package the NT into vesicles in the pre-synapse membrane for delivery to the
cleft. On the post-synaptic side, we will focus on two general responses. The fast
response is triggered when the bound NT/Receptor complex initiates an immediate
change in ion flux through the gate thereby altering the electrical response of the
post cell membrane and hence, ultimately its action potential and spike train pattern.
The slow response occurs when the initiating NT triggers a second response in the
interior of the cell. In this case, the first NT is called the first messenger and the
intracellular response (quite complex in general) is the second messenger system.
Further expository details of first and second messenger systems can be found in
Stahl (2000). From the above discussion, we can infer that a toxin introduced to the
input system of an excitable cell influences the action potential produced by such a
cell in a variety of ways. For a classic Hodgkin Huxley model, there are a number of
critical parameters which influence the action potential.
The nominal values of the maximum sodium and potassium conductances $G_{Na}^0$
and $G_K^0$ can be altered. We view these values as a measure of the density of classic
Hodgkin–Huxley voltage activated gates per square cm of biological membrane.
Hence, changes in this parameter require the production of new gates or the creation
of enzymes that control the creation/destruction balance of the gates. This parameter
is thus related to second messenger activity as access to the genome is required to
implement this change. We can also perturb the parameters that shape the α and β
functions which we introduced in Peterson (2015). These functions are all special
cases of the of the general mapping F(Vm , p, q) with p ∈ 4 and q ∈ 2 defined by

p0 (Vm + q0 ) + p1
F(Vm , p, q) = .
ep2 (Vm +q1 ) + p3

For ease of exposition, here we will denote MNa by m, HNa by h and MK by n. We


have used h for sigmoid type transitions elsewhere, but historically, m, h and n have
been used for these Hodgkin–Huxley models. The α and β pairs are thus described
using the generic F mapping by

$$\begin{aligned}
\alpha_m &= F(V_m,\ p_m^\alpha = \{-0.10, 0.0, -0.1, -1.0\},\ q_m^\alpha = \{35.0, 35.0\}),\\
\beta_m  &= F(V_m,\ p_m^\beta = \{0.0, 4.0, 0.0556, 0.0\},\ q_m^\beta = \{60.0, 60.0\}),\\
\alpha_h &= F(V_m,\ p_h^\alpha = \{0.0, 0.07, 0.05, 0.0\},\ q_h^\alpha = \{60.0, 60.0\}),\\
\beta_h  &= F(V_m,\ p_h^\beta = \{0.0, 1.0, -0.1, 1.0\},\ q_h^\beta = \{30.0, 30.0\}),\\
\alpha_n &= F(V_m,\ p_n^\alpha = \{-0.01, 0.0, -0.1, -1.0\},\ q_n^\alpha = \{50.0, 50.0\}),\\
\beta_n  &= F(V_m,\ p_n^\beta = \{0.0, 0.125, 0.0125, 0.0\},\ q_n^\beta = \{60.0, 60.0\}).
\end{aligned}$$
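These definitions translate directly into code. The short C++ sketch below implements the generic F map and the six nominal α–β functions using only the parameter suites listed above; the sample voltage in main is arbitrary.

    #include <array>
    #include <cmath>
    #include <cstdio>

    // Generic Hodgkin-Huxley rate map F(Vm, p, q) with p in R^4, q in R^2.
    double F(double Vm, const std::array<double, 4>& p, const std::array<double, 2>& q) {
      return (p[0] * (Vm + q[0]) + p[1]) / (std::exp(p[2] * (Vm + q[1])) + p[3]);
    }

    // Nominal alpha/beta functions built from the parameter suites above.
    double alpha_m(double V) { return F(V, {-0.10, 0.0, -0.1, -1.0}, {35.0, 35.0}); }
    double beta_m (double V) { return F(V, {0.0, 4.0, 0.0556, 0.0},  {60.0, 60.0}); }
    double alpha_h(double V) { return F(V, {0.0, 0.07, 0.05, 0.0},   {60.0, 60.0}); }
    double beta_h (double V) { return F(V, {0.0, 1.0, -0.1, 1.0},    {30.0, 30.0}); }
    double alpha_n(double V) { return F(V, {-0.01, 0.0, -0.1, -1.0}, {50.0, 50.0}); }
    double beta_n (double V) { return F(V, {0.0, 0.125, 0.0125, 0.0},{60.0, 60.0}); }

    int main() {
      double V = -65.0;   // a sample membrane voltage in mV
      std::printf("alpha_m(%g) = %g, beta_m(%g) = %g\n", V, alpha_m(V), V, beta_m(V));
      std::printf("alpha_n(%g) = %g, beta_n(%g) = %g\n", V, alpha_n(V), V, beta_n(V));
      return 0;
    }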

The p and q parameters control the shape of the action potential in a complex way.
From our discussions about the structure of ion gates, we could think of alterations in
the (p, q) pair associated with a given α and/or β as a way of modeling how passage
of ions through the gate are altered by the addition of various ligands. These effects
may or may not be immediate. For example, the alterations to the p and q parameters
may be due to the docking of ligands which are manufactured through calls to the
genome in the cell’s nucleus. In that case, there is a long delay between the initiation
of the second messenger signal to the genome and the migration of the ligands to
the outside of the cell membrane. In addition, proteins can be made which bind to
the inside of a gate and thereby alter the ion flow. We understand that this type of
modeling is not attempting to explain the details of such interactions. Instead, we are
exploring an approach for rapid identification and differentiation of the signals due
to various toxins.
Now assume that the standard Hodgkin–Huxley model with $g_{Na}^0 = 120$, $g_K^0 = 36.0$
and the classical α, β functions is labeled as the nominal model. Hence, there is a
nominal parameter vector $\Sigma_0$ given by

$$\Sigma_0 = \begin{bmatrix}
G_{Na}^0 = 120.0 & G_K^0 = 36.0\\
(p_m^\alpha)^0 = \{-0.10, 0.0, -0.1, -1.0\} & (p_m^\beta)^0 = \{0.0, 4.0, 0.0556, 0.0\}\\
(p_h^\alpha)^0 = \{0.0, 0.07, 0.05, 0.0\} & (p_h^\beta)^0 = \{0.0, 1.0, -0.1, 1.0\}\\
(p_n^\alpha)^0 = \{-0.01, 0.0, -0.1, -1.0\} & (p_n^\beta)^0 = \{0.0, 0.125, 0.0125, 0.0\}\\
(q_m^\alpha)^0 = \{35.0, 35.0\} & (q_m^\beta)^0 = \{60.0, 60.0\}\\
(q_h^\alpha)^0 = \{60.0, 60.0\} & (q_h^\beta)^0 = \{30.0, 30.0\}\\
(q_n^\alpha)^0 = \{50.0, 50.0\} & (q_n^\beta)^0 = \{60.0, 60.0\}
\end{bmatrix}$$

A toxin G thus has an associated toxin signature, E(G), which consists of deviations
from the nominal classical Hodgkin–Huxley parameter suite: $E(G) = \Sigma_0 + \delta$, where
δ is a vector of percentage changes from nominal that we assume the introduction
of the toxin initiates. As you can see, if we model the toxin signature in this way, we
have a rich set of possibilities we can use for parametric studies.
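One way to realize such a parametric study in code is sketched below: a toxin family is a cloud of signatures scattered in a small neighborhood of a base signature, which anticipates the TOXIN class construction described in the next section. The random engine, the class name and the member layout here are our own choices, not the original code.

    #include <cstdio>
    #include <random>
    #include <vector>

    // Sketch of toxin family generation: each family is a cloud of signatures
    // clustered around a base signature (here, deviations in gNa_max and gK_max).
    // This mirrors the idea of the TOXIN class but is not the original code.
    struct ToxinFamily {
      std::vector<std::vector<double>> samples;

      ToxinFamily(const std::vector<double>& base, double neighborhood,
                  int count, unsigned seed = 42) {
        std::mt19937 gen(seed);
        std::uniform_real_distribution<double> jitter(-neighborhood, neighborhood);
        for (int i = 0; i < count; ++i) {
          std::vector<double> s(base);
          for (double& v : s) v += jitter(gen);
          samples.push_back(s);
        }
      }
    };

    int main() {
      // Toxin A from Table 9.1: [delta gNa, delta gK] = [0.45, -0.25],
      // 20 samples in a neighborhood of size 0.02.
      ToxinFamily A({0.45, -0.25}, 0.02, 20);
      for (int i = 0; i < 3; ++i)
        std::printf("sample %d: [%.4f, %.4f]\n", i, A.samples[i][0], A.samples[i][1]);
      return 0;
    }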

9.3.1.1 Simple Second Messenger Toxins

We begin with toxins whose signatures are quite simple as they only cause a change
in the nominal sodium and potassium maximum conductance. These are therefore
second messenger toxins. This simulation will generate five toxins whose signatures
are distinct. Using C++ (not MatLab!) as our code base, we designed a TOXIN class
whose constructor generates a family of distinct signatures using a signature as a
base. Here, we will generate 20 sample toxin signatures clustered around the given
signature using a neighborhood size of 0.02. First, we generate five toxins using the
toxin signatures of Table 9.1.
Table 9.1 Toxin conductance signatures

Toxin   Signature $[\delta g_{Na}^0, \delta g_K^0]$
A       [0.45, −0.25]
B       [0.05, −0.35]
C       [0.10, 0.45]
D       [0.55, 0.70]
E       [0.75, −0.75]

Fig. 9.3 Applied synaptic pulse

These percentage changes are applied to the base conductance values to generate
the sodium, potassium and leakage conductances for each simulation. We use these
toxins to generate 100 simulation runs using the 20 members of each toxin family.
We believe that the data from this parametric study can be used to divide up action
potentials into disjoint classes each of which is generated by a given toxin. Each of
these simulation runs use the synaptic current injected as in Fig. 9.3 which injects
a modest amount of current over approximately 2 s. The current is injected at four
separate times to give the gradual rise seen in the picture. In Fig. 9.4, we see all
100 generated action potentials. You can clearly see that the different toxin families
create distinct action potentials.
The Biological Feature Vector Based Recognizer
One can easily observe that for
a typical response, as in Fig. 9.1, the potential exhibits a combination of cap-like
shapes. We can use the following points on this generic action potential to construct
the low dimensional feature vector given earlier as Eq. 9.1 which we reproduce for
convenience.
⎡ ⎤
(t0 , V0 ) start point
⎢ (t1 , V1 ) maximum point ⎥
⎢ ⎥
ξ = ⎢ (t2 , V2 ) return to reference voltage ⎥

⎥,
⎣ (t3 , V3 ) minimum point ⎦
(g, t4 , V4 ) sigmoid model of tail

Fig. 9.4 The toxin families

where the model of the tail of the action potential is of the form

$$V_m(t) = V_3 + (V_4 - V_3)\tanh\big(g(t - t_3)\big).$$

Note that

$$V_m'(t_3) = (V_4 - V_3)\,g.$$

We approximate $V_m'(t_3)$ by a standard finite difference. We pick a data point $(t_5, V_5)$
that occurs after the minimum—typically we use the voltage value at the time $t_5$ that
is 5 time steps downstream from the minimum—and approximate the derivative at
$t_3$ by

$$V_m'(t_3) \approx \frac{V_5 - V_3}{t_5 - t_3}$$

The value of g is then determined to be

$$g = \frac{V_5 - V_3}{(V_4 - V_3)(t_5 - t_3)}$$

which reflects the asymptotic nature of the hyperpolarization phase of the potential.
Note that ξ is in $\Re^{11}$. We see the rudimentary feature vector extraction we have called
the BFV is quite capable of generating a functional recognizer; i.e. distinguishing
between toxin inputs. Hence, we are confident we can use the BFV in developing
approximations to nodal computations. In all of the recognizers constructed in
Peterson and Khan (2006), the classification of the toxin is determined by finding
the toxin class which is minimum distance from the sample.

9.3.1.2 Toxins that Reshape α–β Parameters

Our earlier discussions focused on toxins that cause a change in the nominal sodium
and potassium maximum conductance, and so are effectively second messenger tox-
ins. However, the 36 additional α–β parameters that we have listed all can be altered
by toxins to profoundly affect the shape of the action potential curve. In this section,
we will focus on parameter changes that effect a small subset of the full range of
α–β possibilities.
First, we generate five toxins as shown below. Note that Toxin A perturbs the q
parameters of the α–β for the sodium activation m only; Toxin B does the same for
the q parameters of the sodium h inactivation; Toxin C alters the q parameters of
the potassium activation n; Toxin D changes the p[2] value of the α–β functions for
the sodium inactivation h; and Toxin E, does the same for the potassium activation
n. This is just a sample of what could be studied. These particular toxin signatures
were chosen because the differences in the generated action potentials for various
toxins from family A, B, C, D or E will be subtle. For example, in Toxin A, a
perturbation of the given type generates the α–β curves for mNA as shown in Fig. 9.5.
Note all we are doing is changing the voltage values of 35.0 slightly. This small
change introduces a significant ripple in the α curve. Note, we are only perturbing

Fig. 9.5 m perturbed alpha–beta curves

Table 9.2 Toxin α–β signatures

Toxin A   $\delta q_m^\alpha = \{0.2, -0.1\}$,   $\delta q_m^\beta = \{-0.1, 0.1\}$
Toxin B   $\delta q_h^\alpha = \{-0.1, 0.1\}$,   $\delta q_h^\beta = \{0.1, -0.2\}$
Toxin C   $\delta q_n^\alpha = \{-0.2, 0.1\}$,   $\delta q_n^\beta = \{-0.1, 0.1\}$
Toxin D   $\delta p_h^\alpha = \{0.0, 0.0, 3.0, 0.0\}$,   $\delta p_h^\beta = \{0.0, 0.3, 0.0, 0.0\}$
Toxin E   $\delta p_n^\alpha = \{0.0, 0.3, 0.0, 0.0\}$,   $\delta p_n^\beta = \{0.0, 0.3, 0.0, 0.0\}$

two parameters at a time in a given toxin family. In all of these toxins, we will be
leaving the maximum ion conductances the same. We use these toxins to generate
100 simulation runs using the 20 members of each toxin family. We believe that
the data from this parametric study can be used to divide up action potentials into
disjoint classes each of which is generated by a given toxin. Each of these simulation
runs use the synaptic current injected as in Fig. 9.3 which injects a modest amount
of current over approximately 2 s. We generated five toxins whose signatures were
distinct. Again, using C++ as our code base, we designed a TOXIN class whose
constructor generates a family of distinct signatures using a given signature as a
base. Here, we generated twenty sample toxin signatures clustered around the given
signature using a neighborhood size for each of five toxin families. First, we generate
five toxin families. The types of perturbations each toxin family uses are listed in
Table 9.2. In this table, we only list the parameters that are altered. Note that Toxin
A perturbs the q parameters of the α–β for the sodium activation m only; Toxin B
does the same for the q parameters of the sodium h inactivation; Toxin C alters the q
parameters of the potassium activation n; Toxin D changes the p[2] value of the α–β
functions for the sodium inactivation h; and Toxin E, does the same for the potassium
activation n. This is just a sample of what could be studied. These particular toxin
signatures were chosen because the differences in the generated action potentials for
various toxins from family A, B, C, D or E will be subtle. For example, in Toxin
A, a perturbation of the given type generates the α–β curves for mNA as shown in
Fig. 9.5. Note all we are doing is changing the voltage values of 35.0 slightly. This
small change introduces a significant ripple in the α curve. We use these toxins to
generate 100 simulation runs using the 20 members of each toxin family and the
BFV approach can easily distinguish between these families.
Each of these simulation runs use the synaptic current injection protocol as
described before. The current is injected at four separate times to give the grad-
ual rise seen in the picture. In Fig. 9.6, we see all 100 generated action potentials.
You can clearly see that the different toxin families create distinct action potentials
and as shown in Peterson and Khan (2006), the BFV is very capable as a functional
recognizer which determines the toxin family that has perturbed the dendritic inputs.

Fig. 9.6 Generated voltage traces for the α–β toxin families

9.4 Feature Vector Abstraction

In the work of this chapter, we have indicated that a low dimensional feature vector
based on biologically relevant information extracted from the action potential of an
excitable nerve cell is capable of subserving biological information processing. We
can use such a BFV to extract information about how a given toxin influences the
shape of the output pulse of an excitable neuron. Hence, we expect modulatory inputs
into our abstract neuron can be modeled as alterations to the components of the BFV.
The biological feature vector stores many of the important features of the action
potential in a low dimensional form. We note these include
• The interval [t0 , t1 ] is the duration of the rise phase. This interval can be altered
or modulated by neurotransmitter activity on the nerve cell’s membrane as well as
second messenger signaling from within the cell.
• The height of the pulse, V1 , is an important indicator of excitation.
• The time interval between the highest activation level, V1 and the lowest, V3 , is
closely related to spiking interval. This time interval, [t1 , t3 ], is also amenable to
alteration via neurotransmitter input.
• The “height” of the depolarizing pulse, V4 , helps determine how long it takes for
the neuron to reestablish its reference voltage, V0 .

• The neuron voltage takes time to reach reference voltage after a spike. This is the
time interval given by $[t_3, \infty)$.
• The exponential rate of increase in the time interval [t3 , ∞] is also very important
to the regaining of nominal neuron electrophysiological characteristics.
Clearly, we can model an inhibitory pulse in essentially the same way, mutatis
mutandis. We will assume all of the data points in our feature vector are potentially
mutable due to neurotransmitter activity, input pulses into the neuron’s dendritic
system and alteration of the neuron hardware via genome access with the second
messenger system.
Although it is possible to do detailed modeling of biological systems using
GENESIS and NEURON, we do not believe that they are useful tools in modeling
the kind of information flow between cortical modules that is needed for a cognitive
model. Progress in building large scale models that involve many cooperating neu-
rons will certainly involve making suitable abstractions in the information processing
that we see in the neuron. Neurons transduce and integrate information on the den-
dritic side into wave form pulses and there are many models involving filtering and
transforms which attempt to “see” into the action potential and find its informational
core so to speak. However, all of these methods are hugely computationally expensive
and even a simple cognitive model will require ensembles of neurons acting together
locally to create global effects. For reasons outlined above, we believe alterations in
the parameters of the simple biological feature vector (BFV) can serve as modulatory
agents in ensembles of abstract neurons. The kinds of changes one should use for
a given neurotransmitter's modulatory effect can be estimated from the biophysical
and toxin literature. For example, an increase in sodium ion flow or Ca++ gated second
messenger activity can be handled at a high level as a suitable change in one of the
11 parameters of the BFV.

9.4.1 The BFV Functional Form

In Fig. 9.7, we indicated the three major portions of the biological feature vector and
the particular data points chosen from the action potential which are used for the
model. These are the two parabolas f1 and f2 and the sigmoid f3 . The parabola f1 is
treated as the two distinct pieces f11 and f12 given by

f11 (t) = a11 + b11 (t − t1 )2 (9.2)


f12 (t) = a12 + b12 (t − t1 )2 (9.3)

Thus, f1 consists of two joined parabolas which both have a vertex at t1 . The functional
form for f2 is a parabola with vertex at t3 :

f2 (t) = a2 + b2 (t − t3 )2 (9.4)

Fig. 9.7 The BFV functional form

Finally, the sigmoid portion of the model is given by

f3 (t) = V3 + (V4 − V3 ) tanh(g(t − t3 )) (9.5)

We have also simplified the BFV even further by dropping the explicit time point t4
and modeling the portion of the action potential after the minimum voltage by the
sigmoid f3 . From the data, it follows that

$$\begin{aligned}
f_{11}(t_0) &= V_0 = a_{11} + b_{11}(t_0 - t_1)^2\\
f_{11}(t_1) &= V_1 = a_{11}\\
f_{12}(t_1) &= V_1 = a_{12}\\
f_{12}(t_2) &= V_2 = a_{12} + b_{12}(t_2 - t_1)^2
\end{aligned}$$

This implies
$$a_{11} = V_1,\qquad b_{11} = \frac{V_0 - V_1}{(t_0 - t_1)^2},\qquad a_{12} = V_1,\qquad b_{12} = \frac{V_2 - V_1}{(t_2 - t_1)^2}$$

In a similar fashion, the f2 model is constrained by

$$f_2(t_2) = V_2 = a_2 + b_2(t_2 - t_3)^2,\qquad f_2(t_3) = V_3 = a_2$$

We conclude that

$$a_2 = V_3,\qquad b_2 = \frac{V_2 - V_3}{(t_2 - t_3)^2}$$

Hence, the functional form of the BFV model can be given by the mapping f of
Eq. 9.6.

$$f(t) = \begin{cases}
V_1 + \dfrac{V_0 - V_1}{(t_0 - t_1)^2}(t - t_1)^2, & t_0 \le t \le t_1\\[1ex]
V_1 + \dfrac{V_2 - V_1}{(t_2 - t_1)^2}(t - t_1)^2, & t_1 \le t \le t_2\\[1ex]
V_3 + \dfrac{V_2 - V_3}{(t_2 - t_3)^2}(t - t_3)^2, & t_2 \le t \le t_3\\[1ex]
V_3 + (V_4 - V_3)\tanh(g(t - t_3)), & t_3 \le t < \infty
\end{cases} \tag{9.6}$$

All of our parabolic models can also be written in the form

$$p(t) = \pm\frac{1}{4\beta}(t - \alpha)^2$$

where 4β is the width of the line segment through the focus of the parabola (the latus rectum). The
models f11 and f12 point down and so use the "minus" sign while f2 uses the "plus".
By comparing our model equations with this generic parabolic equation, we find the
widths of the parabolas of f11, f12 and f2 are given by

$$4\beta_{11} = \frac{(t_0 - t_1)^2}{V_1 - V_0} = \frac{-1}{b_{11}},\qquad
4\beta_{12} = \frac{(t_2 - t_1)^2}{V_1 - V_2} = \frac{-1}{b_{12}},\qquad
4\beta_2 = \frac{(t_2 - t_3)^2}{V_2 - V_3} = \frac{1}{b_2}$$
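The mapping f of Eq. 9.6 is easy to evaluate directly from the BFV data points. The sketch below does exactly that; the numerical values in main are illustrative, not data from the text.

    #include <cmath>
    #include <cstdio>

    // Evaluate the BFV functional form f(t) of Eq. 9.6 from the data points
    // (t0,V0), (t1,V1), (t2,V2), (t3,V3), V4 and the tail rate g.
    struct BFVForm {
      double t0, V0, t1, V1, t2, V2, t3, V3, V4, g;

      double operator()(double t) const {
        if (t <= t1) return V1 + (V0 - V1) / ((t0 - t1) * (t0 - t1)) * (t - t1) * (t - t1);
        if (t <= t2) return V1 + (V2 - V1) / ((t2 - t1) * (t2 - t1)) * (t - t1) * (t - t1);
        if (t <= t3) return V3 + (V2 - V3) / ((t2 - t3) * (t2 - t3)) * (t - t3) * (t - t3);
        return V3 + (V4 - V3) * std::tanh(g * (t - t3));
      }
    };

    int main() {
      // Illustrative BFV values (times in ms, voltages in mV).
      BFVForm f{0.0, -65.0, 1.0, 45.0, 2.0, -65.0, 3.0, -75.0, -66.0, 2.0};
      for (double t = 0.0; t <= 6.0; t += 1.0)
        std::printf("f(%.1f) = %7.2f\n", t, f(t));
      return 0;
    }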

9.4.2 Modulation of the BFV Parameters

We want to modulate the output of our abstract neuron model by altering the BFV.
The BFV itself consists of 10 parameters, but better insight into how alterations of
the BFV introduce changes in the "action potential" we are creating comes from
studying changes in the mapping f given in Sect. 9.4.1. In addition to changes in
timing, t0, t1, t2 and t3, we can also consider the variations of Eq. 9.7.

$$\begin{bmatrix} a_{11}\\ b_{11}\\ a_{12}\\ b_{12}\\ a_2\\ b_2 \end{bmatrix}
= \begin{bmatrix} V_1\\ \dfrac{V_0 - V_1}{(t_0 - t_1)^2}\\ V_1\\ \dfrac{V_2 - V_1}{(t_2 - t_1)^2}\\ V_3\\ \dfrac{V_2 - V_3}{(t_2 - t_3)^2} \end{bmatrix}
= \begin{bmatrix} \text{Maximum Voltage}\\ \dfrac{-1}{4\beta_{11}}\\ \text{Maximum Voltage}\\ \dfrac{-1}{4\beta_{12}}\\ \text{Minimum Voltage}\\ \dfrac{1}{4\beta_2} \end{bmatrix} \tag{9.7}$$

It is clear that modulatory inputs that alter the cap shape and hyperpolarization curve
of the BFV functional form can have a profound effect on the information contained
in the “action potential”. For example, a hypothetical neurotransmitter that alters V1
will also alter the latus rectum distance across the cap f1. Further, direct modifications
to the latus rectum distance in either of the two caps f11 and f12 can induce corresponding
changes in times t0, t1 and t2 and voltages V0, V1 and V2. A similar statement can
be made for changes in the latus rectum of cap f2. For example, if a neurotransmitter
induced a change of, say, 1 % in $4\beta_{11}$, this would imply that $\Delta\left(\frac{(t_0 - t_1)^2}{V_1 - V_0}\right) = 0.04\beta_{11}^0$,
where $\beta_{11}^0$ denotes the original value of $\beta_{11}$. Thus, to first order

$$0.04\beta_{11}^0 = \left(\frac{\partial\beta_{11}}{\partial V_0}\right)^{*}\Delta V_0 + \left(\frac{\partial\beta_{11}}{\partial V_1}\right)^{*}\Delta V_1 + \left(\frac{\partial\beta_{11}}{\partial t_0}\right)^{*}\Delta t_0 + \left(\frac{\partial\beta_{11}}{\partial t_1}\right)^{*}\Delta t_1 \tag{9.8}$$

where the superscript ∗ on the partials indicates they are evaluated at the base point
(V0 , V1 , t0 , t1 ). Taking partials we find
 
$$\begin{aligned}
\frac{\partial\beta_{11}}{\partial V_0} &= 2\,\frac{(t_0 - t_1)^2}{(V_1 - V_0)^2} = \frac{2}{V_1 - V_0}\,\beta_{11}^0\\
\frac{\partial\beta_{11}}{\partial V_1} &= -2\,\frac{(t_0 - t_1)^2}{(V_1 - V_0)^2} = -\frac{2}{V_1 - V_0}\,\beta_{11}^0\\
\frac{\partial\beta_{11}}{\partial t_0} &= 2\,\frac{t_0 - t_1}{V_1 - V_0} = \frac{2}{t_0 - t_1}\,\beta_{11}^0\\
\frac{\partial\beta_{11}}{\partial t_1} &= -2\,\frac{t_0 - t_1}{V_1 - V_0} = -\frac{2}{t_0 - t_1}\,\beta_{11}^0
\end{aligned}$$

Thus, Eq. 9.8 becomes

$$0.04\beta_{11}^0 = \frac{2\Delta V_0}{V_1 - V_0}\,\beta_{11}^0 - \frac{2\Delta V_1}{V_1 - V_0}\,\beta_{11}^0 + \frac{2\Delta t_0}{t_0 - t_1}\,\beta_{11}^0 - \frac{2\Delta t_1}{t_0 - t_1}\,\beta_{11}^0$$

This simplifies to

$$0.02(V_1 - V_0)(t_0 - t_1) = (\Delta V_0 - \Delta V_1)(t_0 - t_1) + (\Delta t_0 - \Delta t_1)(V_1 - V_0)$$

Since we can do this analysis for any percentage r of $\beta_{11}^0$, we can infer that a neurotransmitter
that modulates the action potential by perturbing the "width" or latus rectum of the cap of f11
can do so satisfying the equation

$$2r(V_1 - V_0)(t_0 - t_1) = (\Delta V_0 - \Delta V_1)(t_0 - t_1) + (\Delta t_0 - \Delta t_1)(V_1 - V_0)$$

Similar equations can be derived for the other two width parameters for caps f12 and
f3. These sorts of equations give us design principles for complex neurotransmitter
modulations of a BFV.

9.4.3 Modulation via the BFV Ball and Stick Model

The BFV model we build consists of a dendritic system and a computational core
which processes a BFV input sequence to generate a BFV output.

9.4.3.1 The Modulation of the Action Potential

We know that
$$C_m\frac{dV_M}{dt} = I_E - g_K^{max}(M_K)^4(V_m, t)(V_m - E_K) - g_{Na}^{max}(M_{NA})^3(V_m, t)(H_{NA})(V_m, t)(V_m - E_{Na}) - g_L(V_m - E_L).$$

Since the BFV is structured so that the action potential has a maximum at t1 of value
V1 and a minimum at t3 of value V3, we have $V_m'(t_1) = 0$ and $V_m'(t_3) = 0$. This gives

$$\begin{aligned}
I_E(t_1) &= g_K^{max}(M_K)^4(V_1, t_1)(V_1 - E_K) + g_{Na}^{max}(M_{NA})^3(V_1, t_1)(H_{NA})(V_1, t_1)(V_1 - E_{Na}) + g_L(V_1 - E_L)\\
I_E(t_3) &= g_K^{max}(M_K)^4(V_3, t_3)(V_3 - E_K) + g_{Na}^{max}(M_{NA})^3(V_3, t_3)(H_{NA})(V_3, t_3)(V_3 - E_{Na}) + g_L(V_3 - E_L)
\end{aligned}$$

Fig. 9.8 The product $M_{NA}^3 H_{NA}$: sodium conductances during a pulse 1100 at 3.0

Fig. 9.9 The product $M_K^4$: potassium conductances during a pulse 1100 at 3.0

From Figs. 9.8 and 9.9, we see that m3 (V1 , t1 )h(V1 , t1 ) ≈ 0.35 and n4 (V1 , t1 ) ≈ 0.2.
Further, m3 (V3 , t3 )h(V3 , t3 ) ≈ 0.01 and n4 (V3 , t3 ) ≈ 0.4.
Thus,

$$\begin{aligned}
I_E(t_1) &= 0.20\,g_K^{max}(V_1 - E_K) + 0.35\,g_{Na}^{max}(V_1 - E_{Na}) + g_L(V_1 - E_L)\\
I_E(t_3) &= 0.40\,g_K^{max}(V_3 - E_K) + 0.01\,g_{Na}^{max}(V_3 - E_{Na}) + g_L(V_3 - E_L)
\end{aligned}$$

Reorganizing,

$$\begin{aligned}
I_E(t_1) &= \big(0.20\,g_K^{max} + 0.35\,g_{Na}^{max} + g_L\big)V_1 - \big(0.20\,g_K^{max}E_K + 0.35\,g_{Na}^{max}E_{Na} + g_L E_L\big)\\
I_E(t_3) &= \big(0.40\,g_K^{max} + 0.01\,g_{Na}^{max} + g_L\big)V_3 - \big(0.40\,g_K^{max}E_K + 0.01\,g_{Na}^{max}E_{Na} + g_L E_L\big)
\end{aligned}$$

Solving for the voltages, we find

$$V_1 = \frac{I_E(t_1) + 0.20\,g_K^{max}E_K + 0.35\,g_{Na}^{max}E_{Na} + g_L E_L}{0.20\,g_K^{max} + 0.35\,g_{Na}^{max} + g_L}$$
$$V_3 = \frac{I_E(t_3) + 0.40\,g_K^{max}E_K + 0.01\,g_{Na}^{max}E_{Na} + g_L E_L}{0.40\,g_K^{max} + 0.01\,g_{Na}^{max} + g_L}$$

Thus,

$$\frac{\partial V_1}{\partial g_K^{max}} = \frac{0.20\,E_K}{0.20\,g_K^{max} + 0.35\,g_{Na}^{max} + g_L}
- \frac{I_E(t_1) + 0.20\,g_K^{max}E_K + 0.35\,g_{Na}^{max}E_{Na} + g_L E_L}{0.20\,g_K^{max} + 0.35\,g_{Na}^{max} + g_L}\cdot\frac{0.20}{0.20\,g_K^{max} + 0.35\,g_{Na}^{max} + g_L}$$

This simplifies to

$$\frac{\partial V_1}{\partial g_K^{max}} = \frac{0.20}{0.20\,g_K^{max} + 0.35\,g_{Na}^{max} + g_L}\,(E_K - V_1) \tag{9.9}$$

Similarly, we find

$$\frac{\partial V_1}{\partial g_{Na}^{max}} = \frac{0.35}{0.20\,g_K^{max} + 0.35\,g_{Na}^{max} + g_L}\,(E_{Na} - V_1) \tag{9.10}$$
$$\frac{\partial V_3}{\partial g_K^{max}} = \frac{0.40}{0.40\,g_K^{max} + 0.01\,g_{Na}^{max} + g_L}\,(E_K - V_3) \tag{9.11}$$
$$\frac{\partial V_3}{\partial g_{Na}^{max}} = \frac{0.01}{0.40\,g_K^{max} + 0.01\,g_{Na}^{max} + g_L}\,(E_{Na} - V_3) \tag{9.12}$$
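These closed forms are cheap to evaluate. The snippet below computes V1, V3 and the four sensitivities of Eqs. 9.9–9.12 for one illustrative parameter set; the battery voltages, leak conductance and applied currents are assumed values for the example only.

    #include <cstdio>

    // Evaluate V1, V3 and the sensitivities of Eqs. 9.9-9.12.
    // The conductances, batteries and applied currents are illustrative.
    int main() {
      double gK = 36.0, gNa = 120.0, gL = 0.3;     // mS/cm^2
      double EK = -72.7, ENa = 55.0, EL = -49.2;   // mV (assumed battery values)
      double IE1 = 20.0, IE3 = 5.0;                // applied currents at t1 and t3

      double D1 = 0.20 * gK + 0.35 * gNa + gL;     // denominator at the maximum
      double D3 = 0.40 * gK + 0.01 * gNa + gL;     // denominator at the minimum

      double V1 = (IE1 + 0.20 * gK * EK + 0.35 * gNa * ENa + gL * EL) / D1;
      double V3 = (IE3 + 0.40 * gK * EK + 0.01 * gNa * ENa + gL * EL) / D3;

      std::printf("V1 = %8.3f mV, V3 = %8.3f mV\n", V1, V3);
      std::printf("dV1/dgK  = %8.4f   dV1/dgNa = %8.4f\n",
                  0.20 / D1 * (EK - V1), 0.35 / D1 * (ENa - V1));
      std::printf("dV3/dgK  = %8.4f   dV3/dgNa = %8.4f\n",
                  0.40 / D3 * (EK - V3), 0.01 / D3 * (ENa - V3));
      return 0;
    }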

We also know that as t goes to infinity, the action potential flattens and $V_m'$ approaches
0. Also, the applied current $I_E$ is zero and so we must have

$$0 = -g_K^{max}(M_K)^4(V_\infty, \infty)(V_\infty - E_K) - g_{Na}^{max}(M_{NA})^3(V_\infty, \infty)(H_{NA})(V_\infty, \infty)(V_\infty - E_{Na}) - g_L(V_\infty - E_L)$$

Our hyperpolarization model is

$$Y(t) = V_3 + (V_4 - V_3)\tanh\big(g(t - t_3)\big)$$

We have $V_\infty$ is $V_4$. Thus,

$$0 = -g_K^{max}(M_K)^4(V_4, \infty)(V_4 - E_K) - g_{Na}^{max}(M_{NA})^3(V_4, \infty)(H_{NA})(V_4, \infty)(V_4 - E_{Na}) - g_L(V_4 - E_L)$$

This gives, letting $(M_{NA})^3(V_4, \infty)(H_{NA})(V_4, \infty)$ and $(M_K)^4(V_4, \infty)$ be denoted by
$((M_{NA})^3(H_{NA}))^*$ and $((M_K)^4)^*$ for simplicity of exposition,

$$\big(g_K^{max}((M_K)^4)^* + g_{Na}^{max}((M_{NA})^3(H_{NA}))^* + g_L\big)V_4
= g_K^{max}((M_K)^4)^* E_K + g_{Na}^{max}((M_{NA})^3(H_{NA}))^* E_{Na} + g_L E_L$$

Hence, letting $((M_{NA})^3(H_{NA}))^* \equiv (m^3h)^*$ and $((M_K)^4)^* \equiv (n^4)^*$, we have

$$V_4 = \frac{g_K^{max}\,n^4(V_4, \infty)E_K + g_{Na}^{max}\,m^3(V_4, \infty)h(V_4, \infty)E_{Na} + g_L E_L}{g_K^{max}\,n^4(V_4, \infty) + g_{Na}^{max}\,m^3(V_4, \infty)h(V_4, \infty) + g_L}$$

We see

$$\frac{\partial V_4}{\partial g_K^{max}} = \frac{n^4(V_4, \infty)}{g_K^{max}\,n^4(V_4, \infty) + g_{Na}^{max}\,m^3(V_4, \infty)h(V_4, \infty) + g_L}\,\big(E_K - V_4\big) \tag{9.13}$$
$$\frac{\partial V_4}{\partial g_{Na}^{max}} = \frac{m^3(V_4, \infty)h(V_4, \infty)}{g_K^{max}\,n^4(V_4, \infty) + g_{Na}^{max}\,m^3(V_4, \infty)h(V_4, \infty) + g_L}\,\big(E_{Na} - V_4\big) \tag{9.14}$$

We can also assume that the area under the action potential curve from the point
(t0, V0) to (t1, V1) is proportional to the incoming current applied. If V_In is the
axon-hillock voltage, the impulse current applied to the axon-hillock is g_In V_In, where
g_In is the ball and stick model conductance for the soma. Thus, the approximate area
under the action potential curve must match this applied current. We have

$$
\frac{1}{2} (t_1 - t_0)(V_1 - V_0) \approx g_{In} V_{In}
$$

We conclude

$$
t_1 - t_0 = \frac{2\, g_{In} V_{In}}{V_1 - V_0}
$$

Thus

$$
\frac{\partial (t_1 - t_0)}{\partial g_K^{max}} = -\frac{t_1 - t_0}{V_1 - V_0} \frac{\partial V_1}{\partial g_K^{max}} \qquad (9.15)
$$

$$
\frac{\partial (t_1 - t_0)}{\partial g_{Na}^{max}} = -\frac{t_1 - t_0}{V_1 - V_0} \frac{\partial V_1}{\partial g_{Na}^{max}} \qquad (9.16)
$$

Also, we know that during the hyperpolarization phase, the sodium current is off and
the potassium current is slowly bringing the membrane potential back to the reference
voltage. Now, our BFV model does not assume that the membrane potential returns
to the reference level. Instead, by using

$$
Y(t) = V_3 + (V_4 - V_3) \tanh\left( g (t - t_3) \right)
$$

we assume the return is to voltage level V4. At the midpoint, Y = (V3 + V4)/2, we find

$$
\frac{1}{2} (V_4 - V_3) = (V_4 - V_3) \tanh\left( g (t - t_3) \right)
$$

Thus, letting u = g(t − t3),

$$
\frac{1}{2} = \frac{e^{2u} - 1}{e^{2u} + 1}
$$

and we find u = ln(3)/2. Solving for t, we then have

$$
t^* = t_3 + \frac{\ln(3)}{2 g}
$$
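The midpoint calculation is easy to verify numerically. The short check below, with placeholder values for V3, V4, g and t3, confirms that tanh(ln(3)/2) = 1/2, so the sigmoid Y sits exactly halfway between V3 and V4 at t* = t3 + ln(3)/(2g).

    import math

    # Placeholder hyperpolarization parameters; only the algebra is being checked.
    V3, V4, g, t3 = -75.0, -65.9, 0.5, 6.0

    t_star = t3 + math.log(3.0) / (2.0 * g)
    Y_star = V3 + (V4 - V3) * math.tanh(g * (t_star - t3))

    print(math.tanh(math.log(3.0) / 2.0))   # 0.5
    print(Y_star, 0.5 * (V3 + V4))          # both equal the midpoint voltage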

From t3 on, the Hodgkin–Huxley dynamics are

$$
C_m \frac{dV_m}{dt} = -g_K^{max} (M_K)^4(V_m, t)(V_m - E_K) - g_L (V_m - E_L).
$$

We want the values of the derivatives to match at t*. This gives

$$
g (V^* - V_3)\, \mathrm{sech}^2\left( g (t^* - t_3) \right)
= -\frac{g_K^{max}}{C_m} (M_K)^4(V^*, t^*)(V^* - E_K) - \frac{g_L}{C_m} (V^* - E_L)
$$

where V* = (V3 + V4)/2. Now g(t* − t3) = ln(3)/2 and thus we find

$$
\frac{g}{2} (V_4 - V_3)\, \mathrm{sech}^2\left( g (t^* - t_3) \right)
= -\frac{g_K^{max}}{C_m} (M_K)^4(V^*, t^*)(V^* - E_K) - \frac{g_L}{C_m} (V^* - E_L)
$$

$$
\frac{g}{2}\, \frac{9}{64}\, (V_4 - V_3)
= -\frac{g_K^{max}}{C_m} (M_K)^4(V^*, t^*)(V^* - E_K) - \frac{g_L}{C_m} (V^* - E_L)
$$

Next, consider the magnitude of (M_K)^4(V*, t*). We know from Fig. 9.9 that (M_K)^4 is small
at t*. Thus, we will replace it by the value 0.01. This gives

$$
\frac{g}{2}\, \frac{9}{64}\, (V_4 - V_3)
= -0.01\, \frac{g_K^{max}}{C_m} \left( \frac{1}{2}(V_4 + V_3) - E_K \right) - \frac{g_L}{C_m} \left( \frac{1}{2}(V_4 + V_3) - E_L \right)
$$

Simplifying, we have

$$
\frac{9 g}{128} (V_4 - V_3)
= \left( 0.01\, \frac{g_K^{max}}{C_m} E_K + \frac{g_L}{C_m} E_L \right) - \frac{1}{2} \left( 0.01\, \frac{g_K^{max}}{C_m} + \frac{g_L}{C_m} \right) (V_4 + V_3)
$$

$$
\frac{9 g}{64}
= \left( 0.01\, \frac{g_K^{max}}{C_m} E_K + \frac{g_L}{C_m} E_L \right) \frac{1}{V_4 - V_3} - \left( 0.01\, \frac{g_K^{max}}{C_m} + \frac{g_L}{C_m} \right) \frac{V_4 + V_3}{V_4 - V_3}
$$

We can see clearly from the above equation that the dependence of g on gK^max and
gNa^max is quite complicated. However, we can estimate this dependence as follows. We
know that V3 + V4 is about the reference voltage, −65.9 mV. If we approximate V3
by the potassium battery voltage, EK = −72.7 mV, and V4 by the reference voltage,
we find (V4 + V3)/(V4 − V3) ≈ −138.6/6.8 = −20.38 and 1/(V4 − V3) ≈ 1/6.8 = 0.147. Hence,

$$
\begin{aligned}
\frac{9 C_m g}{64} &= 0.147 \left( 0.01\, g_K^{max} E_K + g_L E_L \right) + 20.38 \left( 0.01\, g_K^{max} + g_L \right) \\
&= \left( 0.0147\, E_K + 2.038\, E_L \right) g_K^{max} + g_L \left( 0.0147\, E_L + 20.38 \right)
\end{aligned}
$$

Thus, we find

$$
\frac{\partial g}{\partial g_K^{max}} = \frac{64}{9 C_m} \left( 0.0147\, E_K + 2.038\, E_L \right) \qquad (9.17)
$$

This gives ∂g/∂gK^max ≈ −710.1. Equation 9.17 shows what our intuition tells us: if gK^max
increases, the potassium current is stronger and the hyperpolarization phase is shortened.
On the other hand, if gK^max decreases, the potassium current is weaker and the
hyperpolarization phase is lengthened.
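As a rough numerical check of Eq. 9.17, the snippet below plugs in assumed values EK = −72.7 mV, EL = −49.0 mV and Cm = 1.0; these are stand-in constants chosen only for illustration, and with them the slope lands in the −700s, the same order as the −710.1 quoted above.

    # Rough evaluation of Eq. 9.17 with assumed constants.
    EK, EL, Cm = -72.7, -49.0, 1.0
    dg_dgK = (64.0 / (9.0 * Cm)) * (0.0147 * EK + 2.038 * EL)
    print(dg_dgK)   # approximately -7.2e2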

9.4.3.2 Multiple Inputs

Consider a typical input V(t) which is determined by a BFV vector. Without loss of
generality, we will focus on excitatory inputs in our discussions. The input consists
of three distinct portions. First, there is a parabolic cap above the equilibrium potential
determined by the values (t0, V0), (t1, V1), (t2, V2). Next, the input contains half of
another parabolic cap dropping below the equilibrium potential determined by the
values (t2, V2) and (t3, V3). Finally, there is the hyperpolarization phase having functional
form H(t) = V3 + (V4 − V3) tanh(g(t − t3)). Now assume two inputs arrive
at the same electrotonic distance L. We label these inputs as A and B as shown in
Fig. 9.10. For convenience of exposition, we also assume t3A < t3B, as otherwise we
just reverse the roles of the variables in our arguments. In this figure, we note only
the minimum points on the A and B curves. We merge these inputs into a new input
V^N prior to the hyperpolarization phase as follows:

Fig. 9.10 Two action potential inputs into the dendrite subsystem

$$
t_0^N = \frac{t_0^A + t_0^B}{2}, \quad V_0^N = \frac{V_0^A + V_0^B}{2}, \quad
t_1^N = \frac{t_1^A + t_1^B}{2}, \quad V_1^N = \frac{V_1^A + V_1^B}{2}, \quad
t_2^N = \frac{t_2^A + t_2^B}{2}, \quad V_2^N = \frac{V_2^A + V_2^B}{2}
$$
This constructs the two parabolic caps of the new resultant input by averaging the
caps of V^A and V^B. The construction of the new hyperpolarization phase is more
complicated. The shape of this portion of an action potential has a profound effect
on neural modulation, so it is very important to merge the two inputs in a reasonable
way. The hyperpolarization phases of V^A and V^B are given by

$$
\begin{aligned}
H^A(t) &= V_3^A + (V_4^A - V_3^A) \tanh\left( g^A (t - t_3^A) \right) \\
H^B(t) &= V_3^B + (V_4^B - V_3^B) \tanh\left( g^B (t - t_3^B) \right)
\end{aligned}
$$

We will choose the four parameters V3, V4, g, t3 so as to minimize

$$
E = \int_{t_3^A}^{\infty} \left[ \left( H(t) - H^A(t) \right)^2 + \left( H(t) - H^B(t) \right)^2 \right] dt
$$

For optimality, we find the parameters where ∂E/∂V3, ∂E/∂V4, ∂E/∂g and ∂E/∂t3 are 0. Now,

$$
\frac{\partial E}{\partial V_3} = \int_{t_3^A}^{\infty} 2 \left[ \left( H(t) - H^A(t) \right) + \left( H(t) - H^B(t) \right) \right] \frac{\partial H}{\partial V_3}\, dt
$$

Further,

$$
\frac{\partial H}{\partial V_3} = 1 - \tanh\left( g (t - t_3) \right)
$$

so we obtain

$$
0 = \int_{t_3^A}^{\infty} 2 \left[ \left( H(t) - H^A(t) \right) + \left( H(t) - H^B(t) \right) \right] \left[ 1 - \tanh\left( g (t - t_3) \right) \right] dt \qquad (9.18)
$$

We also find

$$
\frac{\partial E}{\partial V_4} = \int_{t_3^A}^{\infty} 2 \left[ \left( H(t) - H^A(t) \right) + \left( H(t) - H^B(t) \right) \right] \tanh\left( g (t - t_3) \right) dt
$$

as

$$
\frac{\partial H}{\partial V_4} = \tanh\left( g (t - t_3) \right)
$$

The optimality condition then gives

$$
0 = \int_{t_3^A}^{\infty} 2 \left[ \left( H(t) - H^A(t) \right) + \left( H(t) - H^B(t) \right) \right] \tanh\left( g (t - t_3) \right) dt \qquad (9.19)
$$

Combining Eqs. 9.18 and 9.19, we find

$$
\begin{aligned}
0 &= \int_{t_3^A}^{\infty} \left[ \left( H(t) - H^A(t) \right) + \left( H(t) - H^B(t) \right) \right] dt
- \int_{t_3^A}^{\infty} \left[ \left( H(t) - H^A(t) \right) + \left( H(t) - H^B(t) \right) \right] \tanh\left( g (t - t_3) \right) dt \\
0 &= \int_{t_3^A}^{\infty} \left[ \left( H(t) - H^A(t) \right) + \left( H(t) - H^B(t) \right) \right] \tanh\left( g (t - t_3) \right) dt.
\end{aligned}
$$

It follows after simplification that

$$
0 = \int_{t_3^A}^{\infty} \left[ \left( H(t) - H^A(t) \right) + \left( H(t) - H^B(t) \right) \right] dt \qquad (9.20)
$$

The remaining optimality conditions give

$$
\begin{aligned}
\frac{\partial E}{\partial g} &= \int_{t_3^A}^{\infty} 2 \left[ \left( H(t) - H^A(t) \right) + \left( H(t) - H^B(t) \right) \right] \frac{\partial H}{\partial g}\, dt = 0 \\
\frac{\partial E}{\partial t_3} &= \int_{t_3^A}^{\infty} 2 \left[ \left( H(t) - H^A(t) \right) + \left( H(t) - H^B(t) \right) \right] \frac{\partial H}{\partial t_3}\, dt = 0
\end{aligned}
$$

We calculate

$$
\begin{aligned}
\frac{\partial H}{\partial g} &= (V_4 - V_3)(t - t_3)\, \mathrm{sech}^2\left( g (t - t_3) \right) \\
\frac{\partial H}{\partial t_3} &= -(V_4 - V_3)\, g\, \mathrm{sech}^2\left( g (t - t_3) \right)
\end{aligned}
$$

Thus, we find

$$
\begin{aligned}
0 &= \int_{t_3^A}^{\infty} \left[ \left( H(t) - H^A(t) \right) + \left( H(t) - H^B(t) \right) \right] (V_4 - V_3)(t - t_3)\, \mathrm{sech}^2\left( g (t - t_3) \right) dt \\
0 &= \int_{t_3^A}^{\infty} \left[ \left( H(t) - H^A(t) \right) + \left( H(t) - H^B(t) \right) \right] (V_4 - V_3)\, g\, \mathrm{sech}^2\left( g (t - t_3) \right) dt.
\end{aligned}
$$

This implies

$$
\begin{aligned}
0 &= \int_{t_3^A}^{\infty} \left[ \left( H(t) - H^A(t) \right) + \left( H(t) - H^B(t) \right) \right] t\, \mathrm{sech}^2\left( g (t - t_3) \right) dt
- t_3 \int_{t_3^A}^{\infty} \left[ \left( H(t) - H^A(t) \right) + \left( H(t) - H^B(t) \right) \right] \mathrm{sech}^2\left( g (t - t_3) \right) dt \\
0 &= \int_{t_3^A}^{\infty} \left[ \left( H(t) - H^A(t) \right) + \left( H(t) - H^B(t) \right) \right] \mathrm{sech}^2\left( g (t - t_3) \right) dt.
\end{aligned}
$$

This clearly can be simplified to

$$
0 = \int_{t_3^A}^{\infty} \left[ \left( H(t) - H^A(t) \right) + \left( H(t) - H^B(t) \right) \right] t\, \mathrm{sech}^2\left( g (t - t_3) \right) dt \qquad (9.21)
$$

We can satisfy Eqs. 9.20 and 9.21 by making

$$
\left( H(t) - H^A(t) \right) + \left( H(t) - H^B(t) \right) = 0. \qquad (9.22)
$$

Equation 9.22 can be rewritten as

$$
\begin{aligned}
0 = \left( V_3 - \frac{V_3^A + V_3^B}{2} \right) &+ (V_4 - V_3) \tanh\left( g (t - t_3) \right) && (9.23) \\
&- \frac{V_4^B - V_3^B}{2} \tanh\left( g^B (t - t_3^B) \right) - \frac{V_4^A - V_3^A}{2} \tanh\left( g^A (t - t_3^A) \right) && (9.24)
\end{aligned}
$$

This equation is true as t → ∞. Thus, we obtain the identity

$$
0 = \left( V_3 - \frac{V_3^A + V_3^B}{2} \right) + \left( V_4 - V_3 - \frac{V_4^B - V_3^B}{2} - \frac{V_4^A - V_3^A}{2} \right)
$$

Upon simplification, we find

$$
0 = V_3 - \frac{V_3^A + V_3^B}{2}, \qquad 0 = V_4 - \frac{V_4^A + V_4^B}{2}
$$

This leads to our choices for V3 and V4:

$$
V_3 = \frac{V_3^A + V_3^B}{2} \qquad (9.25)
$$

$$
V_4 = \frac{V_4^A + V_4^B}{2} \qquad (9.26)
$$

Equation 9.24 is also true at t = t3A and t = t3B. This gives

$$
\begin{aligned}
0 &= \left( V_3 - \frac{V_3^A + V_3^B}{2} \right) + (V_4 - V_3) \tanh\left( g (t_3^A - t_3) \right)
- \frac{V_4^B - V_3^B}{2} \tanh\left( g^B (t_3^A - t_3^B) \right) && (9.27) \\
0 &= \left( V_3 - \frac{V_3^A + V_3^B}{2} \right) + (V_4 - V_3) \tanh\left( g (t_3^B - t_3) \right)
- \frac{V_4^A - V_3^A}{2} \tanh\left( g^A (t_3^B - t_3^A) \right) && (9.28)
\end{aligned}
$$

For convenience, define w34^A = (V4^A − V3^A)/2 and w34^B = (V4^B − V3^B)/2. Then, using Eq. 9.25,
Eqs. 9.27 and 9.28 become

$$
\begin{aligned}
0 &= (V_4 - V_3) \tanh\left( g (t_3^A - t_3) \right) - w_{34}^B \tanh\left( g^B (t_3^A - t_3^B) \right) \\
0 &= (V_4 - V_3) \tanh\left( g (t_3^B - t_3) \right) - w_{34}^A \tanh\left( g^A (t_3^B - t_3^A) \right)
\end{aligned}
$$

This is then rewritten as

$$
\begin{aligned}
\tanh\left( g (t_3^A - t_3) \right) &= \frac{w_{34}^B \tanh\left( g^B (t_3^A - t_3^B) \right)}{V_4 - V_3} \\
\tanh\left( g (t_3^B - t_3) \right) &= \frac{w_{34}^A \tanh\left( g^A (t_3^B - t_3^A) \right)}{V_4 - V_3}
\end{aligned}
$$

Defining

$$
z^A = \frac{w_{34}^B \tanh\left( g^B (t_3^A - t_3^B) \right)}{V_4 - V_3}, \qquad
z^B = \frac{w_{34}^A \tanh\left( g^A (t_3^B - t_3^A) \right)}{V_4 - V_3}
$$

we find that the optimality conditions have led to the two nonlinear equations for g
and t3 given by

$$
\tanh\left( g (t_3^A - t_3) \right) = z^A \qquad (9.29)
$$

$$
\tanh\left( g (t_3^B - t_3) \right) = z^B \qquad (9.30)
$$
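Before linearizing, note that Eqs. 9.29 and 9.30 can also be solved directly with a numerical root finder. The sketch below does this with scipy.optimize.fsolve; the A and B parameter values are made-up placeholders and the crude initial guess is chosen only to start the iteration.

    import numpy as np
    from scipy.optimize import fsolve

    # Placeholder hyperpolarization data for the two inputs A and B.
    V3A, V4A, gA, t3A = -74.0, -66.5, 0.60, 5.0
    V3B, V4B, gB, t3B = -76.0, -65.0, 0.45, 6.5

    V3 = 0.5 * (V3A + V3B)                      # merged asymptotes, Eqs. 9.25 and 9.26
    V4 = 0.5 * (V4A + V4B)
    w34A, w34B = 0.5 * (V4A - V3A), 0.5 * (V4B - V3B)
    zA = w34B * np.tanh(gB * (t3A - t3B)) / (V4 - V3)
    zB = w34A * np.tanh(gA * (t3B - t3A)) / (V4 - V3)

    def residual(p):
        g, t3 = p
        return [np.tanh(g * (t3A - t3)) - zA,   # Eq. 9.29
                np.tanh(g * (t3B - t3)) - zB]   # Eq. 9.30

    guess = [(zB - zA) / (t3B - t3A), 0.5 * (t3A + t3B)]
    g_opt, t3_opt = fsolve(residual, guess)
    print(g_opt, t3_opt)   # t3_opt lies between t3A and t3B, g_opt > 0

The linearization developed next recovers essentially the same values in closed form.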

Note that using Eqs. 9.29 and 9.30, we have

$$
\begin{aligned}
z^A &= \frac{w_{34}^B \tanh\left( g^B (t_3^A - t_3^B) \right)}{V_4 - V_3}
= -\frac{w_{34}^B \tanh\left( g^B (t_3^B - t_3^A) \right)}{w_{34}^A + w_{34}^B} \\
z^B &= \frac{w_{34}^A \tanh\left( g^A (t_3^B - t_3^A) \right)}{V_4 - V_3}
= \frac{w_{34}^A \tanh\left( g^A (t_3^B - t_3^A) \right)}{w_{34}^A + w_{34}^B}
\end{aligned}
$$

Hence,

$$
z^A > -\frac{w_{34}^B}{w_{34}^A + w_{34}^B} > -1, \qquad
z^B < \frac{w_{34}^A}{w_{34}^A + w_{34}^B} < 1
$$

so that zA < 0 < zB. It seems reasonable that the optimal value of t3 should lie
between t3A and t3B. Note Eqs. 9.29 and 9.30 preclude the solutions t3 = t3A or t3 = t3B.
To solve the nonlinear system for g and t3, we will approximate tanh by its first order
Taylor series expansion. This seems reasonable as we don't expect g(t3A − t3) and
g(t3B − t3) to be far from 0. This gives the approximate system

$$
g (t_3^A - t_3) \approx z^A \qquad (9.31)
$$

$$
g (t_3^B - t_3) \approx z^B \qquad (9.32)
$$

From Eq. 9.32, we find

$$
g = \frac{z^B}{t_3^B - t_3}
$$

Substituting this into Eq. 9.31, we obtain

$$
\frac{z^B}{t_3^B - t_3} (t_3^A - t_3) = z^A
$$

This can be simplified as follows:

$$
\begin{aligned}
\frac{t_3^A - t_3}{t_3^B - t_3} &= \frac{z^A}{z^B} \\
(t_3^A - t_3)\, z^B &= (t_3^B - t_3)\, z^A \\
t_3^A z^B - t_3^B z^A &= t_3 (z^B - z^A)
\end{aligned}
$$

Thus, we find the optimal value of t3 is approximately

$$
t_3 = \frac{t_3^A z^B - t_3^B z^A}{z^B - z^A} \qquad (9.33)
$$

Using the approximate value of t3, we find the optimal value of g can be approximated
as follows:

$$
g = \frac{z^B}{t_3^B - \dfrac{t_3^A z^B - t_3^B z^A}{z^B - z^A}}
= \frac{z^B (z^B - z^A)}{t_3^B (z^B - z^A) - (t_3^A z^B - t_3^B z^A)}
= \frac{z^B (z^B - z^A)}{t_3^B z^B - t_3^A z^B}
= \frac{z^B - z^A}{t_3^B - t_3^A}
$$

Hence, we find the approximate optimal value of g is

$$
g = \frac{z^B - z^A}{t_3^B - t_3^A} \qquad (9.34)
$$

It is easy to check that this value of t3 lies in (t3A, t3B) as we suspected it should and
that g is positive. We summarize our results. Given two input BFVs, the sigmoid
portions of the incoming BFVs combine into the new sigmoid given by

$$
H(t) = V_3 + (V_4 - V_3) \tanh\left( g (t - t_3) \right)
$$

$$
H(t) = \frac{V_3^A + V_3^B}{2}
+ \left( \frac{V_4^A - V_3^A}{2} + \frac{V_4^B - V_3^B}{2} \right)
\tanh\left( \frac{z^B - z^A}{t_3^B - t_3^A} \left( t - \frac{t_3^A z^B - t_3^B z^A}{z^B - z^A} \right) \right)
$$

Given an input sequence of BFVs into a port on the dendrite of an accepting neuron

$$
\{ V_n, V_{n-1}, \ldots, V_1 \}
$$

the procedure discussed above computes the combined response that enters that port
at a particular time. The inputs into the dendritic system are combined pairwise; V2
and V1 combine into a Vnew which then combines with V3 and so on. We can do this
at each electrotonic location.
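A compact way to implement this pairwise combination is sketched below. Each BFV is represented as a dict holding the cap data (t0, V0, t1, V1, t2, V2) and the hyperpolarization data (t3, V3, V4, g); the routine averages the caps and applies Eqs. 9.25, 9.26, 9.33 and 9.34. The dict keys and the helper name are illustrative choices, not notation fixed by the text.

    import math

    def merge_bfv(a, b):
        # a and b are dicts with keys t0, V0, t1, V1, t2, V2, t3, V3, V4, g.
        if a["t3"] > b["t3"]:
            a, b = b, a                        # ensure t3^A <= t3^B as in the text
        out = {k: 0.5 * (a[k] + b[k])          # average the two parabolic caps
               for k in ("t0", "V0", "t1", "V1", "t2", "V2")}
        V3 = 0.5 * (a["V3"] + b["V3"])         # Eq. 9.25
        V4 = 0.5 * (a["V4"] + b["V4"])         # Eq. 9.26
        w34A = 0.5 * (a["V4"] - a["V3"])
        w34B = 0.5 * (b["V4"] - b["V3"])
        zA = w34B * math.tanh(b["g"] * (a["t3"] - b["t3"])) / (V4 - V3)
        zB = w34A * math.tanh(a["g"] * (b["t3"] - a["t3"])) / (V4 - V3)
        out["t3"] = (a["t3"] * zB - b["t3"] * zA) / (zB - zA)   # Eq. 9.33
        out["g"] = (zB - zA) / (b["t3"] - a["t3"])              # Eq. 9.34
        out["V3"], out["V4"] = V3, V4
        return out

A whole input sequence can then be folded through merge_bfv two at a time, which is exactly the pairwise scheme described above; for example, functools.reduce(merge_bfv, inputs) combines a list of such BFV dicts.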

9.5 The Full Abstract Neuron Model

Pre-neurons can supply input to the dendrite cable at electrotonic positions w = 0 to
w = 4. These inputs generate an ESP or ISP via many possible mechanisms, or they
alter the structure of the dendrite cable itself through the transcription of proteins. The
output of a pre-neuron is a BFV which must then be associated with an abstract trigger as
we have discussed in earlier chapters. The strength of a BFV output will be estimated
as follows. The area under the first parabolic cap of the BFV can be approximated
by the area of the triangle, A, formed by the vertices (t0, V0), (t1, V1), (t2, V2). This
area is shown in Fig. 9.11. The area is given by

$$
A = \frac{1}{2} (V_2 - V_0)(t_2 - t_0)
$$

Fig. 9.11 The EPS triangle approximation

The pre-neuron signals that come into the dendritic cable of the post-neuron are the
initial conditions that determine the voltage at the axon hillock of the post-neuron.
We are modeling the dendritic arbor and the soma as finite length cables. At position
λ on the cable, the voltage is given by

$$
\hat{v}_m(\lambda, \tau) = A_0 e^{-\tau} + \sum_{n=1}^{\infty} A_n \cos\left( \alpha_n (4 - \lambda) \right) e^{-(1 + \alpha_n^2) \tau}
$$

as discussed in Peterson (2015). This voltage arrives at the far end, z = 7 in Fig. 9.12,
of the soma cable, and the voltage we obtain at z = 0 is then the axon hillock voltage.
This figure shows a general computational architecture for an artificial neuron
consisting of

Fig. 9.12 Cellular agent computations

• a dendrite system of electrotonic length 4. Four dendrite ports are shown, labeled
  Di. Each accepts ISP and ESP inputs which are generated and modulated using
  the mechanisms we have discussed.
• a soma of length 7 which contains 7 pairs of simple and second messenger ports
  (Gi and SMi, respectively). The simple ports Gi are where the sodium and potassium
  maximum conductances can be modified by direct means. The second messenger
  ports SMi accept trigger T0 and generate proteins which alter the functionality of
  the cell by accessing the genome.
• three nuclear membrane gate blocks, NGi, where second messenger proteins T1
  can dock to initiate the transcription of a protein used in altering cell function.
• the protein block, which is shown differentiating into the gate proteins that are sent
  to the sites Gi. Although not shown, there could also be proteins that alter Ca++
  injection currents.
The axon hillock voltage is then input into the Hodgkin–Huxley equations to generate
an action potential. The voltage at z = 0 would have the form

$$
\hat{v}_m(0, \tau) = B_0 e^{-\tau} + \sum_{n=1}^{\infty} B_n \cos\left( 7 \beta_n \right) e^{-(1 + \beta_n^2) \tau}
$$

Our challenge then is twofold:

• Find a quick and efficient way to approximate the axon-hillock voltage.
• Given the axon-hillock voltage, find a quick and efficient way to determine the
  associated BFV.
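One inexpensive route to the first challenge is simply to truncate the series. The sketch below evaluates the first few terms of the z = 0 expansion; the coefficients Bn and eigenvalues βn are left as inputs and the numbers used here are made up, since the true values come from the cable boundary conditions developed in Peterson (2015).

    import numpy as np

    def axon_hillock_voltage(tau, B, beta):
        # Truncated series B0 e^{-tau} + sum_n Bn cos(7 beta_n) e^{-(1 + beta_n^2) tau}.
        # B = [B0, B1, ..., BN] and beta = [beta1, ..., betaN]; placeholders here.
        tau = np.asarray(tau, dtype=float)
        v = B[0] * np.exp(-tau)
        for Bn, bn in zip(B[1:], beta):
            v = v + Bn * np.cos(7.0 * bn) * np.exp(-(1.0 + bn ** 2) * tau)
        return v

    B = [1.0, 0.4, 0.2, 0.1]                 # made-up coefficients
    beta = [0.8, 1.9, 3.1]                   # made-up eigenvalues
    print(axon_hillock_voltage(np.linspace(0.0, 5.0, 6), B, beta))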

References

M. Adams, G. Swanson, TINS neurotoxins supplement. Trends Neurosci. Supplement:S1–S36 (1996)
J. Gerhart, M. Kirschner, Cells, Embryos and Evolution: Towards a Cellular and Developmen-
tal Understanding of Phenotypic Variation and Evolutionary Adaptability (Blackwell Science,
Malden, 1997)
A. Harvey, Presynaptic effects of toxins. Int. Rev. Neurobiol. 32, 201–239 (1990)
P. Kaul, P. Daftari, Marine pharmacology: bioactive molecules from the sea. Annu. Rev. Pharmacol.
Toxicol. 26, 117–142 (1986)
J. Peterson, Calculus for Cognitive Scientists: Partial Differential Equation Models. Springer Series
on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte Ltd.,
Singapore, 2015 in press)
J. Peterson, T. Khan, Abstract action potential models for toxin recognition. J. Theor. Med. 6(4),
199–234 (2006)
G. Schiavo, M. Matteoli, C. Montecucco, Neurotoxins affecting neuroexocytosis. Physiol. Rev.
80(2), 717–766 (2000)
S. Stahl, Essential Psychopharmacology: Neuroscientific Basis and Practical Applications, 2nd
edn. (Cambridge University Press, Cambridge, 2000)
G. Strichartz, T. Rando, G. Wang, An integrated view of the molecular toxicology of sodium channel
gating in excitable cells. Annu. Rev. Neurosci. 10, 237–267 (1987)
C. Wu, T. Narahashi, Mechanism of action of novel marine neurotoxins on ion channels. Annu.
Rev. Pharmacol. Toxicol. 28, 141–161 (1988)
Part IV
Models of Emotion and Cognition
Chapter 10
Emotional Models

We begin by looking at some key assumptions made in typical high level or top-down
emotional modeling. A representative of this approach is found in the work of
Sloman (1997a, b, 1998, 1999a, b) which can be summarized by the following two
statements. First, there are information processing architectures existing in the human
brain that mediate internal and external behavior. Secondly, there is a correspondence
between the high level functionality obtained from artificial and explicitly designed
architectures and naturally evolved architectures despite low level implementation
differences. The first assumption is part of many of today’s cognitive science pro-
grams while the second assumption is more problematic. One of the reasons we are
developing a software model of cognition is that we feel it will be helpful in both
elucidating our understanding of a software cognitive process and in validating this
second assumption.
It is clear this desired correspondence will depend on the right kind of abstrac-
tion of messy hardware detail. For example, we could look at an abstract wiring
diagram for a portion of the cerebral cortex (highly stylized, of course) as discussed
in the chapter on neural structure and see that any attempt to implement high level outlines
of software architectures that might subserve cognition or emotional modeling will
be inherently highly interconnected. It is also clear that the concept of hierarchical
organization is not very rigid in wetware as we have discussed at length in Chap. 14.
Thus implementing these architectures in software will require the ability to oper-
ate in asynchronous ways using modules whose computations are not parallelizable
across multiple nodes.
Instead, these computational objects communicate outputs and assimilate inputs
in some sort of a collective fashion. This has led us to propose the use of asynchronous
software tools as a means of coordinating many different such objects operating in
an asynchronous fashion on a heterogeneous computer network. The broad plan
of an approach to this kind of emotional modeling is given in Fig. 10.1. We can
then design characters to move autonomously in a three dimensional virtual world
and generate a real-time model of its emotions from interactions with both its


Fig. 10.1 The emotionally enabled avatar

environment and other characters. Such avatars will be controlled by an interacting


collection of software agents. Our discussions in previous chapters have been giving
us the background and tools to understand how to begin the process of building
such characters, although there is much yet to do. To understand the simulation
environment appropriate for emotional modeling, let’s back up and look at the study
of emotions in general. We begin with Sloman’s approach which is not based on
the underlying neurobiology at all. Sloman assumes the study of emotions can be
divided into three broad categories. These areas can be
• Semantics Based: we analyze the use of language to uncover implicit assumptions
underlying emotion
• Phenomena Based: we assume emotions are a well-specified category; we try to
correlate measurable things (physiological changes, neuronal firing rates etc.) with
emotional states.
• Design Based: we take an engineering stance and try to build a system exhibiting
phenomena that are understood as emotional attributes. We also try to find possible
mechanisms to generate the emotional outputs we seek.
We do not want to approach the study of emotions using any of these paradigms;
instead, we want emotional attributes to be a consequence of the full brain model
with communication modulated by neurotransmitters and hormones. For example, in
Rolls (2005), there is an excellent treatment of the biological approach. In fact, emo-
tions probably emerge from interactions between neural modules as is described in
Scherer (2009). Other good overviews are given by Ono et al. (2003) and by Damasio
and Carvalho (2013). The goal of the upcoming Chaps. 18 and 19 is to find

Fig. 10.2 The three tower model

Fig. 10.3 The three layer model

tools to allow us to efficiently code these interactions. We show some simple full brain
architectures developed with these tools in Chap. 20 to give a taste of what they make
possible, and we show how all of the discussion in this text culminates in an
approach to a cognitive dysfunction model in Chap. 21. However, for the moment,
we think it is important to look at the broad outlines of these other approaches.
Consider the following software architecture inspired by Sloman’s computational
emotional models. A simplistic towered approach divides cognition into the three
separate areas shown in Fig. 10.2: perception, central processing mechanisms and
action methods. We don't try
to understand how these blocks can be implemented at this point. This diagram
comes from the work of Nilsson (2001). Sloman then introduced the simple layered
model (Fig. 10.3) which is organized around reasoning concepts. A hybrid model

Fig. 10.4 A hybrid combined emotional model architecture

can then be introduced that essentially uses both the towers and layers together to
give us more interacting functional subgroups. In Fig. 10.4, the indicated feedback
and feedforward paths help us to visualize how the many functional subroutines will
be combined. Finally, a full meta-management layer is added as shown in Fig. 10.5.
The meta model gives a simplistic summary of adult human information process-
ing in a modular design. Many researchers believe that such modular designs are
essential for defeating the combinatorial explosion that arises in the search for solu-
tions to complex problems. Hence, in the Sloman hybrid model, human information
processing uses three distinctly different but simultaneously active architectural lay-
ers: Reactive, Deliberative and Meta management; plus support modules for Motive
Generation, Global Alarm Mechanisms and Long Term Associative Memory Stor-
age. All of these layers can be implemented using asynchronously interacting agents.
Note how far we are from the underlying neurobiology and the approaches which
use networks of computational nodes such as neurons.

Fig. 10.5 Adding meta management to the hybrid model

10.1 The Sloman Emotional Model

Sloman then classifies emotions in the following way. The Reactive Layer and the
Global Alarm process provide outputs that can be classified as Primary Emotions.
These would be reactions such as being startled, being frozen with terror and
being sexually aroused. The Deliberative Layer provides Secondary Emotions.
These include things such as apprehension and relief. Sloman argues that such
responses, require a “what-if” reasoning ability. Finally, the Meta Management Layer
allows for Tertiary Emotions. These include complicated and complex responses
such as control of thought and emotion, loss of emotion, infatuations and humiliations
and thrilled anticipation. These are crucial to the absorption of culture, mathematics
and so forth. Now the crucial point is that all layers and the alarm system oper-
ate concurrently (i.e. asynchronously) and none is in total control. Sloman offers a
compelling example: we can think of longing as a tertiary emotion. Longing for
your mother requires at least that you know you have a mother, you know she is not
present, you understand the possibility of being with her and you find her absence

unpleasant. All of these things require that you possess and manipulate information.
However, these conditions are not sufficient as these conditions could lead you to
regret her absence but not long for her. Note we can consider regret as an attitude
and longing as an emotion. So we clearly need more: one possibility is that longing
has the additional quality that you can’t easily put thoughts of her out of your mind.
This indicates a partial loss of control of attention which is an extraordinarily inter-
esting way to view this phenomenon of longing. This implies that to be able to lose
control of this attention means that we can sometimes control it also. There must
be some information processing mechanism which can control which information
is processed but which is not always in control. Partly losing control of thought
processes is a perturbation of our normal state.
We can infer that one way to think about tertiary emotions is that they are
perturbant states in our system. Other examples are extreme grief (you can’t think
of anything else) and extreme infatuation (your thoughts are always drawn to that
person). Our task then is to make sure that our interacting society of computational
agents can exhibit these sorts of behaviors.
We could then build our emotional models using interacting collections of com-
putational agents on a heterogeneous computer network. Some nodes of the cluster
compute emotional attributes and some nodes will drive the movement and expres-
sions of a three dimensional character or avatar that moves autonomously in a virtual
world. Silas the dog has already been implemented in this fashion by Blumberg at
the MIT Media Lab as early as 1997 (Blumberg 1997).
However, this model is not neurobiologically based and as such is not suitable for
our needs. We want to add biological plausibility to our models.

10.2 PsychoPhysiological Data

In a sequence of seminal papers (Lang et al. 1998; Codispotti et al. 2001; Bradley
and Lang 2000; Cuthbert et al. 1996), it has been shown that people respond to
emotionally tagged or affective images in a semi-quantitative manner. Human volunteers
were shown various images and their physiological responses were recorded in
two ways. One was a skin galvanic response and the other an fMRI parameter. Typical
results are plotted in Fig. 10.6. In this database of images, extreme images always
generated a large plus or minus response while neutral images such as those of an
infant generated null or near origin results.
If we followed the original intent and spirit of the Affective Image research, we
would like to develop data for each of nine primary emotional states as indicated
in Fig. 10.6. These would be emotional states that correspond to the following nine
locations on the two dimensional grid:

Fig. 10.6 Human response to emotionally charged picture data

2D Coordinates   Physiological responses                  Image type
(High, High)     High galvanic and high fMRI response     Thrills
(High, Null)     High galvanic and flat fMRI response
(High, Low)      High galvanic and low fMRI response      Murder
(Null, High)     Flat galvanic and high fMRI response
(Null, Null)     Flat galvanic and flat fMRI response
(Null, Low)      Flat galvanic and low fMRI response
(Low, High)      Low galvanic and high fMRI response      Flowers
(Low, Null)      Low galvanic and flat fMRI response
(Low, Low)       Low galvanic and low fMRI responses      Cemeteries
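If we want to use this grid computationally, a tiny mapping from the two measured responses to a grid cell suffices. The thresholds and helper names in the sketch below are invented for illustration; only the nine-cell layout and the four example labels come from the table above.

    # Map a (galvanic, fMRI) response pair to one of the nine grid cells.
    def level(x, lo=-0.5, hi=0.5):
        # The +/-0.5 cutoffs are arbitrary illustrative thresholds.
        return "Low" if x < lo else ("High" if x > hi else "Null")

    LABELS = {("High", "High"): "Thrills", ("High", "Low"): "Murder",
              ("Low", "High"): "Flowers", ("Low", "Low"): "Cemeteries"}

    def grid_cell(galvanic, fmri):
        cell = (level(galvanic), level(fmri))
        return cell, LABELS.get(cell, "unlabeled")

    print(grid_cell(0.9, 0.8))    # (('High', 'High'), 'Thrills')
    print(grid_cell(0.1, -0.2))   # (('Null', 'Null'), 'unlabeled')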

Clearly, the emotional tags associated with the images in the affective image
database are not cleanly separated into primary emotions such as anger, sadness
and happiness. However, we can infer that the center (Null, Null) state is associated
with images that have no emotional tag. Also, the images do cleanly map to distinct
2D locations on the grid when the emotional contents of the images differ. Hence,
we will assume that if a database of images separated into states of anger, sadness,
happiness and neutrality were presented to human subjects, we would see a similar
separation of response. Our hypothetical response would be captured in the emotion
triangle seen in Fig. 10.7. Indeed, in both the musical and painting compositional
domain, we will therefore design special matrices called Würfelspiel matrices for
the four positions marked in Figs. 10.6 and 10.7. Such matrices were used in the 18th
century so that fragments of music could be rapidly prototyped by using a matrix
of possibilities. We will develop such matrices for music and painting in Chaps. 11
and 12, respectively. Note, we can also use other types of emotional labels and use a
similar triangle to design other sorts of symbolic mapping data. We will be using the
emotionally labeled music and painting data to train cortical models in later chapters.

Fig. 10.7 Emotionally charged compositional data design

References

B. Blumberg, Old Tricks, New Dogs: Ethology and Interactive Creatures, PhD thesis, MIT, The
Program in Media Arts and Sciences, School of Architecture and Planning (1997)
M. Bradley, P. Lang, Affective reactions to acoustic stimuli. Psychophysiology 37, 204–215 (2000)
T. Ono, G. Matsumoto, R. Llinás, A. Berthoz, R. Norgren, H. Nishijo, R. Tamura (eds.), Cognition
and Emotion in the Brain (Elsevier, Amsterdam, 2003)
M. Codispotti, M. Bradley, P. Lang, Affective reactions to briefly presented pictures. Psychophysi-
ology 38, 474–478 (2001)
B. Cuthbert, M. Bradley, P. Lang, Probing picture perception: activation and emotion. Psychophys-
iology 33, 103–111 (1996)
A. Damasio, G. Carvalho, The nature of feelings: evolutionary and neurobiological origins. Nat.
Rev.: Neurosci. 14, 143–152 (2013)
P. Lang, M. Bradley, J. Fitzimmons, B. Cuthbert, J. Scott, B. Moulder, V. Nangia, Emotional arousal
and activation of the visual cortex: an fMRI analysis. Psychophysiology 35, 199–210 (1998)
N. Nilsson, Teleo-Reactive Programs and the Triple-Tower Architecture. Technical Report (Robotics
Laboratory, Department of Computer Science, Stanford University, Stanford, 2001)
E. Rolls, Emotion Explained (Oxford University Press, Oxford, 2005)
K. Scherer, Emotions are emergent processes: they require a dynamic computational architecture.
Philos. Trans. R. Soc. B 364, 3459–3474 (2009)
A. Sloman, Designing human-like minds (1997a), http://www.cs.bham.ac.uk/research/cogaf. Pre-
sented at the European Conference on Artificial Life
A. Sloman, What sort of control system is able to have a personality, in Creating Personalities for
Synthetic Actors: Towards Autonomous Personality Agents, ed. by R. Trappl, P. Petta (Springer,
Heidelberg, 1997b), pp. 166–208
A. Sloman, Architectures and tools for human-like agents (1998). Presented at the European Con-
ference on Cognitive Modeling
A. Sloman, Architectural Requirements for Human-like Agents Both Natural and Artificial (What
Sorts of Machines Can Love?), in Human Cognition and Social Agent Technology, vol. 19,
Advances in Consciousness Research Series, ed. by K. Dautenhahn (John Benjamin Publishing,
Amsterdam, 1999a)
A. Sloman, What Sort of Architecture Is Required for a Human-like Agent? in Foundations of
Rational Analysis, ed. by M. Wooldridge, A. Rao (Kluwer, Boston, 1999b)
Chapter 11
Generation of Music Data: J. Peterson
and L. Dzuris

We design musical data using a grammatical approach to musical composition. This


is, of course, simplistic, but we want a mechanism for quickly and repeatedly gen-
erating large volumes of data that can be used to train the auditory cortex of the
cognitive model.

11.1 A Musical Grammar

In the literature, there are many attempts to model musical compositional designs.
Several theorists discuss music in terms of large chunks or sections and overall
function. One example is found in Caplin (1988), Classical Form: A Theory of Formal
Functions For The Instrumental Music of Haydn, Mozart and Beethoven, where there is a
focus on larger musical structures such as those found in the symphony movements
of Haydn, Mozart, and Beethoven. Most of the discussion is not relevant to the design
of musical fragments of the type we desire, though there is mention of the idea of
a musical sentence. He states that the sentence is usually eight measures in length,
but that the fundamental melodic material is presented in the opening two measures.
Caplin admits that within the basic idea contained in these first measures, there
may be several distinct motives. Since motives are in and of themselves identifiable
morsels of music, we can logically compose shorter sentences than those Caplin is
analyzing.
Approaching the matter of function from the opposite direction, small units build-
ing up to larger ones, we have Davie (1953), Musical Structure and Design, who
compares musical sounds to words in terms of clauses, sentences, and paragraphs
leading to larger structures of form. He refers to musical cadences in terms of various
punctuation marks. Depending on the type used, impressions of rest, incompleteness
and surprise might be conveyed. He suggests the use of musical cadences which
are perfect and plagal for completeness (by which he means V-I with soprano
on scale degree 1 or IV-I); imperfect for incompleteness (V-I with soprano on scale
degree 3 or 5) and interrupted for surprise (deceptive cadences are V or V7-VI).

Davie goes on to examine ways to extend phrases. He gives examples of grammat-


ical clauses followed by phrases lengthened by adding adjectives, etc. The musical
examples are extended by sequential repetition, additional measures of the sequential
repetition, stretching the cadence and repetition of the stretched cadence.
These musical techniques do mimic the end result of adding adjectives in grammar.
The sentence becomes longer. We see how the comparison breaks down when we see
that an adjective further distinguishes one noun from any other noun. Repetition of
a musical fragment is merely a restatement. Sequential repetition can be compared
to restatement of a sentence by a different person (different starting pitch).
In defining a musical sentence, Davie believes it is the combination of two or more
phrases that are necessary for balance. In other words, there is a question phrase and
a response phrase. All possible three measure outcomes from our matrix will be
complete musical sentences. The opening fragment functions as a question phrase
and the closing fragment functions as the response. We have added a connecting
phrase between the two. This insertion extends our musical line, providing a smooth
transition between question/answer. In terms of speech, the exchange flows more
naturally by sounding less abrupt.
Some, such as Berry (1966), Form in Music: An Examination Of Traditional Tech-
niques Of Musical Structure and Their Application in Historical and Contemporary
Styles, believe music to be essentially abstract. However, he does believe it has the
ability to “impart a sense, a mood, impression of states or qualities”. Here, too,
cadences are considered musical equivalents of punctuation. Where Davie focuses
on scale degree and harmony (real or implied) to determine degrees of finality, Berry
goes a step further. He points out the effect cadences have on the rhythmic motion
of a line. A cadence does one of two things, either interrupts rhythmic motion or
conveys closure by stopping it all together. Berry’s explanation of smaller structural
units also supports our three measure phrases as musical sentences. He states that
a musical phrase (Berry interchanges the terms phrase and statement) may be com-
pared to a clause that contains at least a subject and a predicate (question/answer).
He goes on to say that it has a distinct beginning, a clear course of direction and an
ending (cadence).
Though not a common form of analysis, relating nouns and verbs to tonic and
dominants has been discussed by theorists. An example from Cope (1991), Comput-
ers and Musical Style, is that V7-I equals a simple verb-object motion, with the tonic
coming as a consequence to the dominant motion. This is similar to the idea of the
object of a sentence being a consequence of the verb’s action. Cope goes on to say
that the ensemble of musical pitches and chords typically relates in terms of major
and minor keys. Also, these phenomena function in relation to the key name pitch
and chord which are called “tonic” by theorists. In C major, the ensemble of notes
would be CDEFGABC. The tonic note is C and the tonic triad (three-note chord) is
CEG. There is a detailed section on parsing, which is a technique for diagramming

relationships between sentence parts in language. A sentence is broken into two basic
parts, a noun phrase and a verb phrase. From there, each can be broken into smaller
pieces (article plus noun/adverb plus verb). Cope shows a parse of the C-major scale
that shows the basis for the melodic movement used in our own examples. First, he
presents the sentence as CDEFGABC. Next, he identifies noun equivalents, C (tonic)
and F (subdominant). The G (dominant) functions as a verb, leaving the articles and
their modifiers: A (Prolongation), E (Adverb), D (Prolongation) and B.
Logical motions in music are systematically described by Cope in the following
way. SPEAC equal to Statement, Preparatic, Extension, Antecedents and Consequent.
The desired line of succession then looks like this: S followed by PEA; P by SAC;
E by SPAC; A by EC; and C by SPEA Cope believes that a “generated hierarchy”
of notes is necessary to produce music that makes sense. He reasons that a random
substitution of a word type that would produce a nonsense sentence (noun–verb–
noun could equal horse grabs sky), would have the same effect in music. We agree
with Cope that tonality itself is a hierarchy where certain notes or scale degrees
have specific functions. Additionally, a set of generalities for tonal melodies is given.
Cope based this list on examples from composers across generations. These are basic
composition rules taught in your average music theory classes: use mostly stepwise
motion; follow melodic skips with stepwise motion in the opposite direction; use one
or more notes per beat agreeing with harmony; and usually begin and end on tonic
or dominant chord members. Cope summarizes by saying
With proper selection of elemental representations (i.e., nouns are tonics) and careful coding
of appropriate local semantics, the same program that produces sentences can produce music.

Kohs (1973), Musical Form: Studies in Analysis and Synthesis, stresses that the human
mind will organize music into sections based on repetition and contrast and that
distinctions exist between functional and decorative (nonharmonic) tones. Then, in
making his own connection between speech and music, Kohs brings in another factor,
emotion. We will study the connections of emotion to music more completely in
Sect. 11.4.1. Kohs connects to emotional states as follows:
In prehistory, vocal melody was probably developed as a form of emotionally inflected
speech. Melody has been associated with words since the earliest times, and wordless vocal-
ization has always been a rather rare phenomenon. Thus it is not surprising that some of the
characteristics of melody are derived from speech, and that some of the melodic forms are
related to the forms of prose and poetry.

Poetry can be analyzed in terms of meter, its pattern of accented and unaccented
syllables. Musical meter is analyzed the same way; in terms of strong and weak beats.
Kohs compares nonmetrical music to the less structured rhythm that is characteristic
to prose. Again, we have a writer stress that the key to a sentence or musical phrase
is that both words and musical elements are put together based on the function each
will carry out. Kohs is explicit about music having
its own special kind of grammar and syntax. Successive tones may be grouped to form a
musical phrase having a sense of completion and unity similar to that found in a verbal
sentence. Some tones, like the verbal subject and predicate, are essential to the musical

structure; others are decorative. Suspensions, appogiaturas, neighboring tones and similar
decorations cannot stand alone without resolution any more than an adjective may stand
without a noun or pronoun. Musical phrases may be simple or complex; a short musical idea
may be expanded by a variety of means, such as parenthetical insertions, or extensions at
the beginning or the end.

He follows with an example comparison of a sentence and a musical phrase, in


which both are transformed by the addition of words and tones respectively. The
additions to the basic musical line are tried and true techniques taught to all students
of composition.
We have gleaned a primitive view of how we wish to abstract structure from music
by building on these works. Although the mapping between a grammatical view of
music and the actual way a person composes is imperfect, it pervades many of the
discussions on composition, orchestration and so forth in the musical literature. We
have thus decided that our first attempt at autonomous music creation will be based on
a simplistic grammatical approach: we will try to create a collection of short musical
fragments which embody or encapsulate a notion of good compositional style.

11.2 The Würfelspiel Approach

We will start by using an 18th century historical idea called The Musicalisches
Würfelspiel. In the 1700s, fragments of music could be rapidly prototyped by using
a matrix A of possibilities. We show an abstract version of a typical Musicalisches
Würfelspiel matrix in Eq. 11.1. It consists of P rows and three columns. In the first
column are placed the opening phrases or nouns; in the third column, are placed
the closing phrases or objects; and in the second column, are placed the transitional
phrases or verbs. Each phrase consisted of L notes and the composer’s duty was to
make sure that any opening, transitional and closing (or noun, verb and object) was
both viable and pleasing for the musical style that the composer was attempting to
achieve.
$$
A = \begin{bmatrix}
\text{Opening 0} & \text{Transition 0} & \text{Closing 0} \\
\text{Opening 1} & \text{Transition 1} & \text{Closing 1} \\
\vdots & \vdots & \vdots \\
\text{Opening } P{-}1 & \text{Transition } P{-}1 & \text{Closing } P{-}1
\end{bmatrix} \qquad (11.1)
$$

Thus, a musical stream could be formed by concatenating these fragments together:


picking the ith Opening, the jth Transition and the kth Closing phrases would form
a musical sentence. Since we would get a different musical sentence for each choice
of the indices i, j and k (where each index can take on the values 0 to P − 1), we
can label the sentences that are constructed by using the subscript i, j, k as follows:

$$
S_{i, j, k} = \text{Opening } i + \text{Transition } j + \text{Closing } k
$$



Note that there are P³ possible musical sentences that can be formed in this
manner. If each opening, transition and closing fragment is four beats long, we can
build P³ different twelve beat sentences.
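As a concrete illustration of the combinatorics, the sketch below enumerates all P³ sentences of a toy Würfelspiel matrix with P = 3. The note strings are invented placeholders and are not the phrases of the matrices designed later in this chapter.

    from itertools import product

    # Toy Wuerfelspiel matrix: P = 3 rows of opening, transition and closing phrases.
    openings    = ["c d e f", "c e d f", "c c e g"]     # invented placeholders
    transitions = ["g f e d", "e g g e", "d e f g"]
    closings    = ["f e d c", "e d d c", "g e d c"]

    def sentence(i, j, k):
        # S_{i,j,k} = Opening i + Transition j + Closing k
        return " | ".join([openings[i], transitions[j], closings[k]])

    P = len(openings)
    all_sentences = {(i, j, k): sentence(i, j, k)
                     for i, j, k in product(range(P), repeat=3)}
    print(len(all_sentences))          # P^3 = 27
    print(all_sentences[(0, 2, 1)])    # for example, S_{0,2,1}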
It takes musical talent to create such a Musicalisches Würfelspiel array, but once
created, it can be used in the process of learning fundamental principles of the
music compositional process. We will eventually use musical fragments which are
tagged with a specific emotional color to build a model which can assemble musical
fragments which have a specific emotional content. However, we will start with a
proof of concept using a Musicalisches Würfelspiel matrix with emotionally neutral
examples.

11.3 Neutral Music Data Design

We will start with simple compositional patterns and ideas; hence, we will be using
only quarter and half notes permitted in any of the phrases. Further, we do not want
the musical fragments to be too long, so for now each fragment consists of four
beats in 4/4 time. We will begin opening phrases and end closing or cadence phrases
on tonic C, approaching or leaving by step or tonic chord leap. Finally, the middle
phrases are centered around a third or a fifth. Using these guidelines, we created a
Musicalisches Würfelspiel array using opening phrases as nouns, middle phrases as
verbs and closing or cadence phrases as objects.

11.3.1 Neutral Musical Alphabet Design

We chose the noun phrases as shown in Fig. 11.1.


Now the last note in each of four opening phrases must be able to be played
right before any of the first notes in a middle phrase. Correct combinations are not
random choices and so the musical composer’s skill is captured to some extent in the
choices that are made for the middle phrases. Our design alphabet can be encoded as
H1 = {c, d, e, f, g, a, b, C} and H2 = {c², d², e², f², g², a², b², C²}, where
C is the C above middle C. All the letters in H1 label quarter notes from the
middle C octave and the notes in the H2 alphabet denote half notes. Our alphabet is

Fig. 11.1 Musical opening phrases

Fig. 11.2 Musical middle phrases

thus {H1, H2}, which has cardinality 16. Within this alphabet, the second opening,
cedf, can be written as the matrix

$$
n_1 = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$

There are similar matrices for each of the other opening phrases. Each opening phrase
is thus a 4 × 16 matrix that has a very special property: a row can only have one 1.
A given middle phrase will have a similar structure and only some of the possible
middle phrase matrices we could use will be acceptable. Our middle phrase design
choices for emotionally flat music are shown in Fig. 11.2.
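The encoding of a phrase as a 4 × 16 matrix is mechanical, and a small helper makes the one-1-per-row structure explicit. In the sketch below the alphabet is the eight quarter-note symbols followed by the eight half-note symbols (written 'c2', 'd2' and so on for convenience); the helper and its names are illustrative only.

    import numpy as np

    QUARTER = ["c", "d", "e", "f", "g", "a", "b", "C"]   # quarter notes
    HALF = [s + "2" for s in QUARTER]                    # half notes, written c2..C2
    ALPHABET = QUARTER + HALF                            # cardinality 16

    def phrase_matrix(phrase):
        # Encode a four-note phrase as a 4 x 16 matrix with exactly one 1 per row.
        M = np.zeros((len(phrase), len(ALPHABET)), dtype=int)
        for row, note in enumerate(phrase):
            M[row, ALPHABET.index(note)] = 1
        return M

    n1 = phrase_matrix(["c", "e", "d", "f"])    # the second opening, cedf
    print(n1)
    print(n1.sum(axis=1))                        # every row sums to 1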
In essence, when we design these two sets of matrices, we provide samples for a
hypothetical mapping from the set of 4 × 16 opening matrices to the set of 4 × 16
middle matrices. These samples provide enough information for us to approximate
the opening to middle transition function using a blend of neurally inspired architec-
tures and function approximation techniques. Finally, the cadence or closing phrases,
as shown in Fig. 11.3, were designed so that they would sound pleasing when attached
to any of the possible opening—middle phrase combinations. Again, our design gives
us a set of appropriate 4 × 16 matrices which encode valid closing phrases. These
closing phrases need to be coupled to the end notes of any middle phrase. The closing
data we design then gives us enough samples to approximate the middle to ending
transformation.
In addition, our opening data gives us four examples of starting notes for neutral
musical twelve note sequences. Now there are nine possible start notes for each
opening phrase and the fact that we do not choose some of them is important. Also,

Fig. 11.3 Musical closing phrases

Fig. 11.4 The neutral music matrix

each four note sequence in any of the three phrases, opening, middle and closing,
is order dependent. Given a note in any phrase, the selection of the next note that
follows is not random. The actual note sequence that appears in each phrase also
gives sample data that constrains the phrase to phrase transformations. We can use
this information to effectively approximate our mappings using excitation/ inhibition
neurally inspired architectures. Roughly speaking, if a given subset of notes are good
choices to follow another note, then the notes not selected to follow should be actively
inhibited while the acceptable notes should be actively encouraged or enhanced in
their activity. The complete Musicalisches Würfelspiel matrix, as seen in Fig. 11.4,
thus consists of four rows and three columns that will provide a total of 64 distinct
musical fragments that are intended to model neutral musical sentence design.
Currently, these fragments are rather short since they are just four beats (i.e.
L = 4) for each grammatical element. This is enough to help with the prototype
development of this stage. However, when we generate the additional Musicalisches
Würfelspiel matrices that will correspond to the other emotional shadings, we may
find it necessary to use longer note sequences (i.e. increase L) in order to capture the
desired emotional colorings. If that turns out to be true, we can easily return to this
neutral case and redesign the neutral Musicalisches Würfelspiel matrix to use longer
note sequences.

11.3.2 The Generated Musical Phrases

From our 4 × 3 Musicalisches Würfelspiel matrix, we can generate 64 musical selec-


tions of twelve beats each. In Fig. 11.5, we show the selections generated using open-
ing one, all the possible middle phrases and the first cadence phrase. The intent here is

Fig. 11.5 Neutral sequences generated using the first opening phrase, all the middle phrases and
the first ending phrase. The first column of the figure provides a label of the form x yz where x
indicates the opening used; y, the middle phrase used; and z, the ending phrase. Thus, 131 is the
fragment built from the first opening, the third middle and the first ending

that all 64 selections we so generate will be devoid of emotional content. You should
try playing these pieces on the piano and compare them to our later selections that
are purported to have happy, angry and sad overtones that are displayed as detailed
in Sects. 11.4.3, 11.4.4 and 11.4.5.
The fragments shown in Fig. 11.5 show only a few of the possibilities. For example,
the selections for opening two, verb one and all the cadence phrases are displayed
in Fig. 11.6. Again, we invite you to play the pieces. The important point is that

Fig. 11.6 Neutral sequences generated using the second opening phrase, the first middle phrase
and all the ending phrases

all 64 pieces we can so generate using the Würfelspiel approach are equally valid
emotionally neutral choices. Hence, the Musicalisches Würfelspiel matrix we have
created captures the essence of the solution to emotionally neutral music composi-
tional design problems.

11.4 Emotional Musical Data Design

In earlier sections, we have discussed how we would design a Musicalisches Würfel-


spiel matrix which contained simple emotionally neutral musical compositions. We
now extend our construction process to include Musicalisches Würfelspiel matrices
that are sad, angry and happy. Our review of previous work has led us to the following
observations.

11.4.1 Emotion and Music

Schellenberg et al. (2000), Perceiving Emotion in Melody: Interactive Effects of Pitch


and Rhythm specifically addresses music and emotion. The researchers were trying to
decipher what effects two specific musical elements (pitch and rhythm) had upon the
perception of listeners. Before manipulating elements, they had to establish a set of
melodies that “unequivocally expressed one of three emotions: happy, sad, or scary”.
The authors decided that each of the three emotions: happy, sad, scary, is considered to
be a “basic” emotion (see e.g., Ekman and Davidson 1994). Further, based on a review
of literature cited in their paper (Hevner 1935, 1936, 1937; Kratus 1993; Slodoba
1991; Terwogt and van Grinsven 1991; Thompson and Robitaille 1992), there is a
consensus that “adults from a common culture generally show broad agreement when
associating such emotions with particular pieces of music. Other research shows that
young children have similar associations (Cunningham and Sterling 1988; Dolgin
and Adelson 1990; Giomo 1993; Kastner and Crowder 1990; Kratus 1993; Terwogt
and van Grinsven 1991).
Various attributes were used to describe melodies in the each of the three emo-
tion categories. In happy melodies, there should be fast tempi (Hevner 1936; Rigg
1940; Scherer and Oshinksky 1977; Wedin 1972); major modes (Crowder 1984,
1985; Geraldi and Gerken 1995; Gregori et al. 1996; Kastner and Crowder 1990;
Kratus 1993; Scherer and Oshinksky 1977; Wedin 1972) and staccato articula-
tion (Juslin 1997). On the other hand, in sad music, the use of lower pitches
(Hevner 1936; Crowder 1985; Wedin 1972) along with minor modes (Crowder 1984,
1985; Geraldi and Gerken 1995; Gregori et al. 1996; Kastner and Crowder 1990;
Kratus 1993; Scherer and Oshinksky 1977; Wedin 1972) and legato articulation
(Juslin 1997), are necessary to elicit the proper tone of sadness. Finally, in scary
music, there should be a broad pitch range. Any instrument can produce staccato,
legato or broad range sound. However, in the study above, listeners associated the

staccato articulation produced by a guitar to a happy state; the legato articulation of


a violin to a sad state; and the broad pitch range of an organ to a scared state.
Meyer (1956), Emotion and Meaning in Music, examines “meaning” in music.
His arguments are based on what he labels an absolute expressionist viewpoint. This
specifically means “expressive emotional meanings arise in response to music and
that these exist without reference to the extra musical world of concepts, actions, and
human emotional states”. Expectation is a concept that Meyer considers a product that
comes from natural mental processes of perception. The process involves instinctive
grouping and organizing of information coming in through the senses. Applying
this logic to music, Meyer states that music elicits varying responses from a listener
by manipulating the expected. For example, musical progression that moves in an
irregular way throughout elicits a feeling of suspense or ambiguity for listener. Why?
Meyer claims the listener would begin to doubt the relevance of his own expectations.
This may be true for a trained musician, but we are not so sure that the average listener
is aware of having certain expectations and therefore does not consciously go through
phases of doubt. Meyer’s second argument makes more sense to us. It is the opposite
notion that if the music is so uniform or repetitive, then the music itself has ambiguity
in that it seems static, going nowhere.
Meyer links our experience of modulation (shifts in tonal center) and key changes
to departures in a narrative line in a novel. These complications of the plot function
like our extensions in musical phrases and are simply understood by the listener.
Different musical styles are complex systems of probability. This idea seems to
tie into what Cope did in later years (see Cope 1991, 2001). Cope entered many
examples of Chopin and then used a complex system of computed probabilities to
form Chopin-like pieces. We think that to know what expectation to have, based on
probability, and to understand additive elements in writing and music is due to having
encountered them before. Thus, we feel this is a human experience rather than some
intrinsic response.
In a later section, Meyer states that an expectation must “have the status of an
instinctive mental and motor response, a felt urgency, before its meaning can truly be
comprehended”. He does not go on to say how that line is crossed, but suggests that
it is the deviation of the pure tone, exact intonation, perfect harmony, rigid rhythm,
etc. which conveys emotion.
As we have seen documented in multiple sources, there is an association between
minor mode and the emotional state of sadness. Meyer adds the emotional state of
suffering as well. He reasons that these types of emotion are a product of the unstable
character of the mode itself. The unstable character refers to the fact that the minor
mode is presented in different versions: melodic minor, natural minor, and harmonic
minor. Because it is possible and likely to have a combination within one piece, it is
very chromatic. Chromaticism de-emphasizes tonal centering. Since tonal centering
is the basis of Western musical language, chromaticism seems unstable due to its
unpredictability.
Balkwill and Thompson (1999), A Cross-Cultural Investigation of Emotion in
Music: Psychophysical and Cultural Clues attempts to answer the following question:

Can people identify the intended emotion in music from an unfamiliar tonal system? If
they can, is their sensitivity to intended emotion associated with perceived changes in psy-
chophysical dimensions of music [defined later as any property of sound that can be perceived
independent of musical experience, knowledge, or enculturation]?

We were very interested in the results to see if our ideas for emotionally tagged
music would have characteristics that could be readily identified without cultural
constraints. In this small case study, four specific emotions (joy, sadness, anger, and
peacefulness) were presented using ragas of India. In this Hindustani system, there
is a specific raga or collection of notes for nine individual moods. Participants were
asked to rate tempo, rhythmic complexity, melodic complexity, and pitch range in
addition to the four emotions.
Balkwill and Thompson give us some preliminary expectations based on other
studies. Tempo is most consistently associated with emotional content. A slow pace
equates to sadness and, a faster one; joy. Also, simpler melodies with few variations
in melodic contour and more repetition are associated with positive and peaceful
emotions. Complex melodies with more melodic contour and less repetition are
associated with negative anger and sadness. Timbre plays a role as well. Fear and
sadness were reported more when expressed by a violin or the human voice. Finally,
timpani was associated with anger.
An interesting note they mention is that a narrower pitch range (reduced melodic
contour) may be processed as one auditory stream, therefore easy to process which
may cause positive emotional ratings. This may be linked to Meyer’s idea of an
instinctive mental or motor response.
The conclusions were that given music that was not culture-specific to the listeners,
they were forced to rely on other, psychophysical, cues to perceive emotional content.
As predicted, tempo was a strong cue used to successfully identify the ragas intended
for joy, sadness, and anger. It did not work with peacefulness. Also, ratings made by
expert and non-expert listeners were pretty equal in identifying joy and sadness. The
only significant predictor of peacefulness in this study seemed to be timbre. A flute
was highly rated as peaceful.
What happens when two real performers are put up against a computer generated
performance of a piece is exactly the focus of Clarke and Windsor (2000), Real and
Simulated Expression: A Listening Study. There is no solid conclusion in this paper,
other than the matter will need further investigation. They do state that in this study,
the simulated performance treated tempo and dynamics as elements that were corre-
lated based on principles of energy and motion. There were minute differences in the
way human performers treated repeated notes, both rhythmically and dynamically.
Hence, each performance was perceived in different ways by the listeners.
Basic emotions are defined in Juslin (1997), Emotional Communication in
Music Performance: A Functionalist Perspective and Some Data, by the following
attributes. They have distinct functions that contribute to individual survival. They
are found in all cultures and are experienced as unique feeling states. Further, they
appear early in the course of human development and are associated with distinct
autonomic patterns of physiological cues. Further, he states that most researchers
agree on at least four basic emotions: happiness, sadness, anger and fear. In this

small study, three guitarists were asked to play the same melody five different ways.
One was to be without expression. This would correspond to our (null, null) or neutral
fragments. Two aspects examined were whether or not emotions could be communi-
cated to the listeners, and how the performers’ intentions affected expressive cues in
the performance (the psychophysical cues studied in Balkwill and Thompson 1999).
Like other studies in this commentary, they found that expressions of happiness,
sadness, and anger were readily identified by listeners. The fourth emotional state,
in this case fear, was a little elusive. Gender and training did not significantly affect
one's ability to identify intended emotion. The study suggests that each emotion has
certain characteristics as detailed below (the authors point out that in other instru-
ments it is typical to use staccato articulation when expressing anger, but for electric
guitarists, they uniformly revert to legato articulation for expressing anger):

Loud Quiet Fast Slow Staccato Legato


Anger x x x
Sadness x x x
Happiness x x x
Fear x x x

In Juslin and Madison (1999), The Role of Timing Patterns in Recognition of Emotional Expression from Musical Performance, we quote from the abstract:
We gradually removed different acoustic cues (tempo, dynamics, timing, articulation) from
piano performances rendered with various intended expressions (anger, sadness, happiness,
fear) to see how such manipulation would affect a listener's ability to decode emotional
expression. The results show that (a) removing the timing patterns yielded a significant
decrease in listeners’ decoding accuracy, (b) timing patterns were by themselves capable of
communicating some emotions with accuracy better than chance, (c) timing patterns were
less effective in communicating emotions than were tempo and dynamics.

The authors acknowledge the nature of their study as preliminary and in need of fur-
ther extended study. At any rate, a few hypotheses are put forth. The first is that long
and short note durations may be played differently depending on the intended emo-
tion. They found that expressions of happiness were played in shorter note values and
patterns in the expressions of sadness were played in longer notes. Secondly, anger
and happiness were associated with staccato articulation. This is the first instance
we encountered of further distinction between the staccato articulation representing
anger and the staccato articulation representing happiness. It was found that the anger
expressions used uniform staccato patterns where the happiness ones were more vari-
able depending on the positions within the phrase. It is suggested that more study
of this phenomenon needs to be done, as it may be a key component to decoding
happiness.

11.4.2 Emotional Music Data Design

The underlying goal in building each matrix was to remain as basic as possible.
We decided to work within a monophonic texture, meaning melody line only. Note
values were restricted to quarter notes and half notes in quadruple meter. Quarter
rests were also allowed, but used sparingly. All four matrices (neutral, happy, sad,
angry) are structurally similar. Each consists of three columns with four fragment
choices that are one measure in length.
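To make the combinatorial structure concrete, the short Python sketch below enumerates every opening, transition and closing choice from a 4 × 3 Würfelspiel matrix; the fragment names are placeholders rather than the actual notated measures, and with four choices per column the enumeration yields 4 × 4 × 4 = 64 selections.

from itertools import product

# A 4 x 3 Wuerfelspiel matrix: each column holds four one-measure fragments.
# The fragment names are placeholders, not the actual notated measures.
openings    = ["O0", "O1", "O2", "O3"]
transitions = ["T0", "T1", "T2", "T3"]
closings    = ["C0", "C1", "C2", "C3"]

# Every selection is one choice per column, concatenated in order.
selections = list(product(openings, transitions, closings))
assert len(selections) == 4 ** 3          # 64 twelve-beat selections

# For example, the pieces built from the first opening and the first closing.
subset = [s for s in selections if s[0] == "O0" and s[2] == "C0"]
print(subset)                             # four pieces, one per middle phrase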
Any fragment from column one from any of the matrices is designed to function
as an opening phrase. We define an opening phrase as one that clearly establishes a
tonal center. In western tonal music, the tonal focal point can be narrowed to a single
tone/pitch that is known as the tonic. We have made C our tonic note in all cases. In
all but one case, the opening fragment also established the mode as either major or
minor. The exception is made in the angry matrix, where ambiguity is desirable. All
fragments in column two of any of the matrices are designed to function as a transition
phrase. As the label implies, these transition phrases serve as connectors between
a choice from column one and a choice from column three. It is in these middle
phrases that movement away from the tonic is made or continued. This movement
is necessary for forward progress of a melody. Therefore, each transition phrase is
now highlighting a secondary pitch, one other than the tonic note established by
the opening phrase. To close our melodic lines, an ending phrase is chosen. Any
fragment from column three of any of the matrices will function in the same manner.
We designed each to move back to the tonic note in such a way as to produce a
quality of closure to our melodic lines. This was done by approaching tonic in the
most basic of ways. Using stepwise motion up or down to the tonic logically ends the
melodic journey by bringing you back to the home pitch. An alternative is to return
to tonic via melodic skip from the third or fifth scale degree. In the key of C, the
tonic (C) is scale degree 1, D 2, etc. So, a melodic skip from the third or fifth scale
degree means a skip from an E or a G note. Together with the tonic note, third and
fifth scale degrees make up a tonic chord (harmony). By using either the third or the
fifth to lead back to tonic, we again produce a sense of closure by reinstating tonic
as the final destination of our brief melodic journey.
We used the following guidelines to design our Würfelspiel matrices of differ-
ent emotional slants. When we produced emotion-deprived or neutral fragments,
individual characteristics documented by researchers as being contributing factors
of basic emotion in music were neutralized to the best of our abilities. Some of the
contributing factors, such as mode, are also essential in a plausible melody, and could
not be removed. The use of major mode is the default with a tempo of 45 beats per minute, slow
to the point that the individual notes outweigh the overall sense of a melody. The
rationale comes from reading. The telltale sign of a new reader is the slow pace during
which equal emphasis is placed on every word. A beginner will produce an emotion-
deprived reading. The same result is our goal here. Further, we use even rhythms and
exact note durations with the melody played with a basic computer generated sound.

Likewise, fragments we intended to emotionally tag as happy had individual characteristics researched by others incorporated into the design. Each characteristic
was chosen based on a general consensus by other researchers and authors as being
a contributing factor in representing happy in music. Again, this entails choosing a
major mode, a very quick tempo (250 beats per minute) and the use of staccato. If we wish, we
can use quarter rests. We could choose to present the melodies using a flute, as this
particular instrument has often been linked to happiness in the literature.
Once more, individual characteristics that have been researched by others were
incorporated into the design of our fragments tagged by sad. Each characteristic was
chosen based on a general consensus by other researchers and authors as being a
contributing factor in representing sad in music. There is a use of minor mode with
a slow tempo (70 beats per minute). Also, we use slurs and legato and the bass clef to put us
in a lower register. We choose to present these melodies using a violin, as stringed
instruments are particularly linked to sadness in the literature. Finally, there is some
use of chromaticism.
To emotionally tag the fragments as angry, individual characteristics that have
been researched by others were incorporated into the design. Each characteristic
was chosen based on a general consensus by other researchers and authors as being
a contributing factor in representing “angry” in music. We use a minor mode, a
moderate tempo (180 beats per minute, faster than that used for the sad melodies and slightly
slower than that used for the happy melodies), and increased variation of articulation
(slurs, accents). Further, there is the incorporation of larger leaps. We choose to
present these melodies using a trumpet, as brass and percussion are often linked to
anger in the literature. There are also more repeated notes and the use of an ambiguous
fragment where the mode is not clearly established in opening phrase.
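The design guidelines for the four matrices can be collected into a small lookup table. The Python sketch below merely restates the choices described above (mode, tempo in beats per minute, articulation and the suggested instrument); the field names are our own and purely illustrative.

# Design parameters for each emotionally tagged Wuerfelspiel matrix,
# restating the guidelines in the text; field names are illustrative only.
emotion_design = {
    "neutral": {"mode": "major", "tempo_bpm": 45,
                "articulation": "even rhythms, exact durations",
                "instrument": "basic computer generated sound"},
    "happy":   {"mode": "major", "tempo_bpm": 250,
                "articulation": "staccato, optional quarter rests",
                "instrument": "flute"},
    "sad":     {"mode": "minor", "tempo_bpm": 70,
                "articulation": "slurs and legato, lower register",
                "instrument": "violin"},
    "angry":   {"mode": "minor", "tempo_bpm": 180,
                "articulation": "varied (slurs, accents), larger leaps",
                "instrument": "trumpet"},
}

print(emotion_design["angry"]["tempo_bpm"])   # 180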

11.4.3 Happy Musical Data

Following the outline above, we have designed the Happy musical data as shown in
Fig. 11.7.
From our 4 × 3 Musicalisches Würfelspiel matrix, we can generate 64 musical
selections of twelve beats each. In Fig. 11.8, we show the selections generated using
opening one, all the possible middle phrases and the first cadence phrase.
Then we show all the selections for opening two, middle phrase one and all the closings
in Fig. 11.9. This still only shows eight of the sixty four possible pieces, of course.
We invite you to sit at the piano and see how they sound. You should hear that
they are distinctly happy. Our interpretation of the core meaning of a happy musical
fragment is based on ideas from two separate disciplines. First, our reading of the
relevant literature has given us guidance into the choices for notes, tempo and playing
style as outlined above; and second, the psychophysiological studies of Lang et al.
(1998) as outlined in many papers has given us a pseudo-quantitative measure of
the affective content of an emotionally charged image. Musical studies have shown

Fig. 11.7 The happy music matrix

Fig. 11.8 Happy sequences generated using opening one, all middle phrases and the first closing

that if a composer deliberately attempts to convey a given emotional content in their music, queries of their audience show that the desired emotional flags have been set.

11.4.4 Sad Musical Data

Now, we move toward the design of musical data that is intended to be sad. In
Fig. 11.10, you can see the musical data that we designed to have an overall tone of
sadness. To emotionally tag these fragments as “sad,” individual characteristics such

Fig. 11.9 Happy sequences using opening two, middle phrase one and all the closings

Fig. 11.10 The sad music matrix

as the use of minor mode, a slow tempo, the use of slurs and legato, and the use of
bass clef to put us in a lower register and so forth were incorporated into the design.
From our 4 × 3 Musicalisches Würfelspiel matrix, we can generate 64 musical
selections of twelve beats each. In Fig. 11.11, we show the selections generated using
opening one, all the possible middle phrases and the first cadence phrase.
Then we show all the selections for all the openings, the first middle phrase and
the second ending in Fig. 11.12. This still only shows eight of the sixty four possible
pieces, of course. We invite you to sit at the piano and see how they sound. You
should hear a sense of sadness in each.

Fig. 11.11 Sad sequences using opening one, all the middle phrases and the first closing

Fig. 11.12 Sad sequences using all the openings, the first middle phrases and the second closing

11.4.5 Angry Musical Data

To emotionally tag these fragments as “angry,” individual characteristics that have been researched by others were incorporated into the design. Each characteristic was
chosen based on a general consensus by other researchers and authors as being a
contributing factor in representing “angry” in music. These include the use of minor
mode, a moderate tempo (faster than that used for the sad melodies, slightly slower
than that used for the happy melodies), increased variation of articulation (slurs,
accents), incorporation of larger leaps, melody played by a trumpet, more repeated
notes and an ambiguous fragment (mode not clearly established in the opening phrase)
(Fig. 11.13).

Fig. 11.13 The angry music matrix

Fig. 11.14 Angry sequences generated using the first opening, all the middle phrases and the first
closing

From the angry 4 × 3 Musicalisches Würfelspiel matrix, as usual, we can generate 64 musical selections of twelve beats each. In Fig. 11.14, we show the selections
generated using opening one, all the possible middle phrases and the first cadence
phrase.
Then we show all the selections for opening two, the fourth middle and all the
endings in Fig. 11.15. This still only shows eight of the sixty four possible pieces, of
course. We invite you to sit at the piano and see how they sound. You should be able
to hear a sense of anger in each selection.

Fig. 11.15 Angry sequences generated using opening two, the fourth middle phrase and all the
closings

11.4.6 Emotional Musical Alphabet Selection

As you have seen in Sects. 11.4.3, 11.4.4 and 11.4.5, the emotionally tagged musical
data uses a richer set of notes and articulation attached to the notes to construct
grammatical objects. We can think of the added articulation as punctuation marks.
Slurs (one note and multiple note), staccato and marcato accents are attached to
various notes in our examples to add emotional quality. Our design alphabet can be
encoded as H = {c, d, e, f, g, a, b, r } where each note in this alphabet is now
thought of as a musical object with a set of defining characteristics. Here r is rest.
For our purposes, the attributes of a note are choices from a small set of possibilities
from the list A = { p, b, s, a}. The index p indicates what pitch we are using for
the note: −1, denotes the first octave of pitches below middle C; 0, the pitches of
the middle C octave and 1, the first octave of pitches above middle C. The letter
b tells us how many beats the note is held. The length of the slur is given by the
value of s and a denotes the type of articulation used on the note. We choose to treat
slurs as entities which are separate from the other accent markings for clarity. For
these examples, we have slurs that range from zero to three in length, so permissible
values of s are taken from the set {0, 1, 2, 3}. This could easily be extended to longer
slurs. The beat value b is either one of two as only quarter and half notes are used.
There are many possible articulations. An expanded list, for marks either above or
below a note for effect, might include neutral, no punctuation (a = 0); pizzicato, a
dot (a = 1); marcato or sforzando, a > (a = 2); staccato or portato, a −, (a = 3);
strong pizzicato, an apostrophe (a = 4); and sforzato, a ˆ (a = 5). A given note n is
thus a collection which can be denoted by n_{p,b,s,a} where the attributes take on any of
their allowable values. A few examples will help sort this out. The symbol d_{1,2,2,1}
is the half note d in the octave above middle C with a pizzicato articulation which is the start of
a two note slur that ends on the second note following this d. The rest

Fig. 11.16 Some angry phrases. a Middle phrase. b Opening phrase

does not have pitch, articulation or slurring. Thus, we set the value of pitch, slurring
and articulation to 0 and use the notation r_{0,1,0,0} or r_{0,2,0,0} to indicate a quarter or
half rest, respectively. Our alphabet is thus {H} which has cardinality 8. Each letter
has a finite set of associated attributes and each opening, middle or closing phrase is
thus a sequence of 4 musical entities.
Within this alphabet, an angry middle phrase such as shown in Fig. 11.16a, can
be encoded as {e_{0,1,0,2}, d_{0,1,0,2}, c_{0,2,0,0}}. This would be then written as the matrix in
Eq. 11.2

n_1 = \begin{bmatrix} 0 & 0 & \{0,1,0,2\} & 0 & 0 & 0 & 0 & 0 \\ 0 & \{0,1,0,2\} & 0 & 0 & 0 & 0 & 0 & 0 \\ \{0,2,0,0\} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \qquad (11.2)

If we had a fragment with a slur such as shown in Fig. 11.16b, this would be encoded
as {c_{0,1,1,0}, d_{0,1,0,0}, d_{0,1,0,2}, c_{0,1,0,0}}. In matrix form, we have Eq. 11.3

n_1 = \begin{bmatrix} \{0,1,1,0\} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & \{0,1,0,0\} & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & \{0,1,0,2\} & 0 & 0 & 0 & 0 & 0 & 0 \\ \{0,1,0,0\} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \qquad (11.3)

These matrices indicate which musical object is used in a sequence. The four opening
phrases in a Würfelspiel music matrix can thus be encoded into matrices that are 2 × 8
(both notes in the phrase are half notes) to 4 × 8 (all notes are quarter notes). Each of
these matrices has the special property that a row can only have one nonzero entry. A
given middle phrase will have a similar structure, making only some of the possible
middle phrase matrices acceptable.
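Assuming the columns of the compact matrix follow the order of the alphabet H = {c, d, e, f, g, a, b, r}, the encoding step is mechanical. The Python sketch below rebuilds the matrix of Eq. 11.2 from the angry middle phrase; the function name is ours, and the attribute tuples are the (p, b, s, a) values used in the text.

# Compact encoding: one row per note, one column per letter of H.
# Each row has a single nonzero entry, the attribute tuple (p, b, s, a).
H = ("c", "d", "e", "f", "g", "a", "b", "r")

def encode_phrase(phrase):
    """phrase is a list of (letter, (p, b, s, a)) pairs."""
    rows = []
    for letter, attrs in phrase:
        row = [0] * len(H)
        row[H.index(letter)] = attrs
        rows.append(row)
    return rows

# The angry middle phrase {e_{0,1,0,2}, d_{0,1,0,2}, c_{0,2,0,0}} of Eq. 11.2.
angry_middle = [("e", (0, 1, 0, 2)), ("d", (0, 1, 0, 2)), ("c", (0, 2, 0, 0))]
for row in encode_phrase(angry_middle):
    print(row)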
Note that encoding music in this way generates a compact data representation.
However, we need to model the data so that each possible musical entry is encoded
in a unique way. The first seven entities in H come in a total of 144 distinct states:
three pitches, two beats, four slur lengths and six articulations. The rest comes in
only two states. Hence, a distinct alphabet here has cardinality 7 × 144 + 2 or 1010.
The size of this alphabet precludes showing an example as we did with the compact
representation, but the matrices that encode these musical samples still possess the
property that each row has a single 1. Of course, with an alphabet this large in size, we
typically do not use a standard matrix representation; instead, we use sparse matrix
or linked list techniques. The data representation we used for the musically neutral
data has a much lower cardinality because the neutral data is substantially simpler.
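The 1010-state count is easy to check, and each distinct musical object can be given a unique integer index, which is exactly what a sparse or linked list representation would store instead of a full one-hot row. The Python sketch below does both; the particular index ordering is our own choice and is not meant to be the encoding used elsewhere in the text.

# Distinct-state alphabet: each of the 7 pitched letters has
# 3 pitches x 2 beats x 4 slur lengths x 6 articulations = 144 states,
# and the rest has 2 states, giving 7*144 + 2 = 1010.
PITCHES, BEATS, SLURS, ARTIC = (-1, 0, 1), (1, 2), (0, 1, 2, 3), range(6)
STATES_PER_LETTER = len(PITCHES) * len(BEATS) * len(SLURS) * len(ARTIC)   # 144
ALPHABET_SIZE = 7 * STATES_PER_LETTER + 2                                 # 1010

def state_index(letter, p, b, s, a):
    """Map a musical object to a unique index in [0, 1010); ordering is arbitrary."""
    letters = "cdefgab"
    if letter == "r":
        return 7 * STATES_PER_LETTER + (0 if b == 1 else 1)
    offset = ((PITCHES.index(p) * len(BEATS) + BEATS.index(b)) * len(SLURS)
              + SLURS.index(s)) * len(ARTIC) + a
    return letters.index(letter) * STATES_PER_LETTER + offset

assert ALPHABET_SIZE == 1010
print(state_index("d", 1, 2, 2, 1))   # a single integer identifying d_{1,2,2,1}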

References

L. Balkwill, W. Thompson, A cross-cultural investigation of emotion in music: Psychophysical and cultural clues. Music Percept. 17(1), 43–64 (1999)
W. Berry, Form in music: An examination of traditional techniques of musical structure and their
application in historical and contemporary styles (Prentice-Hall, Upper Saddle River, 1966)
W. Caplin, Classical Form: A Theory of Formal Functions for the Instrumental Music of Haydn, Mozart, and Beethoven (Oxford University Press, New York, 1998)
E. Clarke, W. Windsor, Real and simulated expression: A listening study. Music Percept. 17(3),
277–313 (2000)
D. Cope, Computers and Musical Style (A-R Editions, Middleton, 1991)
D. Cope, Virtual Music: Computer Synthesis of Musical Style (MIT press, Cambridge, 2001)
R. Crowder, Perception of the major/minor distinction: I. Historical and theoretical foundations.
Psychomusicology 4, 3–12 (1984)
R. Crowder, Perception of the major/minor distinction: III, Hedonic, musical and affective discrim-
inations. Bull. Psychon. Soc. 23, 314–316 (1985)
J. Cunningham, R. Sterling, Developmental changes in the understanding of affective meaning in
music. Motiv. Emot. 12, 399–413 (1988)
C. Davie, Musical Structure and Design (Dover Publications, New York, 1953)
K. Dolgin, E. Adelson, Age changes in the ability to interpret affect in sung and instrumentally-
presented melodies. Psychol. Music 18, 87–98 (1990)
P. Ekman, R. Davidson (eds.), The Nature of Emotion: Fundamental Questions (Oxford University
Press, Oxford, 1994)
G. Geraldi, L. Gerken, The development of affective responses to modality and melodic contour.
Music Percept. 12, 279–290 (1995)
A. Gregori, L. Worrall, A. Sarge, The development of emotional responses to music in young
children. Motiv. Emot. 20, 341–348 (1996)
C. Giomo, An experimental study of children’s sensitivity to mood in music. Psychol. Music 21,
141–162 (1993)
K. Hevner, The affective character of the major and minor modes in music. Am. J. Psychol. 47,
103–118 (1935)
K. Hevner, Experimental Studies of the elements of expression in music. Am. J. Psychol. 48,
246–268 (1936)
K. Hevner, The affective value of pitch and tempo in music. Am. J. Psychol. 49, 621–630 (1937)
P. Juslin, Emotional communication in music performance: A functionalist perspective and some
data. Music Percept. 14(4), 383–418 (1997)
P. Juslin, G. Madison, The role of timing patterns in recognition of emotional expression from
musical performance. Music Percept. 17(2), 197–221 (1999)
M. Kastner, R. Crowder, Perception of the major/minor distinction: IV Emotional connections in
young children. Music Percept. 8, 189–202 (1990)
E. Kohs, Musical Form: Studies in Analysis and Synthesis (Houghton Mifflin Company, Massachusetts, 1973)
J. Kratus, A developmental study of children’s interpretation of emotion in music. Psychol. Music
21, 3–19 (1993)
L. Meyer, Emotion and Meaning in Music (University of Chicago Press, Chicago, 1956)
M. Rigg, The effect of tonality and register upon musical mood. J. Musicol. 2, 49–61 (1940)
G. Schellenberg, A. Krysciak, J. Campbell, Perceiving emotion in melody: Interactive effects of
pitch and rhythm. Music Percept. 18(2), 155–171 (2000)
K. Scherer, J. Oshinsky, Cue utilization in emotion attribution from auditory stimuli. Motiv. Emot.
1, 331–346 (1977)
J. Sloboda, Music structure and emotional response. Psychol. Music 19, 110–120 (1991)
M. Terwogt, F. van Grinsven, Musical expression of moodstates. Psychol. Music 19, 99–109 (1991)

W. Thompson, B. Robitaille, Can composers express emotion through music? Exp. Stud. Arts
10, 79–89 (1992)
L. Wedin, Multidimensional scaling of emotional expression in music. Swed. J. Musicol. 54, 1–17
(1972)
Chapter 12
Generation of Painting Data: J. Peterson,
L. Dzuris and Q. Peterson

Our simple painting model will be based on a Würfelspiel matrix similar to what
we used in the above emotionally neutral music compositions. Since our ultimate
goal is to create cognitive models, it is instructive to look at the notion of creativity
from the cortical point of view. The advent of portions of cortex specialized to fusing
sensory information is probably linked in profound ways to the beginnings of the
types of creativity we label as art. Hence, models of creativity, at their base, will
involve models of visual cortex data which can then be fed into associative learning
modules. We have developed data models for the auditory cortex which are music
based in Chap. 11. We need to look at visual cortex data now. We therefore focus on
the creation of data sets that arise from paintings as we feel that this will give us the
data to train the visual input to the area 37 sensor fusion pathway.

12.1 Developing a Painting Model

Consider the two paintings, Fig. 12.1a, b, which have fairly standard compositional
designs. Each was painted starting with the background and then successive layers
of detail were added one at a time. As usual, the design elements farthest from the
viewer’s eye are painted first. The other layers are then assembled in a farthest to
nearest order. We note that in the generation of computer graphics for games, a similar
technique is used. A complex scene could in principle be parsed into individual
polygonal shapes each with its own color. Then, we could draw each such polygon
into the image plane which we see on the computer screen. However, this is very
inefficient. There are potentially hundreds of thousands of such unique polygons to
find and finding them uses computationally expensive intersection algorithms. It is
much easier to take advantage of the fact that if we draw a polygon on top of a
previous image, the polygon paints over, or occludes, the portion of the image that
lies underneath it. Hence, we organize our drawing elements in a tree format with
the root node corresponding to distances farthest from the viewer. Then, we draw a
scene by simply traversing the tree from the root to the various leaves.
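A minimal Python sketch of this painter's algorithm idea, assuming each layer has been reduced to a boolean pixel mask and a single color index, is given below; the layer format and names are placeholders rather than the encoding developed later in this chapter.

import numpy as np

def compose(layers, height=100, width=100, base_color=0):
    """Paint layers back to front: each later layer occludes what lies under it.

    Each layer is a dict with a boolean 'mask' of covered pixels and an
    integer 'color' index; both are placeholder formats for this sketch.
    """
    canvas = np.full((height, width), base_color, dtype=int)
    for layer in layers:                      # farthest-from-viewer first
        canvas[layer["mask"]] = layer["color"]
    return canvas

# Background, midground and foreground drawn in that order.
bg = {"mask": np.ones((100, 100), dtype=bool), "color": 1}
mg = {"mask": np.zeros((100, 100), dtype=bool), "color": 2}
mg["mask"][30:90, 20:50] = True               # a midground rectangle
fg = {"mask": np.zeros((100, 100), dtype=bool), "color": 3}
fg["mask"][60:95, 40:80] = True               # a foreground rectangle

painting = compose([bg, mg, fg])
print(np.unique(painting))                    # colors present after occlusion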

Fig. 12.1 Two simple paintings. a This painting uses many separate layers for the kelp. The
human figure is in an intermediate layer between kelp layers and the seadragons essentially occupy
foreground layers. The background varies from a dark blue at the bottom to very light blue, almost
white, at the top. b This painting uses a complicated background layer with many trees. The dragon
and the large tree are midground images, while the butterflies and moths are primarily foreground
in nature. The girl is midground

The painting seen in Fig. 12.1a started with the background. This used a gradient
of blue, ranging from very dark, almost black, at the bottom, to very light, almost
white, at the top. There are, of course, many different shades and hues of blue as
brushes are used to create interesting blending effects with the various blues that
are used. However, we could abstract the background to a simple blue background
and capture the basic compositional design element. The many kelp plants are all
painted in different planes. The kelp farthest from the viewer are very dark to indicate
distance, while the plants closest to the viewer use brighter greens with variegated
hues. We note that we could abstract the full detail of the kelp into several intermediate
midground layers: perhaps, the farthest midground layer might be one kelp plant that
is colored in dark green with the second, closest midground layer, a bright green
kelp plant. The human figure is placed between kelp layers, so we can capture this
compositional design element by placing a third midground layer between the two
midground kelp plant layers. Finally, there are many seadragons in foreground layers
at various distances from the viewer. We could simplify this to a single foreground
layer with one seadragon painted in a bright red. Hence, the abstract compositional
design of the painting in Fig. 12.1a is as follows:

The abstract seadragons design


Layer Description
Background one Blue gradient from dark to light
Midground one Very dark green kelp plant
Midground two Human figure
Midground three Bright green kelp plant
Foreground Bright red sea dragon

In a similar fashion, we can analyze Fig. 12.1b. The background in this painting is
a large collection of softly defined trees. These are deliberately not sharply defined
so that their distance from the viewer is emphasized. The midground image is the
very large tree that runs from the bottom to the top of the painting. There are then
two more midground images: the whimsical dragon figure on the tree branch and
the human figure positioned in front of the tree. Finally, there are a large number
of Baltimore butterflies and Luna moths which are essentially foreground images.
We can abstract this compositional design as follows:

The abstract tree painting design


Layer Description
Background one Fuzzy brown trees
Midground one Large tree in brighter browns
Midground two Dragon in red
Midground three Human figure
Foreground Butterfly in black and luna moth in green

The paintings shown in Fig. 12.1a, b are much more complicated than the simple
abstract designs. However, we can capture the essence of the compositional design
in these tables. We note that, in principle, a simpler description in terms of one
background, one midground and one foreground is also possible. For example, we
could redo the abstract designs of Fig. 12.1a, b as follows:

The three element abstract seadragons design


Layer Description
Background Blue gradient from dark to light, very dark green kelp plant
Midground Human figure and bright green kelp plant
Foreground Bright red sea dragon

The three element abstract tree painting design


Layer Description
Background Fuzzy brown trees
Midground Large tree in brighter browns and dragon in red
Foreground Human figure, butterfly in black and Luna moth in green

Fig. 12.2 The background and foreground of a simple painting. a Background of a mellow image.
Note this image plane is quite complicated. Clearly, it could be broken up further into midground
and background images. b Foreground of a mellow image which is also very complicated. A simpler
painting using only one of the foreground elements would work nicely also. c A mellow painting

These new designs do not capture as much of the full complexity of the paintings
as before, but we believe they do still provide the essential details. Also, we do not
believe a two layer approach is as useful. Consider for example a two plane painting
as shown in Fig. 12.2a, b. These two planes can then be assembled into a painting as
we have described to give what is shown in Fig. 12.2c.
In this design, the foreground is too complex and should be simplified. There is
also too much in the background. Some of that should move into a midground image
so that the design is more clearly delineated.
We will therefore limit our attention to paintings that can be assembled using a
background, midground and foreground plane. The background image is laid down
first—just as an artist applies the paint for the background portion first. Then, the
midground image is placed on top of the background thereby occluding a portion
of the background. Finally, the foreground image is added, occluding even more of
the previous layers. We recognize, of course, that a real artist would use more layers
and move back and forth between each in a non-hierarchical manner. However, we
feel, for the reasons discussed above, that the three layer abstraction of the composi-
tional design process is a reasonable trade off between too few and too many layers.
Also, we eventually will be encoding painting images into mathematical forms for
computational purposes and hence, there is a great burden on us to design something
pleasing using this formalism which at the same time is sufficiently simple to use as
data in our cognitive model building process.

Our painting model thus will use a compositional scheme in which a valid painting
is constructed by three layers: background (BG), midground (MG) and foreground
(FG). A painting is assembled by first displaying the BG, then overlaying the MG
which occludes some portions of the BG image and finally adding the FG image.
The final FG layer hides any portions of the previous layers that lie underneath it.
This simplistic scheme captures in broad detail the physical process of painting. When
we start a painting, we know that if we paint the foreground images first, it will be
technically difficult and aesthetically displeasing to paint midground and background
images after the foreground. A classical example is painting a detailed tree in the
foreground and then realizing that we still have to paint the sky. The brush strokes
in the paint medium will inevitably show wrong directions if we do this, because we
can not perform graceful side to side, long brush strokes since the foreground image
is already there. Hence, a painter organizes the compositional design into abstract
physical layers—roughly speaking, organized with the background to foreground
layers corresponding to how far these elements are away from the viewer’s eye.
Recall the Würfelspiel matrix used in musical composition had the general form
below.

A = \begin{bmatrix} \text{Opening 0} & \text{Transition 0} & \text{Closing 0} \\ \text{Opening 1} & \text{Transition 1} & \text{Closing 1} \\ \vdots & \vdots & \vdots \\ \text{Opening P-1} & \text{Transition P-1} & \text{Closing P-1} \end{bmatrix}

The building blocks of a typical musical phrase here are an Opening followed
by a Transition and then completed by a Closing. We can use this technique to
generate painting compositions by letting the Opening phrase be the Background
of a painting; the Transition, the Midground and the Closing, the Foreground.
Hence, the painting matrix would be organized as in the matrix shown in Eq. 12.1

A = \begin{bmatrix} \text{Background 0} & \text{Midground 0} & \text{Foreground 0} \\ \text{Background 1} & \text{Midground 1} & \text{Foreground 1} \\ \vdots & \vdots & \vdots \\ \text{Background P-1} & \text{Midground P-1} & \text{Foreground P-1} \end{bmatrix} \qquad (12.1)

This painting matrix would then allow us to rapidly assemble P^3 different paintings.
Both of these data sets are examples of Würfelspiel data matrices that enable
us to efficiently generate large amounts of emotionally labeled data.
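As with the music matrices, the assembly itself is purely combinatorial. The short Python sketch below, using placeholder layer names, builds every (background, midground, foreground) triple from a P × 3 painting matrix; for P = 4 this gives the 64 paintings shown in the sections that follow.

from itertools import product

P = 4
# Placeholder names for the rows of a P x 3 painting Wuerfelspiel matrix.
backgrounds = ["BG%d" % i for i in range(P)]
midgrounds  = ["MG%d" % i for i in range(P)]
foregrounds = ["FG%d" % i for i in range(P)]

paintings = list(product(backgrounds, midgrounds, foregrounds))
assert len(paintings) == P ** 3               # 64 paintings when P = 4
print(paintings[0])                           # ('BG0', 'MG0', 'FG0')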

Fig. 12.3 A neutral painting. a Background. b Midground. c Foreground. d Assembled painting

Fig. 12.4 The neutral painting matrix

12.2 Neutral Painting Data

With this said, we can begin to design a series of paintings that are assembled
from three pieces: the background, midground and foreground thereby generating a
Künsterisches Würfelspiel matrix approach, an artistic toss of the dice. Our first
Künsterisches Würfelspiel matrix will consist of four backgrounds, midgrounds and
foregrounds assembled into the usual 4 × 3 matrix and we will specifically attempt
to create painting compositions that have a neutral emotional tone. We have thought
carefully about what emotionally neutral paintings assembled in this primitive fash-
ion from backgrounds, midgrounds and foregrounds should look like both as drawn
and as colored elements. We have decided on flat and monochromatic colors with
sparse lines. A typical painting composition design can be seen in Fig. 12.3a, b and c.
We can then assemble the background, middle ground and foreground elements into
a painting as shown in Fig. 12.3d. Note that this painting is quite bland and elicits no
emotional tag. In Fig. 12.4, we see a matrix of four backgrounds, four midgrounds
and four foregrounds which are assembled in a Würfelspiel fashion.

Fig. 12.5 16 Neutral compositions: background 1, midground 1, foregrounds 1–4, background 1, midground 2, foregrounds 1–4, background 1, midground 3, foregrounds 1–4, background 1, midground 4, foregrounds 1–4

Fig. 12.6 16 Neutral compositions: background 2, midground 1, foregrounds 1–4, background 2, midground 2, foregrounds 1–4, background 2, midground 3, foregrounds 1–4, background 2, midground 4, foregrounds 1–4

12.2.1 The Neutral Künsterisches Würfelspiel Approach

We can use this matrix to assemble 64 individual paintings. We are displaying them
in thumbnail form in the Figs. 12.5, 12.6, 12.7 and 12.8.

Fig. 12.7 16 Neutral compositions: background 3, midground 1, foregrounds 1–4, background 3, midground 2, foregrounds 1–4, background 3, midground 3, foregrounds 1–4, background 3, midground 4, foregrounds 1–4

Fig. 12.8 16 Neutral compositions: background 4, midground 1, foregrounds 1–4, background 4, midground 2, foregrounds 1–4, background 4, midground 3, foregrounds 1–4, background 4, midground 4, foregrounds 1–4

12.3 Encoding the Painting Data

Our abstract painting compositions are encoded as the triple {b, m, f } where b denotes
the background,
m, the midground and f the foreground layer, respectively. Each of these lay-
ers is modeled with a collection of graphical objects. Each object in such a list
has the following attributes: inside color, c_i; boundary color, c_b; and a boundary curve, ∂, described as an ordered array {(x_i, y_i)} of position coordinates.

Fig. 12.9 Encoding a neutral painting. a The second background image. b The background edges.
Note there are two edge curves. c The first midground image. d The midground edges. e The first
foreground image. f The foreground edges

Consider the neutral painting constructed from the second background, first
midground and first foreground image. Compute the edges of each image
layer using the Linux tool convert via the command line convert -edge 2
-negate <file in> <file out>. The -edge 2 command option extracts
the edges giving us an image that is all black with the edge in white. The second
-negate then swaps black and white in the image to give us the edges as black
curves.
For expositional convenience, we will assume all of the paintings are 100 × 100
pixels in size. We can divide this rectangle into a 10 × 10 grid which we will
call the coarse grid. Note that in Fig. 12.9b, the edge is a curve which can be

Fig. 12.10 Approximating the edge curve. a A 10 × 10 portion of a typical background image that
contains part of an edge curve. The grid from horizontal pixels 40–49 and vertical pixels 70–79 is
shown. The 10 × 10 block of pixels defines a node in the coarse grid that must be labeled as being
part of the edge curve. b A typical coarse scale edge curve in which each filled in circle represents
a 10 × 10 block in the original image which contains part of the fine scale edge curve

Fig. 12.11 Assembling a happy painting. a Happy background. b Happy midground. c Happy
foreground. d A happy painting

Fig. 12.12 The happy painting matrix

approximated by a collection of positions {(x_i, y_i)}, where the index i ranges from 0
to some positive integer N, by drawing the line segments between successive points
p_i(t) = t(x_i, y_i) + (1 − t)(x_{i+1}, y_{i+1}) for t in [0, 1] for all appropriate indices i. The
smaller 10 × 10 grid allows us to approximate these edge positions as shown in
Fig. 12.10a. Each of the 100 large scale boxes that comprise the 10 × 10 grid that
contain a portion of an edge curve are assigned a filled in circle in Fig. 12.10b. We see
that the fine scale original edge has been redrawn using a coarser scale. Although
there is inevitable loss of definition, the basic shape of the edge is retained.
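A Python sketch of this coarse-graining step, under the stated assumptions of a 100 × 100 pixel image and a 10 × 10 coarse grid, is given below; edge_pixels stands for the black edge pixels produced by convert, and the function name is our own.

def coarse_grid_cells(edge_pixels, image_size=100, grid_size=10):
    """Mark each coarse cell whose block of pixels contains part of an edge.

    edge_pixels is an iterable of (x, y) pixel coordinates on the fine-scale
    edge curve; the result is the set of (column, row) coarse-grid nodes.
    """
    block = image_size // grid_size           # 10 pixels per coarse cell
    return {(x // block, y // block) for (x, y) in edge_pixels}

# An edge fragment passing through horizontal pixels 40-49 near vertical
# pixel 75 maps to the single coarse node (4, 7), as in Fig. 12.10a.
print(coarse_grid_cells([(40, 75), (45, 74), (49, 77)]))   # {(4, 7)}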
In Chap. 11, we discussed the problem of generating neutral musical data alpha-
bets. Let’s revisit that now.
We can extend those ideas by realizing that in the context of music, part of our
alphabet must represent notes of a variety of pitches. The music alphabet included
letters which represented the standard seven notes of the octave below middle C,
middle C octave and the octave above middle C. These notes could be listed as the
matrix of Eq. 12.2 where the subscript −1, 0 and 1 on each note indicate the pitches
below middle C, of middle C and above middle C, respectively.
M = \begin{bmatrix} c_{-1} & d_{-1} & e_{-1} & f_{-1} & g_{-1} & a_{-1} & b_{-1} \\ c_0 & d_0 & e_0 & f_0 & g_0 & a_0 & b_0 \\ c_1 & d_1 & e_1 & f_1 & g_1 & a_1 & b_1 \end{bmatrix} \qquad (12.2)

We note that there are more musical octaves that could be used and hence the
matrix M, now 3 × 7, could be as large as 7 × 7, if we wished to encode all of
the notes on a piano keyboard. If we equate the last row of M with vertical pixel
position 0 and the first row with 2, we see there is a nice correspondence between the
matrix of positional information we need for the paintings and the matrix of pitch
information we use for music. Hence, the manner in which we choose an alphabet
for the painting data is actually closely aligned with the way in which we choose the
alphabet for musical data despite the seeming differences between them.
We can also approximate the edge from the foreground, Fig. 12.9d, in a similar
fashion. These edges define closed figures in the plane under the following conditions:
first, if the first and last point of the curve are the same; and second, if the first and
last point of the curve are on the boundary of the box. For example in Fig. 12.9c,
the top edge hits the boundary of the image box on both the right and left side.
Hence, we will interpret this as two simple closed figures in the plane: first, the
object formed by including all the portions of the image box below the edge; and
second, the complement of the first object formed by the portions of the image box
above the edge. In addition, there is a second edge which is not a closed figure which
therefore does not have an inside portion.
It is clear the edges will always have an edge color and will possibly have an inside
color. We will let E^{pj} denote an edge set. The first superscript, p, takes on the values
0, 1 or 2, and indicates whether the edge set is part of a background, midground
or foreground image. The second superscript tells us which edge set in a particular
image we are focusing on. For example, Fig. 12.9c has two edge sets. Therefore, this
background image can be encoded as the set E^0 defined by

E^0 = {{E^{00}, c_i^{00}, c_b^{00}}, {E^{01}, c_b^{01}}}

since the second edge set does not require an inside color. To complete the description
of this background image, we note that the entire image is drawn against a chosen
base color. Hence, the background image can be described fully by the set D^0

D^0 = {β^0, E^0}

where β^0 represents the base color of the background image. Note that we do not
need to add a base color to the midground and foreground images. In general, we
have a finite number of edges in each image. Let M^0, M^1 and M^2 denote the number
of edges the background, midground and foreground images contain. Further, the
ith ordered pair of the jth edge in an image is labeled as (x_{ij}^p, y_{ij}^p) where the value
of p indicates whether we are in a background, midground or foreground image.

We will label the cardinality of each edge set by the integers N^{0j}, N^{1j} and N^{2j} with
the first superscript having the same meaning as before and the second labeling which
edge set we are considering. Thus, we can let E_i^{1j} represent the ith ordered pair in
the jth edge in the midground. Further, we let inside and edge colors be denoted by
c_i^{1j} and c_e^{1j}, respectively. Also, we will assume that the paintings use a small number
of colors, here 8. We can encode a painting into the matrix shown in Eq. 12.3.

\begin{bmatrix} \text{bg} & \{E^{00}, N^{00}, \beta^0, c_i^{00}, c_e^{00}\} & \cdots & \{E^{0M^0}, N^{0M^0}, c_i^{0M^0}, c_e^{0M^0}\} \\ \text{mg} & \{E^{10}, N^{10}, c_i^{10}, c_e^{10}\} & \cdots & \{E^{1M^1}, N^{1M^1}, c_i^{1M^1}, c_e^{1M^1}\} \\ \text{fg} & \{E^{20}, N^{20}, c_i^{20}, c_e^{20}\} & \cdots & \{E^{2M^2}, N^{2M^2}, c_i^{2M^2}, c_e^{2M^2}\} \end{bmatrix} \qquad (12.3)

The encoding of any of the background, midground or foreground images will use the alphabet we have described above. There are 100 possible (x_i, y_i) coordinates
which can be painted in any of 8 colors. Each edge set in an image is thus a word
in our alphabet comprised of a sequence of letters chosen from our alphabet of 100
positional dependent letters with eight possible colors. For example, the background
image of Fig. 12.9a, is built from two words. The first word represents the first edge
set and it consists of ten letters as the edge set goes all the way across the image.
The second word denotes the second edge set and it consists of four letters.
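The painting encoding itself can be sketched with ordinary Python containers. The field names below mirror the notation of the text (edge point list E, cardinality N, inside color c_i, edge color c_e and base color β), but the helper function and the sample coordinates are purely illustrative.

# One edge set E^{pj}: its coarse-grid points, an edge color and an optional
# inside color (closed figures only); colors index an 8-color palette.
def edge_set(points, edge_color, inside_color=None):
    return {"E": list(points), "N": len(points),
            "c_e": edge_color, "c_i": inside_color}

# A painting is the triple (background, midground, foreground); only the
# background carries a base color beta.  The coordinates are illustrative.
background = {"beta": 4,
              "edges": [edge_set([(0, 7), (1, 7), (2, 6), (3, 6)], edge_color=2, inside_color=5),
                        edge_set([(6, 2), (7, 2), (8, 3)], edge_color=1)]}
midground  = {"edges": [edge_set([(4, 4), (5, 4), (5, 5)], edge_color=3, inside_color=6)]}
foreground = {"edges": [edge_set([(2, 1), (3, 1)], edge_color=0)]}

painting = {"bg": background, "mg": midground, "fg": foreground}
print(painting["bg"]["edges"][0]["N"])        # letters in the first word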

12.4 Emotionally Labeled Painting Data

How can we capture an emotional element in data that is sufficiently simple to be encoded efficiently for visual cortex training? We begin with a short survey of the
relevant literature.

12.4.1 Painting and Emotion in the Literature

The artist Schneider (2001), (Capturing Emotion), develops a philosophy about paint-
ing. Believing he “cannot paint feelings, but can apply paint in a manner that brings
about an idea of those feelings,” he consciously decides on particular elements to
incorporate that are going to best portray the emotion he is seeking. The first impor-
tant choice to make is the initial tone (color) laid down on a blank canvas. This would
be our background layer. Schneider says this choice immediately establishes mood
based on whether you choose a cool or warm tone. Next, he discusses edges and
values. He assigns pure black a value of nine and pure white a value of one.
The degree of sharpness in the edges and the range of value used create varying
degrees of contrast. Sharp edges and full-range use of values give a harsh effect that
would not be a good choice for paintings that are supposed to evoke emotions such as

peacefulness, etc. Further contrast is established by lighting and shadows. Schneider links emotions such as horror, anger, fear, or surprise with high contrast lighting.
It is interesting to note that an instance of spotlighting, which is of course a high
contrast setting, is thought by this painter to evoke loneliness. He explains that the
harshness of the contrast conveys isolation. Feelings such as affection, contentment,
peace and introspection are all likely to be painted with subdued contrast. Schneider
ends the article with some generalities about color. Pure colors suggest intensity
or agitation; grays suggest peace, harmony, or reflection; and combinations of the
elements addressed above will alter effects of the painting. The choices made will
either reinforce or detract from the goal emotion. It is logical then to conclude that
using soft muted colors to evoke peacefulness or contentment would be significantly
negated by inserting sharp lines. The result would probably be one of indifference,
like our neutral painting examples.
Within Pickford (1972), Psychology and Visual Aesthetics, the section titled “Feel-
ings and Emotions Expressed by Lines and Shapes,” discusses three research exper-
iments done with lines. The first was by H. Lundholm circa 1921 which focused
on distinguishing beautiful from ugly. In it, participants were asked to draw lines
they thought should fall into each category. The “beautiful” lines tended to be uni-
fied in direction or movement, had continuity, lacked angles and intersections, and
were symmetrical. The “ugly” lines were typically drawn with many irregularities
and angles. Other research, done by Poffenberger and Barrows from 1924, gave 500
subjects 18 lines, both curved and angular, to look at and then assign an adjective
from a given list. Although there were vast differences in the opinions of the 500
participants, some generalizations could be gathered from the data. Curves were
labeled as sad, quiet, and lazy. Angles were labeled as hard, powerful, and agitat-
ing. Direction of the lines further differentiated the responses: horizontal lines were
labeled quiet; downward sloping curves were sad or gentle; rising lines were merry
or agitating; and downward sloping lines were sad, weak, or lazy. In the last research
cited, Hevner from 1935, more lines were studied as well as different versions of the
same portrait. This one also incorporated the colors red and blue. Again, curves were
assigned qualities such as serene, graceful, and tender, while angles were rough,
robust, vigorous and dignified. For the colors, red was happy and exciting, while
blue was labeled serene, sad, or dignified. In a later chapter on “Associations and
Attitudes to Colour”, a variety of research experiments are cited. Consistent in all
of them are the following: red is linked to excitement, cheer, defiance, and power;
black is linked to sad, despondent, distressed, and power; blue and green are linked
to calm, secure, tender; and yellow is linked with cheerfulness and playfulness.
In his overview of expression, Hogbin (2000), Appearance and Reality, states that
the visual elements of light and color evoke emotion depending on certain qualities
or force of marking. As examples of this, he lists the taut curve, languid movement,
and short staccato repetition. A viewer will be drawn into the intentions of the artist
if correct essential characteristics are chosen by the painter. Hogbin suggests four
emotional responses: desire (for pleasure), need (for survival), rejection (for danger),
or wonderment (for the unknown). Hogbin then discusses light, color, marks, and
line. Intensity of light will have an influence on mood. He gives several examples:

flashes of light will warn; glowing light will comfort; shadows from a spotlight
may frighten; morning light may bring hope. Colors can be subjective, with much
meaning linked to culture and association. Consider the colors blue and red. Blue may
be associated with peace and calmness, but also sadness. Red might evoke feelings
of anger and violence, but also love. More generalized statements can be made about
colors and combinations. Complementary colors, those opposite one another on a
color wheel, cause a visual vibration when placed together. This creates a more
dynamic and lively feel. Also, greater intensity of color tends to imply emotional
excitement. Use of colors near each other on the color wheel will quiet the energy
of a painting. Hogbin divides marks into three types: non-objective, abstraction, or
representational depending on the intent of the artist. Emotion enters as the viewer
attaches meaning to a mark and does one of three things: empathizes, recoils, or is
curious. Finally, lines can suggest activity or passivity as they move from wiggly, to curved, to straight.
Useful sources of information on how artists address the issue of emotion in their
work are included in the collection of writings by artists about art Goldwater (1974),
Artists on Art. Though the turn toward expressing emotion in art is often considered a
19th century change, this collection includes source references that show that artists
far earlier than that were considering how to evoke emotion within their paintings.
Leon Battista Alberti’s Kleinere Künsttheoretische Schriften, Vienna, 1877, passim.
(Collated with De pictura, Basel, 1540.) (Sch. p. 111): this 15th century artist, 1404–
1472, wrote, “A narrative picture will move the feelings of the beholder when the
men painted therein manifest clearly their own emotions....emotions are revealed
by the movements of the body”. Giovanni Paolo Lomazzo’s Trattata dell’arte della
pittura, scultura, ed architettura, Milan, 1585, (Sch. p. 359): within a 16th century
artist’s (1538–1600) definition of painting we read, “Painting is an art which...even
shows visibly to our eyes many feelings and emotions of the mind”. Nicolas Poussin’s
Correspondance de Nicolas Poussin, ed. Ch. Jouanny, Paris, 1911, Letters 147, 156.-
G.P. Bellori, Le vite de’pittori, scultori, ed architetti moderni, Rome, 1672, pp. 460–
462: here, from a 17th century (1594–1665) artist, we find mention of emotions
via discussion on form and color. “The form of each thing is distinguished by the
thing’s function or purpose. Some things produce laughter, others terror; these are
their forms,” and “Colors in painting are as allurements for persuading the eyes, as
the sweetness of meter is to poetry”.
Charles Le Brun’s Conference Upon Expression, tr. John Smith, London, 1701,
passim. (Paris, 1667) (Sch. p. 555): the artist, whose dates are 1619–1690, gives us
some specific instruction about how to portray different expressions with the face and
body language within paintings after stressing the importance of expression overall.
It is a necessary Ingredient in all the parts of Painting, and without it no picture can be
perfect...Expression is also a part which marks the Motions of the Soul, and renders visible
the Effects of Passion. Horror can be portrayed by the following suggestions, “the Eyebrow
will be still more frowning; the Eye-ball instead of being in the middle of the Eye, will
be drawn down to the under lid; the Mouth will be open, but closer in the middle than at
the corners, which ought to be drawn back, and by this action, wrinkles in the Cheeks; the
Colour of the Visage will be pale; and the Lips and Eyes something livid; this Action has

some resemblance of Terror”, and If to Joy succeed Laughter, this Motion is expressed by
the Eyebrow raised about the middle, and drawn down next the Nose, the Eyes almost shut;
the Mouth shall appear open, and shew the Teeth; the corners of the Mouth being drawn
back and raised up, will make a wrinkle in the Cheeks, which will appear puffed up, and
almost hiding the Eyes; the face will be Red, the Nostrils open; the Eyes may seem Wet, or
drop some Tears, which being very different from those of Sorrow, make no alteration in the
Face; but very much when excited by Grief.

In the Pierre-Paul Prud’hon quote from Charles Clément’s Prud’hon, sa vie, ses
oeuvres, et sa correpondance, Paris, 1872, pp. 127, 178–80, written in 18th century
Rome (1787), Prud’hon states,
...in general, there is too much concern with how a picture is made, and not enough with
what puts life and soul into the subject represented...no one remembers the principal aim of
those sublime masters who wished to make an impression on the soul, who marked strongly
each figures character and, by combining it with the proper emotion, produce an effect of
life and truth that strikes and moves the spectator....

From Gustave Coquiot’s Seurat, Paris, ca. 1924, pp. 232–33, we find that on August
28, 1890, Seurat made the following comments about aesthetics. “Gaiety of tone
is given by the dominance of light; of color, by the dominance of warm colors; of
line by the dominance of lines above the horizontal. Calm of tone is given by the
equivalence of light and dark; of color, by an equivalence of warm and cold; and of
line, by horizontals. Sadness of tone is given by the dominance of dark; of color, by
the dominance of cold colors; and by line, by downward directions”.
Expression beyond that portrayed by a human subject in a painting is referenced
in letters Matisse wrote in 1908, from The Museum of Modern Art, Henri Matisse,
ed. Alfred H. Barr, New York, 1931, pp. 29–36. (La Grand Revue, Paris, Dec. 25,
1908). Matisse says
What I am, above all, is expression...Expression, to my way of thinking, does not consist
of the passion mirrored upon a human face or betrayed by a violent gesture. The whole
arrangement of my picture is expressive. The place occupied by figures or objects, the empty
spaces around them, the arranging in a decorative manner the various elements at the painter’s
disposal for the expression of his feelings. In a picture every part will be visible and will play
the role conferred upon it, be it principal or secondary. All that is not useful in the picture is
detrimental...

Gino Severini’s Du Cubisme au Classicisme, Paris, 1921. and Ragionamenti sulle


arti figurative, Milan, 1936. discusses certain rules in art. Music is specifically named
as a similarly composed creation.
An art which does not obey fixed and inviolable laws is to true art what a noise is to a
musical sound. To paint without being acquainted with these fixed and very severe laws is
tantamount to composing a symphony without knowing harmonic relations and the rules
of counterpoint. Music is but a living application of mathematics. In painting, as in every
constructive art, the problem is posed in the same manner. To the painter, numbers become
magnitudes and color tones; to the musician, notes and sound tones.

Now our task is to take these ideas and use them to create useful emotionally
labeled painting data.

12.4.2 The Emotional Künsterisches Würfelspiel Approach

As usual, we will think of paintings as assembled from three pieces: the background,
midground and foreground. We will develop Künsterisches Würfelspiel matrices that
correspond to sad and happy emotional states.

12.4.2.1 The Happy Würfelspiel Painting Matrix

A typical happy painting (BG, MG, FG) would be constructed to give the
overall impression of a happy emotional state. An example is shown in
Fig. 12.11a, b and c. We can assemble the background, middle ground and fore-
ground elements into a painting as shown in Fig. 12.11d.
Now consider the array we get from four kinds of happy background, midground
and foreground images. In Fig. 12.12, we see the resulting matrix. The first column
of images are the backgrounds, the middle column, the midgrounds and finally, the
last column, the foregrounds. Note that this matrix has a definite emotional state
associated with it as all the images are somewhat happy—definitely not emotionally
neutral images.

12.4.2.2 The Happy Paintings

We can use this matrix to assemble 64 individual paintings. We are displaying them
in thumbnail form in the Figs. 12.13, 12.14, 12.15 and 12.16.

Fig. 12.13 16 Happy compositions: background 1, midground 1, foregrounds 1–4; background 1, midground 2, foregrounds 1–4; background 1, midground 3, foregrounds 1–4; background 1, midground 4, foregrounds 1–4

Fig. 12.14 16 Happy compositions: background 2, midground 1, foregrounds 1–4; background 2, midground 2, foregrounds 1–4; background 2, midground 3, foregrounds 1–4; background 2, midground 4, foregrounds 1–4

Fig. 12.15 16 Happy compositions: background 3, midground 1, foregrounds 1–4; background 3, midground 2, foregrounds 1–4; background 3, midground 3, foregrounds 1–4; background 3, midground 4, foregrounds 1–4

12.4.2.3 The Sad Würfelspiel Painting Matrix

As usual, a typical sad painting would be constructed from a given background, midground and foreground image which we have designed to give the overall impression of a sad emotional state. A typical painting composition design of this sort can

Fig. 12.16 16 Happy compositions: background 4, midground 1, foregrounds 1–4; background 4, midground 2, foregrounds 1–4; background 4, midground 3, foregrounds 1–4; background 4, midground 4, foregrounds 1–4

Fig. 12.17 Assembling a sad painting. a Sad background. b Sad midground. c Sad foreground. d
A sad painting

be seen in Fig. 12.17a, b and c. We can assemble the background, middle ground and
foreground elements into a painting as shown in Fig. 12.17d.
Now consider the array we get from four kinds of sad background, midground
and foreground images. In Fig. 12.18, we see the resulting matrix. The first column
of images are the backgrounds, the middle column, the midgrounds and finally, the
last column, the foregrounds. Note that this matrix has a definite emotional state associated with it as all the images are somewhat sad—definitely not emotionally neutral images.

12.4.2.4 The Sad Paintings

We can use this matrix to assemble 64 individual paintings. We are displaying them
in thumbnail form in the Figs. 12.19, 12.20, 12.21 and 12.22.

Fig. 12.18 The sad painting matrix

Fig. 12.19 16 Sad compositions: background 1, midground 1, foregrounds 1–4; background 1, midground 2, foregrounds 1–4; background 1, midground 3, foregrounds 1–4; background 1, midground 4, foregrounds 1–4

We now have enough data from both music and paintings to begin the process of
training cortex in a full brain model.

Fig. 12.20 16 Sad compositions: background 2, midground 1, foregrounds 1–4; background 2, midground 2, foregrounds 1–4; background 2, midground 3, foregrounds 1–4; background 2, midground 4, foregrounds 1–4

Fig. 12.21 16 Sad compositions: background 3, midground 1, foregrounds 1–4; background 3, midground 2, foregrounds 1–4; background 3, midground 3, foregrounds 1–4; background 3, midground 4, foregrounds 1–4

Fig. 12.22 16 Sad compositions: background 4, midground 1, foregrounds 1–4; background 4, midground 2, foregrounds 1–4; background 4, midground 3, foregrounds 1–4; background 4, midground 4, foregrounds 1–4

References

R. Goldwater, M. Treves (eds.), Artists on Art (Pantheon Books, New York, 1974)
S. Hogbin, Appearance and Reality (Cambium Press, Bethel, 2000)
R. Pickford, Psychology and Visual Aesthetics (Hutchinson Educational LTD, London, 1972)
W. Schneider, Capturing emotion. Am. Artist 65(703), 30–35 (2001)
Chapter 13
Modeling Compositional Design

We believe that one of the hardest problems to overcome in our attempt to develop
software models of cognition is that of validation. How do we know that our the-
oretical abstracts of cellular signaling are giving rise to network architectures of
interacting objects that are reasonable? Since we are attempting to create outputs
that are good approximations of real cognitive states such as emotional attributes,
it is not clear at all how to measure our success. Traditional graphs of ordinal data,
tables of generated events versus measured events couched in the language of math-
ematics and the data structures of computer science, while valid, do not help us to
see that the models are correct. To address this problem, we decided that in addition
to developing cognitive models that are measured in traditional ways, we would also
develop models that can be assessed by experts in other fields for validity. There are
two such models we are attempting to build: one is a model of music composition
in which short stanzas of music are generated autonomously that are emotionally
colored or tagged. The second is a model of painting composition in which primi-
tive scenes comprised of background, foreground and primary visual elements are
generated autonomously with emotional attributes.
The cognitive models are based on simplified versions of biological information
processing whose salient elements are sensory and associative cortex, the limbic
system and pathways for the three key neurotransmitters serotonin, norepinephrine
and dopamine. We will discuss some of the details of these computational modules
in the sections to come. To validate these models, we have begun by constructing
musical and painting data that will serve as samples of the cognitive output states we
wish to see from Area 37 of the temporal cortex. The associative cortex devoted to
assigning higher level meaning to musical phrases is trained with musical data built
using an 18th century approach called a Würfelspiel matrix and a simple grammar
based code. We have constructed examples of neutral, sad, happy and angry twelve
beat musical sentences which are to be the desired output of this portion of area 37

which functions as a polymodal fusion device in Chap. 11. In effect, we are using this
data to constrain the area 17 to area 37 pathways. Similarly, the associative cortex
associated to assigning higher level meaning to a class of painting compositions
built from assembling the three layers, background, midground and foreground, is
trained from similar Würfelspiel matrices devoted to emotionally labeled painting
compositions which were developed in Chap. 12. The resulting cognitive model then
develops outputs in four emotional qualia for the separate types of inputs: auditory
(music) and visual (painting).

13.1 The Cognitive Dysfunction Model Review

The first step that we need to take in building our models is to use the musical and
painting data to constrain or train our model of the associative cortex. In general, our
model takes this specialized sensory input and generates a high level output as shown
in Fig. 13.1. In order to perform this training, we need to develop an abstract model
of information processing in the brain. We have chosen the abstraction presented
in Fig. 5.3. There is much biological detail missing, of course. We are focusing on
the associative cortex, the limbic system and a subset of neurotransmitter generation
pathways that begin in various portions of the midbrain. The architecture of the
limbic system that we will use is shown in Fig. 5.2 and the information processing
pathways we will focus on are shown in Fig. 13.2. Our cortex model itself is shown
in Fig. 5.1. The musical data provides the kind of associated output that might come
from area 37 of the temporal cortex. The low level inputs that start the creation of a
music phrase correspond to the auditory sensory inputs into area 41 of the parietal
cortex which are then processed through areas 5, 7 and 42 before being sent to the
further associative level processing in the temporal cortex. The painting data then

Fig. 13.1 The sensory to output pathways

Fig. 13.2 A generic model of the limbic system

provides a similar kind of associated input into area 37 from the occipital cortex.
Inputs that create the paintings correspond to the visual sensory inputs into area 17
of the occipital cortex which are then further processed by area 18 and 19 before
being sent to the temporal cortex for additional higher level processing. We fully
understand how simplified this view of information processing actually is, but we
believe it captures some of the principal features. In Fig. 13.3, we show how we will
use the musical and painting data to constrain the outputs of the associative cortex
and limbic system. The data we discuss here will allow us to build the model shown
in Fig. 13.4. In this model, we are essentially testing to see if we generate the
kinds of high level meta outputs we expect. Hence, this is part of our initial validation
phase. However, a much more interesting model is obtained by inversion as shown
in Fig. 13.5. Here, we use fMRI and skin conductance inputs and starting elements
for music and painting data that have not been used in the training phase to generate
new compositional designs in these various modalities.
In Fig. 13.3, we indicate that the outputs of the limbic system of our model are
sent to the meta level pathways we use for music, painting and so forth and also to
the cerebellum for eventual output to motor command pathways.
The emotionally tagged data sets contain examples of equally valid solutions to compositional design processes. We have discussed musical data that is neutral and emotionally tagged (Chap. 11) and painting data that is neutral and emotionally tagged (Chap. 12). In each of
these chapters, we discuss the generation of 64 examples of solutions to compo-
sitional design tasks in different emotional modalities that are equally acceptable
according to some measure. So, can we learn from this kind of data how the experts
that designed these data samples did their job? Essentially, buried in these data sets
are important clues about what makes a great design. How do we begin to understand
the underlying compositional design principles?

Fig. 13.3 The musical and painting data is used to constrain the outputs of the associative cortex
and limbic system

Each data set that is encoded into a Würfelspiel matrix, whether using music or
art, therefore contains crucial information about equally valid examples of data in
different emotional modalities. From this data, we can build mappings that tell us
which sequences of choices are valid and which are not from the perspective of the
expert who has designed the examples. In a very real sense, a cognitive model built
from this data is automatically partially validated from a psychological point of view.

Fig. 13.4 Music or painting initialization generates fMRI and skin conductance outputs and a full
musical or painting design

13.2 Connectionist Based Compositional Design

Let’s now look at how we might develop a standard connectionist approach to building
models that understand our data. We will move to an explicit neurobiological model
later. Each data set that is encoded into a Würfelspiel matrix, whether using music, art
or something else, contains crucial information about equally valid design solutions
in different modalities. From this data, we want to build mappings that tell us which
sequences of choices are valid, and which are not, from the perspective of the expert
who has designed the examples. In a very real sense, a cognitive model built from
this data is automatically partially validated from a psychological point of view. For
example, the neutral and emotionally tagged music and painting data sets contain
examples of equally valid solutions to a compositional design process in both neutral
and emotionally labeled flavors. Each set of data contains 64 examples of solutions to
compositional design tasks that are equally acceptable according to some measure.

13.2.1 Preprocessing

Each data set has an associated alphabet with which we can express noun, verb and
object units. For our purposes, let’s say that each of the nouns, verbs or objects is
a finite list of actions from an alphabet of R symbols, where the meaning of the
symbols is, of course, highly dependent on the context of our example. In a simple

Fig. 13.5 fMRI and skin conductance inputs plus music or painting initialization generate a full
musical or painting design in a given emotional modality

music example, the list of actions might be a sequence of four beats and the alphabet
could be the choices from the C major scale. Thus, we will assume each of the P
nouns consists of a list of length L from an alphabet of R symbols. The raw inputs we
see as the noun vector n are thus normally processed into a specialized vector for use
in algorithms that model the compositional process. For example, in the generation of
a painting, we use our painting medium to create the background which is a process
using pigment on a physical canvas. This is the raw input n and each element of a
noun n is a letter from the alphabet. Let the letters in the alphabet be denoted by
a0 through aR−1 . Then we can associate the noun with a vector of L components,
n[0], . . . , n[L − 1] where component n[j] is a letter we will represent by ajn . The
letter ajn is then encoded into a vector of length R all of whose entries are 0 except for
the entry in slot j. In general, we can choose to encode n into a form more amenable
to computation and data processing in many ways. In our work, we have encoded
the raw data into an abstract grammar, thereby generating the feature vector N. In
general, the input nouns n are preprocessed to create output noun states denoted by
N. In a similar fashion, we would preprocess verb and object inputs to create verb and
object output states denoted by V and O, respectively. The preprocessing is carried
out by mappings $f_n$, $f_v$ and $f_o$, respectively. For example, the mapping from raw data to the feature vector form for nouns can be written as

$$
n = \begin{bmatrix} a_0^n \\ \vdots \\ a_{L-1}^n \end{bmatrix}
\;\xrightarrow{\;f_n\;}\;
N = \begin{bmatrix} N_0 \\ N_1 \\ \vdots \\ N_{L-1} \end{bmatrix}
$$

Our association of a noun n with the feature vector N is thus but one example of the
mapping fn . In a similar fashion, we would preprocess verb and object inputs to create
verb and object output states denoted by V and O, respectively. The preprocessing
is carried out by mappings fv and fo , respectively.
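As a concrete illustration of this preprocessing step, the short Python sketch below performs the letter-to-binary-vector encoding just described. The alphabet, the sample noun and the function name are ours and purely illustrative; the grammar-based feature map used in our work would be applied on top of this raw encoding.

```python
import numpy as np

def encode_noun(noun, alphabet):
    """Encode a noun (a list of L letters) as an L x R binary array.

    Each letter becomes a length R vector that is all zeros except for a
    single one in the slot of that letter's index in the alphabet.
    """
    index = {a: k for k, a in enumerate(alphabet)}
    N = np.zeros((len(noun), len(alphabet)))
    for j, letter in enumerate(noun):
        N[j, index[letter]] = 1.0
    return N

# Hypothetical example: a four beat noun over the C major scale.
alphabet = ['C', 'D', 'E', 'F', 'G', 'A', 'B']
print(encode_noun(['E', 'G', 'C', 'C'], alphabet))
```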
Now, a primitive object in our purported compositional grammar has length $L$. We can denote such an input noun object as $n_i$ and a corresponding output noun object as $N_i$. Our mapping problem is thus to determine the rule behind the mapping from the noun feature vectors $N$ to the verb feature vectors $V$, $g_{NV}$, and from the verb feature vectors $V$ to the object feature vectors $O$, $g_{VO}$. We can express this mathematically by Eq. 13.1.

$$
\begin{bmatrix} N_0 \\ N_1 \\ \vdots \\ N_{L-1} \end{bmatrix}
\;\xrightarrow{\;g_{NV}\;}\;
\begin{bmatrix} V_0 \\ V_1 \\ \vdots \\ V_{L-1} \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} V_0 \\ V_1 \\ \vdots \\ V_{L-1} \end{bmatrix}
\;\xrightarrow{\;g_{VO}\;}\;
\begin{bmatrix} O_0 \\ O_1 \\ \vdots \\ O_{L-1} \end{bmatrix}
\tag{13.1}
$$

We can combine these processing steps into the diagram shown in Fig. 13.6.

Fig. 13.6 Raw sentence to feature vector processing



13.2.2 Noun to Verb Processing

The data we are given is order dependent. For example, if we are given a noun $n_i$ of form $\{a_{i0}^n, \ldots, a_{i,L-1}^n\}$, then we attend to the letters of this noun sequentially as $a_{i0}^n \rightarrow a_{i1}^n \rightarrow \cdots \rightarrow a_{i,L-1}^n$. We are given that each noun $n_i$ is associated with a set of possible verbs $\{v_j\}$ equal to $\{a_{j0}^v, \ldots, a_{j,L-1}^v\}$ for $0 \le j \le P - 1$, and one task is
thus to understand the noun to verb mapping. However, another task is to understand how to generate the original noun sequence. Why are some noun sequences useful or pleasing in this context and others are not? To generate a noun sequence $n$ equal to $\{a_0^n, \ldots, a_{L-1}^n\}$ means we choose a random start letter $a_0^n$ and then from that preferred sequences are generated while non-interesting words are biased against. Hence, we think of a mapping, the Noun Generating Map or NGM, as accepting an input $a_0^n$ and generating a preferred second letter $a_1^n$. Then $a_1^n$ is used as an input to generate a preferred third letter $a_2^n$ and so on until the full string of letters is finished. To model this mapping, we start by using the information about useful noun strings we have. Given letter $a_{i0}^n$, we know that $a_{i1}^n$ is preferred.
We embed the original data into an analog form by converting each letter $a_{i0}^n$ of the noun, whose initial encoding is a vector of 0s and 1s, into an analog vector $\xi_{i0}^n$ of real numbers. To set the value of a component of $\xi_{i0}^n$ whose binary value is 0, we choose a thresholding tolerance $\epsilon$ in the interval $(0, 0.25)$, choose a real number $y$ randomly from the interval $[-0.5\epsilon, 0.5\epsilon]$ and then set the value of that component to $\epsilon + y$. Therefore, such a component of $\xi_{i0}^n$ lies in the interval $[0.5\epsilon, 1.5\epsilon]$. For example, if $\epsilon$ was chosen to be 0.20, then for all indices in the binary encoding of the letter $a_{i0}^n$ that are 0, we would randomly choose $y$ from $[-0.1, 0.1]$, generating values that lie in $[0.10, 0.30]$. We will call the number $\epsilon$ our analog threshold. The entry in the binary encoded letter that corresponds to a 1 will be randomly chosen from $[1 - 1.5\epsilon, 1 - 0.5\epsilon]$. Hence, for $\epsilon = 0.2$, an entry with a 1 will be assigned a real number in the interval $[0.7, 0.9]$. Consequently, the raw binary noun, verb and object data is mapped into a new analog representation in which each entry is a real number chosen as above.
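A minimal sketch of this analog embedding, under the reconstruction above (zero entries map into $[0.5\epsilon, 1.5\epsilon]$ and one entries into $[1 - 1.5\epsilon, 1 - 0.5\epsilon]$); the function name and defaults are illustrative assumptions.

```python
import numpy as np

def analog_embed(binary_vector, eps=0.2, rng=None):
    """Map a 0/1 encoded letter into its analog representation.

    Entries equal to 0 become eps + y with y drawn uniformly from
    [-0.5*eps, 0.5*eps], so they land in [0.5*eps, 1.5*eps]; entries
    equal to 1 land in [1 - 1.5*eps, 1 - 0.5*eps].
    """
    rng = rng or np.random.default_rng()
    y = rng.uniform(-0.5 * eps, 0.5 * eps, size=len(binary_vector))
    return np.where(np.asarray(binary_vector) == 0, eps + y, 1.0 - eps + y)

# For eps = 0.2 a zero entry lies in [0.10, 0.30] and a one entry in [0.70, 0.90].
print(analog_embed([0, 0, 1, 0], eps=0.2))
```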
Since we know that only certain letters should follow $a_{i0}^n$, only certain analog states $\xi_{i1}^n$ are permissible given a start state of $\xi_{i0}^n$. We infer from this that there is an unknown mapping $h_{01}$ which maps the analog encoding of letter 0 to the analog encoding of letter 1, $h_{01}(\xi_{i0}^n) = \xi_{i1}^n$. This mapping has special characteristics: our data tells us that only certain letters can follow letter 0. Each acceptable second letter is a vector in $R$ dimensional space whose components are analog zero except the one that corresponds to the second letter. That component is an analog one. We have at most $P$ examples of acceptable second letters. This means we have at least $R - P$ second letters that are not acceptable. In other words, the preferred output for a given noun is an $R$ row matrix formed from the acceptable second letters that has at most $P$ columns. We can do this for all of the nouns in our data set. Hence, we will have at most $P$ first letter choices and each of these will have at least $R - P$ unacceptable second letters. Let $T$ and $T'$ denote the sets of all acceptable and unacceptable outputs, respectively.
Following a traditional machine learning approach, we could model the mapping
h01 as a chained feed forward network, with feedforward and feedback connections
between artificial neurons. This mapping takes the first letter of a noun and outputs a
set of acceptable second letters. Training is done by matching input to output using
excitation and inhibition (i.e., using a Hebbian approach). We know which elements
in the analog output vector should be close to one and which should be close to zero
for the analog input.
We initialize all of the tunable parameters to be small positive if the connection
from component k in the input to component j on the output is between two analog
ones. All other connections are initialized to small negative numbers. For each first
letter we have data for, we do the following: pick the initial first letter in our data set
and compute the relevant output for the first associated second letter. Increase the
connection weights on any path between a high input and a high output and decrease
the connection weight on any other paths. Cycle to the next second letter and redo
until all possibilities are exhausted. We thus continue this process until every input
generates an output with a high component value in the location that corresponds to
the index for the second letter. At this point, we say we have trained our nonlinear
mapping h01 so that first letters in our data noun sequences are biased to connect
to their corresponding second letters. A second letter is then chosen randomly from
the set of acceptable second letters via an additional input line which in a sense is a coarse model of creativity.
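The sketch below is one simple way to realize the excitation/inhibition training loop just described; it is a plain connectionist stand-in rather than the isocortex circuitry developed later, and the learning rate, pass count and the 0.6 cutoff for an "analog one" are illustrative choices.

```python
import numpy as np

def train_h01(pairs, R, eta=0.1, high=0.6, passes=10):
    """Hebbian-style training of the first-letter to second-letter map.

    pairs : list of (xi_first, xi_second) analog vectors of length R.
    Connections between a high input component and a high output component
    start small positive and are strengthened; all others start small
    negative and are weakened.
    """
    W = -0.01 * np.ones((R, R))
    for xi_in, xi_out in pairs:
        W[np.ix_(xi_out > high, xi_in > high)] = 0.01
    for _ in range(passes):
        for xi_in, xi_out in pairs:
            bump = np.outer(xi_out > high, xi_in > high).astype(float)
            W += eta * bump - 0.1 * eta * (1.0 - bump)
    return W

def apply_h01(W, xi_in):
    """Apply the trained map and squash the result back into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-W @ xi_in))
```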
If we let the set of all the generated weights be the matrix W 01 , we note this is an
R × PR size matrix. We can develop a similar mapping for the second to third letter,
h12 with weights W 12 , the third to fourth letter, h23 with weights W 23 , and finally,
the mapping from letter L − 1 to letter L, hL−2,L−1 with weights W L−2,L−1 .
The procedure for creating a valid noun sequence can now be given. Choose a valid starting letter for a noun, $a_{i0}^n$, and map it to its analog form, $\xi_{i0}^n$. Then, applying the first to second letter map, we find an acceptable second letter by the computation $h_{01}(\xi_{i0}^n) = \{\xi_{i1}^n\}$. This second letter can be used as the input into the next map, generating an acceptable third letter. Hence, the composite map $h_{12}\, h_{01}$ takes a valid first letter and creates the three letter analog sequence defined by Eq. 13.2:
$$
\begin{bmatrix} \xi_{i0}^n \\ \xi_{i1}^n \\ \xi_{i2}^n \end{bmatrix}
\in
\begin{bmatrix} \xi_{i0}^n \\ h_{01}(\xi_{i0}^n) \\ h_{12}(h_{01}(\xi_{i0}^n)) \end{bmatrix}
\tag{13.2}
$$

The analog sequences are then mapped into three letter sequences by assigning an
analog value to either a one or zero using a threshold tolerance τ . This means we
map a component whose value is above τ to 1 and one whose value is below τ to
0. This can of course generate invalid sequences as we are only supposed to have
a single 1 assigned from any analog sequence. We do have to make sure that our
developed map does not allow this. For example, for $\tau = 0.6$, the vector

$$
\begin{bmatrix} 0.83 \\ 0.55 \\ 0.35 \end{bmatrix}
\;\xrightarrow{\;\tau = 0.6\;}\;
\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}
$$

generates a valid noun, but if $\tau = 0.5$, an invalid binary sequence is generated:

$$
\begin{bmatrix} 0.83 \\ 0.55 \\ 0.35 \end{bmatrix}
\;\xrightarrow{\;\tau = 0.5\;}\;
\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}
$$

which we would not know how to interpret as part of a noun. Nevertheless, despite
these obvious caveats, the procedure above learns how to generate all acceptable three
letter nouns given an initial start letter. There will be at most $P^2$ possible second and
third letters in this set. Since this set of possibilities will grow rapidly, after generating
the letter two set of possibilities, we randomly choose one of the columns of the letter
two matrix as the second letter choice and apply the h12 mapping to that letter. We
then randomly choose one of the columns of the letter three matrix as the third letter
choice.
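A sketch of the thresholding and random-column-choice steps, using the same illustrative naming as the earlier sketches; the validity check enforces the single-analog-one caveat noted above.

```python
import numpy as np

def to_binary(xi, tau=0.6):
    """Threshold an analog letter; valid only if exactly one component survives."""
    b = (np.asarray(xi) > tau).astype(int)
    if b.sum() != 1:
        raise ValueError("invalid binary sequence: need exactly one 1")
    return b

def generate_noun(first_xi, letter_maps, tau=0.6, rng=None):
    """Chain the letter-to-letter maps, picking one acceptable letter at each step.

    letter_maps is the list [h01, h12, ...]; each map is assumed to return an
    R x (number of acceptable choices) matrix of analog next-letter encodings.
    The random column choice plays the role of the extra "creativity" input.
    """
    rng = rng or np.random.default_rng()
    letters, xi = [to_binary(first_xi, tau)], first_xi
    for h in letter_maps:
        candidates = h(xi)
        xi = candidates[:, rng.integers(candidates.shape[1])]
        letters.append(to_binary(xi, tau))
    return letters
```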
We then extend this procedure to the generation of all $L$ letters with the concatenation $h_{L-2,L-1} \cdots h_{12}\, h_{01}$, which we will denote by the symbol $H^n$, where the superscript indicates this is the mapping we will use for noun sequences. The mapping $H^n$ is the Noun Generating Map or NGM that we seek. It generates a set of $P^{L-1}$ letter two to letter $L$ sequences. By making a random column choice at each letter, we generate one random $L$ letter noun sequence for each initial letter we use. We can do something similar for the verb and object data, generating the Verb and Object Generating Maps $H^v$ and $H^o$, respectively. These three mappings are the noun, verb
and object generator mappings we were seeking. Then we need to connect nouns to
verbs and verbs to objects. The mapping from N to V is where the real processing lies.
The Würfelspiel matrix training data approach tells us that it is permissible for certain
nouns to be linked to certain verbs. While we could memorize a look-up table based
on this data, that is not what we wish to do. We want to determine underlying rules
behind these associations as emergent behavior in a complex system of interacting
agents. Thus, each output noun Ni in the collection of P nouns {Ni : 0 ≤ i < P} should activate any of the P verbs {Vj : 0 ≤ j < P} via the map gN V . Further, each verb Vj in {Vj : 0 ≤ j < P} should activate the output objects {Oi : 0 ≤ i < P} by the
action of the map gV O . To build the mappings gN V and gV O , we will eventually use
a more sophisticated model which is graph based with feedback, which is another
discussion entirely.

13.2.3 Sentence Construction

There are then two ways to create a valid emotionally labeled music or painting
composition. The first does not use the mappings gN V and gV O . For a painting, a
random choice of background letter is chosen to begin the sentence selection process.
This input generates a valid background image. Then, a randomly chosen starting
letter for the midground is then used to generate a valid midground image. Finally,
a random start letter for the foreground generates the last foreground image from
which the painting is constructed. The output of the cognitive module is thus a short
sentence, that is, a painting, of the type we have discussed. The second method is more
interesting. The randomly generated noun N generates a valid verb gN V (N) and the
valid object is generated by the concatenation gV O (gN V (N)). Thus, the composite
map gV O gN V provides a sentence generator.
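Both construction methods reduce to a few lines once the generating maps exist. The sketch below assumes hypothetical callables for the background, midground and foreground generators and for the trained maps g_NV and g_VO; it only shows how the pieces are wired together.

```python
import random

def assemble_painting_random(H_b, H_m, H_f, start_letters, rng=random):
    """Method one: independent random start letters drive the three layer generators."""
    return tuple(H(rng.choice(start_letters)) for H in (H_b, H_m, H_f))

def assemble_sentence(N, g_NV, g_VO):
    """Method two: a noun feature vector drives the verb and object via the trained maps."""
    V = g_NV(N)
    O = g_VO(V)
    return N, V, O
```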
Once we can generate sentences, we note that we can move to the generation
of streams consisting of sentences concatenated to other sentences after we create
an Object to Noun mapping. This is done in a way that is similar to what we have
done before using a Würfelspiel array approach. Other possibilities then come to
mind; for example, in this painting context, such transitions can create arbitrarily
long visual streams we can call animations. The analog of key changes in music is
then perhaps visual scene changes. Since key changes are logical transitions, we can
create arbitrarily long musical streams punctuated by appropriate key changes by
using the Würfelspiel array approach to model which key changes between given
key signatures are pleasing between two musical streams. Note, this construction
process has all the problems of a traditional connectionist approach. The gN V , gV O
and gNO maps depend on our data and would have to be rebuilt when new data is
used. We believe it will be much better to replace this connectionist approach with a
biologically based simple brain model.

13.3 Neurobiologically Based Compositional Design

We can see that in principle, there is a lot that can be accomplished by building
the maps we have discussed above using any architectures we desire. However, we
would like our connectionist architecture choices to be based on what we know of
neurobiology in either humans or other creatures with interesting neural systems
such as spiders, honeybees and cephalopods.

13.3.1 Recalling Data Generation

In the 1700s, fragments of music could be rapidly prototyped by using a matrix of possibilities called a Würfelspiel matrix. The historical examples were constructed as 10 × 10 matrices where each entry was a 16 beat sequence. The musician would toss a die to determine which choice to use from each column. Hence, a musical Würfelspiel matrix could be used to rapidly prototype $10^{10}$ different 160 beat samples of a
chosen musical type. In essence, a Würfelspiel music matrix captures in an abstract
and succinct form the genius of the artist. We show an abstract version of a typical
Musikalisches Würfelspiel matrix, M, in Eq. 13.3. It consists of P rows and three
columns. In the first column are placed the opening (O) phrases or nouns; in the third
column, are placed the closing phrases (C) or objects; and in the second column,
are placed the transitional phrases (T) or verbs. Each phrase consisted of L beats
and the composer’s duty was to make sure that any opening, transitional and closing
(or noun, verb and object) was both viable and pleasing for the musical style that
the composer was attempting to achieve. Thus, a musical stream could be formed
by concatenating these fragments together: picking the $i$th Opening ($O_i$), the $j$th Transition ($T_j$) and the $k$th Closing ($C_k$) phrases would form a musical sentence. Note that there are $P^3$ possible musical sentences that can be formed in this manner. If each opening, transition and closing fragment is four beats long, we can build $P^3$ different twelve beat sentences.

$$
M = \begin{bmatrix}
O_0 & T_0 & C_0 \\
O_1 & T_1 & C_1 \\
\vdots & \vdots & \vdots \\
O_{P-1} & T_{P-1} & C_{P-1}
\end{bmatrix}
\qquad
P = \begin{bmatrix}
B_0 & M_0 & F_0 \\
B_1 & M_1 & F_1 \\
\vdots & \vdots & \vdots \\
B_{P-1} & M_{P-1} & F_{P-1}
\end{bmatrix}
\tag{13.3}
$$

Further, a simple painting can also be considered as a different sort of triple; it consists
of a background (B), a midground (M) painted on top of the background and finally,
a foreground (F) layer which further occludes additional portions of the combined
background and midground layers. Hence, paintings could be organized as a matrix
P as shown in Eq. 13.3 also. Thus, a painting could be formed by concatenating these
images together: picking the $i$th background ($B_i$), the $j$th midground ($M_j$) and the $k$th foreground ($F_k$) layers would form a painting $B_i M_j F_k$. Note that there are again $P^3$ possible paintings that can be formed. A sample music matrix is shown in Fig. 11.13
and a painting matrix in Fig. 12.12.
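To make the counting concrete, here is a minimal sketch of drawing one sentence from an abstract P × 3 Würfelspiel matrix of the form in Eq. 13.3; the string entries simply stand in for the actual musical phrases or image layers.

```python
import random

def wurfelspiel_sample(M, rng=random):
    """Pick one opening, one transition and one closing entry, one per column."""
    i, j, k = (rng.randrange(len(M)) for _ in range(3))
    return M[i][0], M[j][1], M[k][2]

# Hypothetical 4 x 3 matrix: P = 4 gives 4**3 = 64 possible sentences.
M = [[f"O{r}", f"T{r}", f"C{r}"] for r in range(4)]
print(wurfelspiel_sample(M))
```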
Clearly, each data set encoded into a Würfelspiel matrix therefore contains crucial
information about equally valid examples of data that are solutions to a certain design
problem. The existence of this data means that mappings exist that tell us which
sequences of choices are valid and which are not from the perspective of the expert
who has designed the examples. Each data set therefore has an associated alphabet
with which we can express noun, verb and object units. For our purposes, let’s say
that each of the nouns, verbs or objects is a finite list of actions from an alphabet
of R symbols, where the meaning of the symbols is of course highly dependent on
the context of our example. In a simple music example, the list of actions might be
a sequence of four beats and the alphabet could be the choices from the C major
scale. Thus, we will assume each of the P nouns consists of a list of length L from
an alphabet of R symbols. Since the general alphabet has R symbols, each element
of a noun can be thought of as a vector of size R whose components are all 0 except
for a single 1. Let the letters in the alphabet be denoted by $a_0$ through $a_{R-1}$. Then a noun has components 0 to $L-1$ where component $n[j]$ is the letter $a_j^n$. The letter $a_j^n$ is then encoded into a vector of length $R$ all of whose entries are 0 except for the entry in the slot corresponding to that letter's index in the alphabet.

Fig. 13.7 A three cortical column model

13.3.2 Training the Isocortex Model

Consider a typical multiple column portion of our cortex model, say Area 17, 18 and
19. We could think of it as shown abstractly in Fig. 13.7. In this picture, we show a
stack of three cortical columns. At the top of the figure, we show the neurotransmitters
input from specialized midbrain areas. In our model, we are limiting our attention
to serotonin from the Raphe nuclei, norepinephrine from the Locus Coeruleus and
dopamine from the dopamine producing neurons. At the bottom, we indicate that information flow through the cortical columns is controlled by the thalamic control circuitry.
The situation is of course much more complicated and you must remember that
the arrows designated as OCOS and FFP actually refer to abstractions of specialized
groupings of neurons as shown in the earlier figures of Sect. 5.4.1. Let's assume
that we have obtained auditory and visual cortex data (for example, the musical and
painting data matrices M and P previously discussed) which are encoded into feature
vectors of our design using an abstract grammar. In our own physiology, raw sensory
input is also coded into progressively more abstract versions of the actual signals.
Hence, we view our chosen algorithms for computing the feature vector of the given
data as the analog of that process. These feature vectors are then used as the inputs
into layer four of the bottom cortical column corresponding to each sensory modality.
The powerful OCOS and FFP circuits then serve as devices that do feature grouping.
The outputs of associative cortex areas such as area 20, 21, 22 and 37 of the temporal
cortex would then be constrained to match the known outputs due to our data.
To build the information processing models we are discussing thus requires that we
understand the input to output maps that are based on the Würfelspiel data generated
for our given sensory modalities. As discussed above, this data will be used to train
the auditory and visual cortex in preparation for the training of the associative cortex.
The trained model of the associative cortex provides us with a neurobiologically
motivated model of sensor fusion. To understand this process better, we consider the
details of how a model of sentence construction for a given abstract grammar works.
This shows quite clearly the perceptual grouping tasks that must be accomplished with
the OCOS and FFP training algorithms applied to the cortical columns of Fig. 5.24.
Recall, the raw data n, v and o is encoded into feature vectors N, V and O, respectively.
The preprocessing is carried out by mappings fn , fv and fo , respectively. Now, a
primitive object in our compositional grammar has length L. Our mapping problem
is thus to determine the rule behind the mapping from the noun feature vectors N to
the verb feature vectors V , gN V , and from the verb feature vectors V to the object
feature vectors $O$, $g_{VO}$. We can express this mathematically by Eq. 13.1. To generate a noun sequence $n_i$ equal to $\{a_{i0}^n, \ldots, a_{i,L-1}^n\}$ means we choose a random start letter and then from that preferred sequences are generated while non-interesting words are biased against. Hence, we think of a mapping, the Noun Generating Map or NGM, as accepting a letter input, $a_0^n$, and generating a preferred noun. Note if we start with $a_0^n$, we then need to generate a preferred second letter, $a_1^n$. Then $a_1^n$ is used as an input to generate a preferred third letter, $a_2^n$, and so on until the full string of letters is finished. The verb and object data provide us with samples of a verb and object generating map, the VGM and OGM, also. To model this mapping, we start by using the information about useful noun strings we have.
Since we know which letters are preferred as the next letter given a starting choice,
we know that we have at most $P$ first letter choices and each of these will have at least $R - P$ unacceptable second letters. Let $T$ and $T'$ denote the sets of all acceptable and unacceptable outputs, respectively. As we noted earlier, we could model this mapping as a chained feed forward network, with feedforward and feedback connections
between artificial neurons using standard training techniques. This mapping would
take the first letter of a noun and output a set of acceptable second letters. However,
we would need to build such a mapping for each of the letter to letter transitions and
this is certainly computationally messy and even if it is workable for the nouns, it
is still not at all like the circuitry we see in Fig. 5.24. Hence, we will instead choose
to think of this mapping as a cortical column circuit whose outputs are produced
by OCOS and FFP interactions as described in Grossberg (2003). That is, the noun
inputs $a_{ij}^n$ go into layer 4 of the bottom cortical column of Fig. 5.24 and the OCOS and FFP circuits then allow us to adjust the strengths of connections between the neurons in the column so that we obtain the groupings of letters that are preferred in the nouns of our sample. For convenience of exposition, we note the cortical column for nouns can be labeled using the three six layer cortical groups, $G^{n_0}$, $G^{n_1}$ and $G^{n_2}$. Further, let $G^{n_i}_\ell$ denote the $\ell$th layer in group $G^{n_i}$. Then, we have the training sequence

$$
G^{n_0}_4 \rightarrow G^{n_0}_2 \rightarrow G^{n_1}_4 \rightarrow G^{n_1}_2 \rightarrow G^{n_2}_4 \rightarrow G^{n_2}_2
$$

where each arrow denotes perceptual grouping training using OCOS and FFP laminar
circuits. The output from layer 2 of the top cortical group of the noun column thus
consists of the preferred letter orders for nouns. We handle the verb and object data
for groups $G^{v_i}$ and $G^{o_j}$ in a similar way using the training sequences

$$
G^{v_0}_4 \rightarrow G^{v_0}_2 \rightarrow G^{v_1}_4 \rightarrow G^{v_1}_2 \rightarrow G^{v_2}_4 \rightarrow G^{v_2}_2
$$

$$
G^{o_0}_4 \rightarrow G^{o_0}_2 \rightarrow G^{o_1}_4 \rightarrow G^{o_1}_2 \rightarrow G^{o_2}_4 \rightarrow G^{o_2}_2 .
$$

To generate the preferred Noun to Verb and Verb to Object mappings, we use two
more cortical columns with the groupings $G^{NV_i}$ and $G^{VO_j}$. We then use the training sequences

$$
G^{n_2}_2, G^{v_2}_2 \rightarrow G^{NV_0}_4 \rightarrow G^{NV_0}_2 \rightarrow G^{NV_1}_4 \rightarrow G^{NV_1}_2 \rightarrow G^{NV_2}_4 \rightarrow G^{NV_2}_2
$$

$$
G^{v_2}_2, G^{o_2}_2 \rightarrow G^{VO_0}_4 \rightarrow G^{VO_0}_2 \rightarrow G^{VO_1}_4 \rightarrow G^{VO_1}_2 \rightarrow G^{VO_2}_4 \rightarrow G^{VO_2}_2
$$

The output from the Noun–Verb cortical column is a preferred verb for a given noun and
the output from the Verb–Object cortical column is a preferred object for a given
verb. We thus have five cortical column stacks whose bottom column handles input
sentence parts and outputs a grammatically correct following part of the growing
sentence. At this point, we have the noun, verb and object generating maps, NGM,
VGM and OGM, that we have sought, as well as instantiations of Noun to Verb and
Verb to Object mappings.
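Purely as an organizational sketch, the training sequences above can be written as ordered schedules of (group, layer) stages, with a placeholder routine standing in for one OCOS/FFP perceptual grouping step between consecutive stages; nothing about the actual laminar dynamics is modeled here.

```python
def column_schedule(tag, groups=3):
    """Layer 4 then layer 2 stages for each six-layer group in one cortical column."""
    return [(f"{tag}{g}", layer) for g in range(groups) for layer in (4, 2)]

def run_schedule(schedule, train_grouping):
    """Apply one perceptual grouping training step between consecutive stages."""
    for src, dst in zip(schedule, schedule[1:]):
        train_grouping(src, dst)

# The noun column schedule: [('n0', 4), ('n0', 2), ('n1', 4), ..., ('n2', 2)].
noun_schedule = column_schedule("n")
run_schedule(noun_schedule, lambda src, dst: print(f"train {src} -> {dst}"))
```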

13.3.3 Sensor Fusion in Area 37

A given set of sensory data encoded into a Würfelspiel matrix using a rudimentary
grammar based on an abstraction of the signal can be used to imprint a five cortical
column model of isocortex. Let’s assume we have data from two sensory modalities,
A and B. We posit an information processing architecture which consists of the four
mutually interacting agents: isocortex imprinted by sensory data of type A, CA;
isocortex imprinted by sensory data of type B, CB; an isocortex Area 37 module,
A37, to be imprinted by the expected outcome due to the fusing of information
from CA and CB; and an isocortex limbic module, LM, whose purpose is to provide
modulatory influence to the inputs of the other three modules. Using notation similar
to what we have used previously, the training sequence, suppressing internal details,
is shown in Eq. (13.4).

$$
\begin{aligned}
\text{Sensory A input} &\rightarrow G^{CA_0}_4 \rightarrow G^{CA_2}_2 \\
\text{Sensory B input} &\rightarrow G^{CB_0}_4 \rightarrow G^{CB_2}_2 \\
G^{CA_2}_2, G^{CB_2}_2 &\rightarrow G^{A37,0}_4 \rightarrow G^{A37,2}_2 \\
G^{CA_2}_2, G^{CB_2}_2, G^{A37,2}_2 &\rightarrow G^{LM_0}_4 \rightarrow G^{LM_2}_2 \\
G^{LM_2}_2 &\rightarrow G^{CA_0}_4, G^{CB_0}_4, G^{A37,0}_4
\end{aligned}
\tag{13.4}
$$
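Read as pseudocode, Eq. 13.4 is one pass of a four-agent loop. The sketch below reduces each module to a callable that maps its layer 4 inputs to a layer 2 output; the isocortex behavior inside each callable is, of course, the substance of the model and is not reproduced here.

```python
def fusion_pass(sensory_A, sensory_B, CA, CB, A37, LM, feedback=None):
    """One pass of the Eq. 13.4 schedule: two sensory columns, Area 37 fusion,
    and a limbic module whose output modulates the inputs on the next pass."""
    out_CA = CA((sensory_A, feedback))          # sensory A column output
    out_CB = CB((sensory_B, feedback))          # sensory B column output
    out_A37 = A37((out_CA, out_CB, feedback))   # polymodal fusion in Area 37
    out_LM = LM((out_CA, out_CB, out_A37))      # limbic modulatory output
    return out_A37, out_LM                      # out_LM is the feedback next time
```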

13.4 Integration of the Models

We can thus construct mappings that encode the compositional designs from our
painting and music data. This suggests some basic principles: if we use the super-
scripts α and β to denote emotional modality and data type, respectively, we can label
αβ αβ
the mappings in the form {gN V , gV O }. We let α = 0, 1, 2, 3 denote neutral, happy, sad
and angry and β = 0, 1 indicate the choice of music and painting designs, respec-
tively. We thus have a collection of mappings

{gN00V , gV00O } {gN01V , gV01O } {gN02V , gV02O } {gN03V , gV03O } Music
{gN10V , gV10O } {gN11V , gV11O } {gN12V , gV12O } {gN13V , gV13O } Painting
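One convenient way to organize this collection in software is a lookup keyed by the (emotional modality, data type) pair; the map objects themselves are placeholders here.

```python
EMOTIONS = {0: "neutral", 1: "happy", 2: "sad", 3: "angry"}
DATA_TYPES = {0: "music", 1: "painting"}

# maps[(alpha, beta)] holds the (g_NV, g_VO) pair trained on that slice of the data.
maps = {(a, b): (None, None) for a in EMOTIONS for b in DATA_TYPES}
```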

The high level cortex processing is shown in Figs. 13.8 and 13.9. A simplified version
of the auditory and visual cortex to associative cortex processing is illustrated. Only
some of the interconnections are shown. Each cortex type is modeled as four layers
which are associated with the specific time frames of milliseconds ($10^{-3}$ s), seconds, days ($10^3$ s) and weeks ($10^6$ s). Each time frame connects feedforward and feedback
to all time frames with longer time constants (i.e. the millisecond time frame outputs
project to the second, day and week time frames). The associative cortex processing
is assumed to be output via the four time frames of the temporal cortex. We show
very few of these connections as the diagram quickly becomes difficult to read.
However, the frontal, parieto-occipital and temporal cortex modules are all fully
interconnected. The portions of these diagrams labeled as frames of specific time

Fig. 13.8 A simplified version of the auditory cortex to associative cortex processing. Only some
of the interconnections are shown. Each cortex type is modeled as four layers which are associated
with specific time frames. The different processing time frames are labeled −3, 0, 3, and 6 for
millisecond, seconds, days and weeks. The temporal cortex time frames shown connect to limbic
system represented by the cingulate gyrus

Fig. 13.9 A simplified version of the visual cortex to associative cortex processing. Only some of
the interconnections are shown. Each cortex type is modeled as four layers which are associated
with specific time frames. The different processing time frames are labeled −3, 0, 3, and 6 for
millisecond, seconds, days and weeks. The temporal cortex time frames shown connect to limbic
system represented by the cingulate gyrus

constants are functionally equivalent to the different levels of cortical processing discussed in the isocortex models. Specifically, the millisecond time frame is the
cortical structure closest to raw sensory input, the second time frame is the cortex
module the first one passes information to and so forth.
A few of these connections are shown in Figs. 13.8 and 13.9, in which we only
see connections illustrated between the different time frames of the parieto-occipital
and temporal lobes. There are simple one line double arrows drawn for the frontal to
temporal and frontal to parieto-occipital to indicate the other connections. For each
emotional modality, our musical and painting Würfelspiel data provides 64 equally
valid data points. Consider the 64 sad data points for music. We know as humans that
this data is sad. Each such sad data point provides auditory cortex training data and
64 examples of the sad emotional attribute. The painting data gives us 64 examples of
visual cortex training data as well as 64 additional sad emotional attributes. Hence,
we have 128 examples of sadness split between the sensory pathways of hearing and
vision. In this way, we build 128 examples of each emotional attribute from music
and painting split equally between hearing and vision. We know from studies of
neural processing, that the front of the auditory and visual cortex is closely aligned to
sensory data and as you progress into more interior layers of cortex, neural ensembles
begin to respond to progressively higher and more abstract patterns. For example, in
the auditory cortex, initially the nerve cells respond to simple phonemes of perhaps
20 ms duration and higher levels are responsive to words, then sentences and so
forth. We can make similar analogies to processing in the visual cortex. Outputs
from primary sensory cortex are fed into higher level associative cortex where more
abstract processing is performed. Hence, our data provides a validating pathway for
two types of primary sensory cortex as well as a primitive model of higher level
associative cortex.

13.5 Depression Models

We can use the developed models to model forms of depression following the
monoamine, neurokinin and BDNF hypotheses of dysfunction. In the monoamine
hypothesis, we assume that depression is due to a deficiency of the monoamine
neurotransmitters norepinephrine NE and serotonin 5HT. Since we are dealing with
monoamine NTs, all of the machinery needed to construct new NT on demand, break
it down into components for reuse and transport it back from the cleft into the pre-
synapse via re-uptake pumps is available in the pre-synaptic cell. The enzyme used
to break down the monoamine is called monoamine oxidase or MAO. In a malfunc-
tioning system, there may be too little monoamine NT, too few receptors for it on the
post-synapse side, improper breakdown of the NT and/or improper re-uptake. If MAO
inhibitors are given, the breakdown is prevented, effectively increasing monoamine
concentration. If tricyclic compounds are given, the re-uptake pump is blocked, also
effectively increasing the monoamine concentration.
Another hypothesis for the etiology of depression is that there is a malfunction
in one or more second messenger systems which initiate intracellular transcription
factors that control critical gene regulation. A candidate for this type of malfunction
are the pathways that initiate activation of brain derived neurotrophic factor or BDNF.
As discussed in Stahl (2000), BDNF is critical to the normal health of brain neurons
but under stress, the transcription of the protein from the gene for BDNF is somehow
decreased leading to possible cell death of neurons. The administering of MAO
inhibitors and tricyclics then reverse this by effectively stimulating the production
of BDNF.
Finally, we know that serotonin and a peptide called substance P of a class of
peptides known as neurokinins are co-transmitters. There is some evidence that if
an antagonist to substance P is administered, it functions as an antidepressant. The
amygdala has both monoamine and substance P receptors and hence modulation of
the monoamine pre-synaptic terminal by substance P is a real possibility.
Our simple pharmacological model will allow us to model the critical portions of
these depression hypotheses in our abstract cells which can then be linked together
into larger computational ensembles.
The full cognitive model is built as an interacting collection of the intelligent
software agents encompassing auditory, visual, frontal, limbic cortex agents, sero-
tonin and noradrenergic pathway agents, environmental input agents for music,
painting and generic compositional design agents and a emotional amygdala agent.
The connections between the agents can be modified by pharmacological agents
at the critical time scales of 1−10 ms (immediate), 10−1000 ms (fast onset) and
10,000−10,000,000 ms (slow onset) via alteration of abstract cell feature vectors, abstract fiber outputs, and abstract ensemble outputs via Hebbian interactions and
phase locking.

13.6 Integration into a Virtual World

The final step is to integrate the model into a three dimensional (3D) virtual world.
This is done by using the motor and thalamus outputs of our model to influence the
movement and texture mappings of a 3D character. Characters or avatars are gener-
ated in such worlds using computer code which constructs their body as a 3D mesh of
polygons given a 3D position. Then 2D matrices of a texture are applied to portions
of the 3D mesh by warping the texture to fit the 3D geometry of the character. This
is called the texture mapping phase and it is only then that the character takes on
a realistic appearance. In our model, the fMRI and skin conductance scalars output
from the thalamus serve to locate a position in 2D emotional space as determined
by psychophysiological experiments (see Chap. 10) which may be a more complex
emotional state than that of sadness, happiness or anger. We can assign different
texture mappings to different emotional vectors and thereby influence the physical
appearance of the avatar based on emotional input. In addition, we can assign dif-
ferent categories of motor movements to emotional vectors in order to have distinct
motor patterns that are associated with emotional states. There is much work to be
done to work out all the details of these mappings, of course. Some of this detail
is shown in Fig. 13.10. Finally, we note that this simple model will be used as the
foundation upon which we will assemble our models of cognitive dysfunction. The
diagram of Fig. 13.3 clearly shows that there are neurotransmitter inputs into the lim-
bic system and cortex which can influence the output that reaches the avatar. Indeed,
they influence the outputs we call meta level in music and painting compositional
design. There are established connections between neurotransmitter levels in cortex
associated with cognitive dysfunctions. Also, there is a body of literature on the
musical and painting outputs of cognitively disturbed patients which can therefore
be associated with the neurotransmitter levels. This enables us to develop another
set of training data that is similar in spirit to what we have already done with the
musical and painting data. Once our models are constrained by this new data, we
have models that generate outputs of music and painting in a variety of emotional
modalities. The alterations in neurotransmitter levels can be obtained by specific
lesions in our software models, thereby allowing us to do controlled lesion studies
which we believe will be of therapeutic use in the study and treatment of cognitive
dysfunction. The resulting cognitive models are then attached to the software that
generates the attributes of a character that moves autonomously in a 3D virtual world.
Our high level graphical interfaces thus realize our models as 3D artificial avatars
in virtual worlds and hence can interact with both other avatars and the user of the
software. We will use these tools to develop models of cognitive dysfunction such
as depression for possible use in diagnostic/therapy efforts in collaboration with

Fig. 13.10 Simplified connections to the avatar in a 3D virtual world



cognitive scientists. We can envision that in the future, we can develop interfaces
allowing signals obtained from multielectrode electrophysiological probes to drive
the avatars as well.

13.7 Lesion Studies

Using our proposed tools, we can thus build autonomous art, music and general
design agents that are coupled to avatars that live in appropriate 3D worlds that
are not pastiche based but instead are based on emerging structure. This approach
thus has significant broad scientific impact. In addition, the utilization of music and
design professionals as our first round of clinicians to help shape the data sets used to
shape our cognitive models is innovative and will provide an infrastructure for future
involvement of such personnel in the even more important planned lesion studies.
If our software model of human cognition is capable of producing valid emotional
responses to ordinary stimuli from music and art, then it will provide a powerful tool
to study cognitive dysfunction. If such a model could be built, it would be possible to
introduce errors in many ways into the architecture. The output from the model could
then be assessed and we would in principle have an experimentally viable mechanism
for lesion studies of human cognition which is repeatable and fine grained. A few
such experimental protocols could be
• Alter how messages are sent or received between abstract neurons by changes in
the pharmacological inputs they receive (enhancement or blocking of neurotrans-
mitters via drugs at some stage in the agonist spectrum),
• Alter the physical connection strategies between the fibers in the ensembles to
model physiological wiring issues between cognitive submodules,
• Alter how messages are sent or received between ensembles to model high level
effects of neurotransmitter or other signaling molecule problems,
• Since each software agent could itself be a collection of interacting finer-grained
agents, we could introduce intracellular or extracellular signaling errors of many
types inside a given agent to study communication breakdown issues.
We believe that the capability for lesion studies that are software based could be
of great value. If we can build reasonable software models of various aspects of
human cognition, then we can also alter those models in ways dictated to us by
experimental evidence and can provide insight into aspects of cognitive dysfunc-
tion. For example, we can couple our models to mid level data (functional MRI and
gross anatomical structure) and low level data (neurochemical and neurotransmitter
changes and microstructure deviations). The resulting models will be therefore faith-
ful to data encompassing multiple scales and will be an important first step toward
abstract models that are useful in artificial (and hence controllable) lesion studies of
human cognition. Further, by altering the cognitive model in a particular lesion study,
the functioning autonomous music and painting composer can generate musical
fragments and drawings/paintings that can be compared to the existing literature on the artistic output of dysfunctional individuals for further validation (Leo 1983).
The next step is then to tie the development of our avatars to information obtained
from mouse, eye and body movements of cognitively impaired individuals as they
play a video style game based on the 3D virtual world previously developed. This
enables us to build a real-time model of the emotions and other cognitive functions
of a patient from their own interactions. We believe this could be of great value as a
diagnostic and therapeutic tool, although we have a lot of work to do to get to that
point.
There is a large and growing gap between the scientists who obtain laboratory
data and the scientists who are attempting to use the data to obtain models of higher
order processes. We believe that communication between these disparate research
endeavors can be enhanced by providing a tool which makes it easier to integrate
low level detailed biological knowledge into high level meta modeling efforts. In
many conversations with either side of this divide, there is a fundamental lack of
appreciation for the other’s work. Hence, a bridge must be built which enables each
side to more fully appreciate the important results of the other. We believe building our proposed models in the way we suggest is a useful step in this direction.

13.8 The Complete Cognitive Model

The model we have generated so far creates valid emotionally labeled musical com-
positions and paintings. Some design principles about this construction process can
now be gleaned from this work for the generation of musical and painting composi-
tional designs. In general, our model takes specialized sensory input and generates
a high level output as shown in Fig. 13.1. We can design algorithms to train our cor-
tical tissue models using laminar cortical processing via on-surround excitation/off-
surround inhibition and Hebbian based connection strengthening. Our discussion so
far shows us that the full computational model can be described by the diagram of
Fig. 13.4. As discussed for emotional data in the context of music, given a random
starting note say from column one of a Würfelspiel music matrix of given emotional
modality, our model will generate an entire valid musical composition. Note that this
output should actually be interpreted as two separate pieces of information: one, as a
musical composition and two, as an emotional state. Our data thus provides training
for the correct output of a model of emotions as well as 128 input/output training samples for each of the happy, sad and angry attributes. We can thus generate valid compositional
designs that have a specific emotional tag for both the auditory, visual and associative
cortex pathways.
We will start with an emotional model which outputs two parameters (loosely
based on the psychophysiological data experiments). The first is actually a skin con-
ductance parameter and the second is a complicated computed value that arises from
certain fMRI measurements. The interesting thing about these values is that in exper-
iments with human subjects, when people saw pictures with emotional contents such
as “sad”, “angry” and so forth, the two parameters mentioned above determined an x-y
coordinate system in which different emotional attributes were placed experimentally
in very different parts of this plane. For example, “sad” images might go to quadrant
2 and “angry” images might be mapped to quadrant 3. This is an over simplification
of course, but the idea that images of different emotional attributes would be separate
in the plane is powerful. Our 128 examples of each emotional attribute thus give us
128 data points which should all be mapped to the same decision region of this two
dimensional plane. Thus, we have data that unambiguously gives us a desired two
dimensional output for our emotional model. At this point, we have a model that
given an abstract auditory and visual input stream from the data will generate given
emotional attributes. We can then turn the system around if you like, by noting that
each data set corresponds to a certain decision region in “emotional space”. Fur-
ther, recognize that we have a coupled model of sensory processing and emotional
computation. We have an auditory and visual agent which given input from a known
emotion decision region, generates a musical and artistic stream of a given emotional
attribute. We model this as a three software agent construction: auditory, visual and
emotional. Each of these agents accepts inputs from the others. We have enough data
to develop a first pass at a model which given a two dimensional emotional input
and a visual and auditory random start, will generate an emotional tagged auditory
and visual stream. This model is shown in Fig. 13.5 for music and painting. We can
then validate this model easily by just letting anyone listen or look at our output and
tell us if it is good. Hence, we are validating the whole model instead of just the
equivalent of a local patch of hippocampal tissue in a slice. Indeed, it should also be
possible to include validation in the learning algorithm.

13.9 Virtual World Constructions

A cognitive model which is implementable within an environment of distributed
computation would be able to give three dimensional characters in a virtual world
cognitive/emotional states so that more realistic “human” interactions are obtainable
in complex social, political and military simulations. Such a cognitive model, of
course, entails reasonable abstractions of biological detail. We believe that the single
most important item in the development of such a model is the training data and so
we explain as carefully as possible how we organize our training data into easily
generated Würfelspiel matrices for the auditory, visual and temporal cortex portions
of the cognitive model. The trained model is then capable of outputs which can be
interpreted as emotional states which can be tied to various attributes of a character
so as to enable believable response.
Consider again, the abstraction presented in Fig. 5.3. Musical data has provided
the kind of associated output that might come from area 37 of the temporal cortex. The
low level inputs that start the creation of a music phrase correspond to the auditory
sensory inputs into area 41 of the parietal cortex which are then processed through
areas 5, 7 and 42 before being sent to the further associative level processing in the
temporal cortex. The painting data has then provided a similar kind of associated input
into area 37 from the occipital cortex. Inputs that create the paintings correspond to
the visual sensory inputs into area 17 of the occipital cortex which are then further
processed by area 18 and 19 before being sent to the temporal cortex for additional
higher level processing. The musical and painting data inputs correspond to specific
fMRI and skin conductance outputs in addition to an encoded compositional design.
Our data thus allows us to build the model shown in Fig. 13.4.
At this point we have a simple model of the thalamic outputs as shown in Fig. 5.3.
These thalamic outputs can then be integrated into a larger model which can be
part of the character behavior modules in a three dimensional (3D) virtual world. In
effect, this is done by using the motor and thalamus outputs of the model to influence
the movement and texture mappings of a 3D character. Characters or avatars are
generated in such worlds using computer code which constructs their body as a 3D
mesh of polygons given a 3D position. Then 2D matrices of a texture are applied
to portions of the 3D mesh by warping the texture to fit the 3D geometry of the
character. In our model, the fMRI and skin conductance scalars output from the
thalamus thus serve to locate a position in 2D emotional space as determined by
psychophysiological experiments which may be a more complex emotional state
than that of sadness, happiness or anger. We can assign different texture mappings
to different emotional vectors and thereby influence the physical appearance of the
avatar based on emotional input. In addition, we can assign different categories of
motor movements to emotional vectors in order to have distinct motor patterns that
are associated with emotional states. Some of this detail is shown in Fig. 13.10.
We will return to these ideas in Chap. 21 where we outline a much better method
for the training of graph based neural models. But first, we need to discuss in much
detail how we would build mappings from inputs to outputs for the complicated
neural models we have been talking about. Before we get into these details, we will
talk a bit about network models of neurons in general in Chap. 14 and start going
through some training ideas in Chap. 15.

Chapter 14
Networks of Excitable Neurons

We know that inputs into the dendritic tree of an excitable nerve cell can be separated
into first and second messenger classes. The first messenger group consists of
Hodgkin–Huxley voltage dependent ion gates and the second messenger group includes
molecules which bind to the dendrite through some sort of interface and then trigger
a series of secondary events inside the cytoplasm of the cell body. We will primarily
be interested in second messengers which are neurotransmitters. We need to have a
more detailed discussion of the full dendrite—soma—axon system so we can know
how to connect one excitable nerve cell to another and build networks. We will have a
more abstract look at second messenger systems later, but for now we will be focused
on a class of neurotransmitters that include dopamine.

14.1 The Basic Neurotransmitters

Let’s look at the class of catecholamine neurotransmitters in the nervous system.


We will focus on only a few types: DA, dopamine; NE, norepinephrine; and E,
epinephrine. These neurotransmitters can be lumped together into the category called
CAs because they all share a common core biochemical structure, the catechol group.
These neurotransmitters are very important and how they interact in a neural system
determines many behaviors. An important review of serotonin effects is given in
Roberts (2011) and in Dayan and Huys (2009) and that of catecholamines in general
can be found in Arnsten (2011). A discussion of dopamine’s effects is given in
Friston et al. (2014). We only touch on basic things here. NE and E have very similar
structure, consisting of a benzene ring and a certain type of side chain. A benzene
ring can be denoted by a cyclic structure of six carbons as is seen in Fig. 14.1. The
carbons are numbered one through six, starting from the eastern-most one on the ring
and moving counterclockwise. Replace the hydrogens on the third and fourth carbon
with a hydroxyl group as seen in Fig. 14.1. The structure is now called a catechol
group. We can also replace the hydrogen on carbon one by an ethane molecule (C2H6)
(see Fig. 14.2): note one hydrogen is removed from ethane so that it can attach to

Fig. 14.1 Benzene and Catechol molecule
Fig. 14.2 Catechol molecule with ethyl chain on C1
Fig. 14.3 Dopamine

the benzene ring. We can then add an ammonia molecule NH3 (amine group) to the
outermost carbon in the ethyl side chain, the amine losing one hydrogen in order to
bond and thus appearing on the chain as NH2. This is the 3,4-dihydroxy derivative
of the phenylethylamine molecule, which is the neurotransmitter Dopamine (DA)
(see Fig. 14.3). Further derivatives of this then give us Norepinephrine (NE) and
Epinephrine (E). If the beta carbon on the side chain replaces one hydrogen with
a hydroxyl group (a hydroxylation), we obtain NE (see Fig. 14.4). If a hydrogen on
the amine group is then replaced by a methyl group, we get E (see Fig. 14.5).
Note that all of these molecules are monoamines—“one amine” constructions and

Fig. 14.4 Norepinephrine
Fig. 14.5 Epinephrine

all are derivatives of the catechol group. Without going into detail, DA, NE and E are
constructed from the nutrient soup inside our cells via the following pathway (see
Fig. 14.6):
1. Tyrosine hydroxylase (adds an OH group) converts tyrosine into L-DOPA. This
enzyme is the rate limiting step in CA synthesis because if it is not present,
the reaction will not continue. So there must be a mechanism to request from our
genome appropriate levels of this enzyme when needed.
2. L-DOPA, formed by hydroxylation of tyrosine, is then converted into DA by
DOPA decarboxylase (removes a carboxyl group).

Tyrosine → (Tyrosine Hydroxylase: add OH) → L-DOPA → (DOPA Decarboxylase: remove COOH) → DA → (DBH: add OH) → NE → (PnM: add CH3) → E

Fig. 14.6 The catecholamine synthesis pathway



3. DA is then converted into NE by DBH, dopamine-beta-hydroxylase (add a
hydroxyl group to the beta carbon).
4. NE is then converted into E by phenylethanolamine-N-methyltransferase (PnM)
(add a methyl group to the amine group on the alpha carbon).
We can see this construction pathway diagrammatically in Fig. 14.6. All of the
enzymes above, Tyrosine hydroxylase, DOPA decarboxylase, dopamine-beta-
hydroxylase and phenylethanolamine-N-methyltransferase are regulated in complex
ways. Further, all of these enzymes are critical to the proper functioning of CA
biosynthesis.
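Since the pathway is just a chain of enzyme-gated conversions, it is easy to caricature in code. The short sketch below (Python; the pool labels, enzyme levels and rate constants are purely illustrative and are not taken from any data in this text) shows how a scarce tyrosine hydroxylase level throttles everything downstream of it:

def convert(substrate, enzyme_level, rate):
    # Amount converted in one tick; the flux cannot exceed the available substrate.
    return min(substrate, rate * enzyme_level * substrate)

def ca_tick(pools, enzymes, rates):
    # One discrete update of the Tyrosine -> L-DOPA -> DA -> NE -> E chain.
    chain = ["TYR", "L-DOPA", "DA", "NE", "E"]
    steps = ["TH", "DDC", "DBH", "PnM"]          # our labels for the four enzymes
    new = dict(pools)
    for pre, post, enz in zip(chain[:-1], chain[1:], steps):
        flux = convert(new[pre], enzymes[enz], rates[enz])
        new[pre] -= flux
        new[post] += flux
    return new

pools = {"TYR": 100.0, "L-DOPA": 0.0, "DA": 0.0, "NE": 0.0, "E": 0.0}
enzymes = {"TH": 0.05, "DDC": 1.0, "DBH": 1.0, "PnM": 1.0}   # scarce TH: rate limiting
rates = {"TH": 0.5, "DDC": 0.5, "DBH": 0.5, "PnM": 0.5}
for _ in range(20):
    pools = ca_tick(pools, enzymes, rates)
print({k: round(v, 2) for k, v in pools.items()})

Raising the TH level in this toy run is the only change that substantially increases the downstream DA, NE and E pools, which is the sense in which the first enzyme is rate limiting.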

14.2 Modeling Issues

Consider the following perceptive quotation (Black 1991):


More generally, regulation of transmitter synthesis shares important commonalities in
neurons that differ functionally, anatomically, and embryologically. This point is worthy of
emphasis. Increased impulse activity stimulates transmitter synthesis in dopaminergic nigral
neurons that regulate coordination of motor function, noradrenergic locus coerulus neurons
that may play a critical role in attention and arousal, and adrenergic adrenomedullary cells
central to the stress response. Simply stated, the common biochemical and genomic organi-
zation of these diverse populations determines how environmental, epigenetic information,
through altered impulse activity, is translated into neural information. In this prototypical
example, cellular biochemical organization, not behavioral modality, is a key determi-
nant of how external stimuli are converted into neural language. In this domain, modes
of information storage are biochemically specific, not modality specific, indicating that
synaptic systems subserving entirely different behavioral and cognitive functions may
share common modes of information processing

Hence, in our search for a core or parent object whose instantiation is useful
in constructing software architectures capable of subserving learning and memory
function, the evidence presented above suggests we focus on prototypical neuro-
transmitters. It is clear we need a model of dendritic–axonal interaction where dendrite and
axon objects interact via some sort of intermediary agent. These agents need to look
at both dendritic and axonal specific information. Both the dendrite and the axon
should have some dependency on neurotransmitter objects whose construction and
subsequent reabsorption for further reuse should follow modulatory pathways that
mimic in principle the realities of CA transmitter synthesis. Further, there is a need
for Object Recognition Sites, as dendritic objects need receptor objects that interact
with the neurotransmitter objects. There should also be a finite number of types of
neurotransmitter objects and a possibly different finite number of receptor object types.
The number and variety of these sites grow or shrink in accordance with learning
laws.
How should we model CA release, CA termination, CA identification via CA
specific receptors and synaptic plasticity? CA is released through a large variety of
mechanisms, each of which is modulated in varying degrees by many other agents.


Most importantly, this release needs Ca++ which is mediated by second messenger
action such as hormones and intraneural cAMP in what can be a rather global way.
Hence, release of CA itself can modulate subsequent CA release. Further, there are
more global mechanisms which interact with the CA release and production cycle
via non-transmitter mechanisms. Note Black (1991, p. 34)
...angiotensin II receptors on the [presynaptic] membrane also modulate norepinephrine
release. Angiotensin is a potent vasoconstrictor, derived from circulating angiotensinogen by
the action of renin, an enzyme secreted by the kidney...the principle is startling: the kidney
can communicate with sympathetic neurons through nonsynaptic mechanisms...Circulating
hormone regulates transmitter release at the synapse. Synaptic communication, then, may be
modulated by nonsynaptic mechanisms, and distant structures may talk to receptive neurons.
Consequently aspects of communication with the nervous system are freed from hard-
wiring constraints.

Clearly, dendritic–axonal object interaction should be mediated via pathways of
both local scope (using neurotransmitter objects) and global scope, using additional
objects which could be modeled after hormones.
We also know CA termination is critical to CA function. CA substance is deac-
tivated by reabsorption into the presynaptic structure itself. Any agent that would
interfere with CA uptake through the presynaptic structure allows CA neurotransmit-
ter to remain available for use too long. This has a profound effect on the functioning
of the nervous system. A typical drug agent that does this is cocaine. It follows
that our software system should have efficient collection and recycling schemes for
the neurotransmitter objects. For example, dopamine levels are controlled by three
separate mechanisms; a small update rule combining them is sketched just after the list below.
• Dopamine packets are released into the synaptic cleft and bind to receptors on the
dendrite. So, the number of receptors per unit area of dendritic membrane provides
a control of dopamine concentration in the cleft.
• Dopamine is broken down by enzymes in the cleft all the time which also control
dopamine concentration.
• Dopamine is pumped back into the presynaptic bulb for reuse providing another
mechanism.
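These three mechanisms lend themselves to a very simple lumped update. In the sketch below (Python; the rate constants are invented for illustration only), each tick adds whatever packets were released and then removes fractions of the cleft concentration for receptor binding, enzymatic breakdown and reuptake. Setting the reuptake constant near zero makes the transmitter linger in the cleft, which is the cocaine effect mentioned above.

def cleft_update(C, release, k_bind, k_enzyme, k_reuptake):
    # One time tick of cleft dopamine concentration.
    C = C + release                                  # packets released into the cleft
    cleared = (k_bind + k_enzyme + k_reuptake) * C   # the three removal mechanisms
    return max(C - cleared, 0.0)

C = 0.0
for t in range(8):
    C = cleft_update(C, release=1.0 if t == 0 else 0.0,
                     k_bind=0.3, k_enzyme=0.2, k_reuptake=0.4)
    print(t, round(C, 4))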
Now, CA receptors can be classed into two broad types: the α- and β-receptors.
Each can be inhibitory and/or excitatory in nature. These structures can be located
on both presynaptic and postsynaptic sites. The α class is located on presynaptic
tissue and when activated, inhibits further CA release; thereby functioning as an
autofeedback loop—these receptors have been called autoreceptors. It follows that
receptor objects should be usable in both dendritic and axonal objects.
The interaction between the presynaptic and postsynaptic neurons is always
changing. This mutability is called their synaptic plasticity. It is mediated by both
local and global pathways. In Black (1991, pp. 38–39), we see the Post Synaptic Den-
sity or PSD structure mediates this plasticity. The PSD contains many components
that can be altered via second messenger triggers which gives rise to synaptic plas-
ticity. The appropriate software agent connecting dendritic and axonal objects is
therefore an object of type say PSD—a bridging object whose function and struc-
ture is alterable via objects of type hormone, neurotransmitters and perhaps others
through some sort of software mechanism.

14.3 Software Implementation Thoughts

We now discuss some of the issues that are involved in the implementation of a
software environment that will facilitate rapid prototyping of biologically motivated
models of cognition. Using ideas from a language such as C++, python or others,
would lead us to neural architectures as derived classes from a virtual class of neural
objects. The computational strategies that support functions such as training would
also be classes derived from a core primitive class of computational engines. We wish
to allow classes of architectures which are general enough to allow each computa-
tional element in an architecture to itself be a neural network of some type, a statistical
model and so forth. This also implies that we must develop more sophisticated mod-
els of connections between computational elements (synaptic links between classical
lumped sum neuronal models are one such type of connection). In addition, we wish
to have the capability in our modeling environment to adapt abstract principles of
neuronal processing that are more closely inspired by current neurobiological knowl-
edge. Consider this comment from Black (1991, pp. xii–xiii):
Briefly, certain types of molecules in the nervous system occupy a unique functional
niche. These molecules subserve multiple functions simultaneously. It is well recognized that
biochemical transformations are the substance of cellular function and mediate cell interac-
tions in all tissues and organ systems. Particular subsets of molecules simultaneously serve
as intermediates in cellular biochemistry, function as intercellular signals, and respond in
characteristic modes to environmental stimuli. These critical molecules actually incorporate
environmental information into the cell and nervous system. Consequently these molecules
simultaneously function as biochemical intermediates and as symbols representing specific
environmental conditions....
The principle of multiple function implies that there is no clear distinction among the
processes of cellular metabolism, intercellular communication, and symbolic function in
the nervous system. Representation of information and communication are part of the func-
tioning fabric of the nervous system. This view also serves to collapse a number of other
traditional categorical distinctions. For example, the brain can no longer be regarded as the
hardware underlying the separate software of the mind. Scrutiny will indicate that these
categories are ill framed and that hardware and software are one in the nervous system.

From this first quote of Black, you can see the enormous challenge we face. We can
infer from the above that to be able to have the properties of plasticity and response
to environmental change, certain software objects must be allowed to subserve the
dual roles of communication and architecture modification. Black refers to this as
the “principle of polyfunction” (Black 1991, p. 3).

Shorn of all detail, the software–hardware dichotomy is artificial. As we shall see, software
and hardware are one and the same in the nervous system. To the degree that these terms
have any meaning in the nervous system, software changes the hardware upon which the
software is based. For example, experience changes the structure of neurons, changes the
signals that neurons send, changes the circuitry of the brain, and thereby changes output and
any analogue of neural software.

This is an interesting comment. How are we to design a software architecture
system which essentially is self-modifying? Architectural modification in the nervous
system involves the environmental input (or some other impulse) triggering increased
production of some important protein, enzyme etc. for the purpose of constructing
synapses, growing dendrites and so forth. However, these mechanisms are already in
place and they are activated when needed. So, the architecture is modified, but no new
mechanisms are constructed. Hence, low-level software objects that implement
these mechanisms and which are accessed via some sort of request for service might
be a reasonable way to implement this capability.
Next, note Black (1991, pp. 8–9)
Increasing evidence indicates that ongoing function, that is, communication itself, alters
the structure of the nervous system. In turn altered structure changes ongoing function, which
continues to alter structure. The essential unity of structure and function is a major theme... In
summary, a neurotransmitter molecule, which is known to convey millisecond-to-millisecond
excitatory information, also regulates circuit architecture by stimulating synaptic growth. In
this system, then, signal communication, growth, altered architecture, altered neural function,
and memory are causally interrelated; there is no easy divide between hardware and software.
The rules of function are the rules of architecture, and function governs architecture, which
governs function

It seems clear from the above that our underlying software architecture must be
event-based in some fashion. If we think of architectures consisting of loosely coupled
computational modules, each module is a kind of input–output mapping which we can
call an object of type IOMAP. Models with computational nodes linked by edges are
called connectionist models. Hence, classical connectionist modeling asserts that the
information content of the architecture is stored in the weight values that are attached
to the links and these weight values are altered due to environmental influence via
some sort of updating mechanism. It is now clear the architecture itself must be
self-modifying. Now it is fairly easy to write add and delete type functions into
connectionist schemes (just think of the links and weights implemented as doubly
linked lists), but what is clear from the above is that much more is needed. We need
mutable synaptic efficacy, the ability to add synapses and alter the function of neuron
bodies themselves. Consider the dual roles played by neural processing elements
(Black 1991, pp. 13–14):
[Weak reductionism] involves identification of scientifically tractable structural elements
that process information in the nervous system....Any set of elements is relevant only insofar
as it processes information and simultaneously participates in ongoing neural function; these
dual roles require the neural context.

What structural elements may be usefully examined? It may be helpful to outline appropriate
general characteristics... First, neural elements of interest must change with environment.
That is, environmental stimuli must, in some sense, regulate the function of these particular
units such that the units actually serve to represent conditions of the real world. The potential
units, or elements of interest, thereby function as symbols representing external or
internal reality. The symbols, then, are actual physical structures that constitute neural
language representing the real world. Second, the symbols must govern the function of
the nervous system such that the representation itself constitutes a change in neural state.
Consequently symbols do not serve as indifferent repositories of information but govern
ongoing function of the nervous system. Symbols in the nervous system simultaneously
dictate the rules of operation of the system. In other words, the symbols are central to the
architecture of the system; architecture confers the properties that determine behavior of the
system. The syntax of symbol operation is the syntax of neural function.

We now have a potential blueprint for our software architectural design: we need
software objects that
1. Change with environmental conditions.
2. Their state is either a direct or indirect representation of external environmental
conditions.
3. Their change in state corresponds to a change in the underlying architecture.
Information processing also involves combinatorial strategies, as Ira Black notes (Black
1991, p. 17):
Two related strategies are employed by the nervous system in the manipulation of the
transmitter molecular symbols. First, individual neurons use multiple transmitter signal types
at any given time. Second, each transmitter type may respond independently of others to
environmental stimuli.... The neuron appears to use a combinatorial strategy, a simple and
elegant process used repeatedly in nature in a variety of guises. A series of distinct elements,
of relatively restricted number, are used in a wide variety of combinations and permutations.
For example, if a given neuron uses four different transmitters, and each can exist in only three
different concentrations (based on environmental influences) independently, the neuron can
exist in 3^4 discrete transmitter states....However, this example appears to vastly underestimate
combinatorial potential of the neuron in reality.

Each IOMAP module must therefore be capable of a certain number of active
states. This capability cannot be achieved with a simple scalar parameter for the
information processing of a neuron. How are we to implement such a combinato-
rial possibility? How should we imbue our software objects with the equivalent of
activation via a variety of neurotransmitter pathways of varying temporal signatures?
To implement an architecture where computational modules communicate—an
agent based architecture, we will turn to functional programming languages such as
Erlang, Haskell and so forth which do not use shared memory. These languages can
easily use thousands of cores and so they are very promising. Also, Erlang allows
us to write code that can continue to run even if some of its processes are lost which
will allow us to implement lesion studies very nicely.
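We will not write Erlang here, but the mailbox idea is easy to sketch in any language. In the toy Python below (the names and the trivial computation are ours and are only meant to show the shape of the thing), each node owns a private input queue and shares no memory with any other node; it simply reacts to whatever message arrives and forwards its result along its links.

import queue
import threading
import time

class Node(threading.Thread):
    # Toy agent: owns a private mailbox, reacts to messages, forwards results.
    def __init__(self, name):
        super().__init__(daemon=True)
        self.name = name
        self.mailbox = queue.Queue()   # the node's own input queue; no shared state
        self.links = []                # downstream nodes

    def run(self):
        while True:
            sender, value = self.mailbox.get()      # block until a message arrives
            out = value + 1.0                       # placeholder computation
            print(self.name, "got", value, "from", sender, "and sends", out)
            for target in self.links:
                target.mailbox.put((self.name, out))

a, b = Node("a"), Node("b")
a.links.append(b)
a.start()
b.start()
a.mailbox.put(("env", 0.0))    # an external input enters a's mailbox
time.sleep(0.2)                # give the messages time to propagate before exiting

Losing a node in this style of design only silences its mailbox; the rest of the graph keeps running, which is exactly the property that makes lesion studies straightforward.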

14.4 How Would We Code Synapse Interaction?

Let’s put together all of the insight we have gathered from our discussions. We know
that
1. The principle of multiple function implies that there is no clear distinction among
the processes of cellular metabolism, intercellular communication and symbolic
function in the nervous system. Hence our software infrastructure should pos-
sess the same blurring of responsibility.
2. As mentioned before, software and hardware are the same in the nervous system.
This implies that the basic building blocks of our software system should be able
to alter the software architecture itself. We could implement part of this capability
using the notion of dynamically bound computational objects and strategies using
the C++ language. We could also do prototyping in Octave, python and others. If
we use the compiled C++ language, we will be forced to deal with the inadequacies
of the C++ language’s implementation of objects whose type is bound at run-time.
As we mentioned above, we will not use C++ at this time and instead focus on
Erlang.
3. Another basic principle from Black (1991) in the context of ongoing function
is that the very fact of communication, alters nervous system structure. This
implies that the structure of our software architecture be mutable in response to
the external input environment.
The evidence for these organizing principles can be summarized as follows:
1. We know that dendritic-axon coincident electrical activation strengthens synap-
tic efficacy through what are called Hebbian mechanisms. There is substantial
evidence of this: in the rat hippocampus, long term potentiation (LTP) leads to
structural modification of the synapse via the glutamate neurotransmitter. Clearly,
neurotransmitters which are known to convey millisecond to millisecond
excitatory information also regulate circuit architecture via synaptic struc-
tural change.
2. Macroscopic behavioral change is mirrored by distinct structural change at the
synaptic level. These points have been verified experimentally by the many excel-
lent investigations into the nervous system of the sea snail Aplysia californica.
In this animal, long term habituation and long term sensitization behaviors are
associated with definite structural changes in the number of synaptic vesicles, the
number of synaptic sites and so forth. In addition, these microscopic changes in
structure have the same temporal signature that the behavioral changes exhibit;
i.e. these changes last as long or as short as the corresponding behavior.
Neurotransmitters give the nervous system a capability to effect change on an extra-
ordinary range of time scales; consider the following simplified version of neuro-
transmitter interaction. We abstract out of the wealth of low-level biological detail
an abstract version of neurotransmitter pathways: start with an initial change in the
external environment E0 coming into a neuronal ensemble. This elicits a corre-
sponding change in impulse I0 from the ensemble which in turn triggers a change

in neurotransmitter level, NT0 . This change in neurotransmitter is further mod-
ulated by the breakdown rate and reabsorption rate for the neurotransmitter. For
simplicity, assume this is modeled by the constant factor R. Thus the net change in
neurotransmitter level is given by NT0 − R. Consider the time series correspond-
ing to an initial environmental change of E0 which is never repeated. We let time
be modeled in discrete ticks labeled by {t0 , t1 , t2 , . . .} and let the subscript of each
variable indicate its value at that time tick. We obtain the series:

E0 → I0 → (NT0 − R)
   → I1 → (NT1 − R)
   → I2 → (NT2 − R)
   ⋮
   → Ip → (NTp − R)

where the process terminates at step p because NTp − R ≈ 0. We see that the
interplay between creation of neurotransmitter and its reabsorption rate allows for
the effect of an environmental signal to have a temporal signature that lasts beyond
one time tick.
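A few lines of code make the point about temporal signatures concrete. In this sketch (Python; the gain, the decay factor and the value of R are illustrative only), a single environmental event E0 produces a net transmitter level NT − R that persists, shrinking tick by tick, until the process terminates:

def cascade(E0, gain=1.0, decay=0.6, R=0.1, tol=1e-3):
    # Net transmitter level NT_k - R at each tick after a one-shot input E0.
    levels = []
    NT = gain * E0                 # E0 -> I0 -> NT0
    while NT - R > tol:            # terminate at the step p where NT_p - R is about 0
        levels.append(NT - R)
        NT = decay * NT            # breakdown and reabsorption shrink the next level
    return levels

print(cascade(1.0))                # several ticks of nonzero response from one event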
Our model must then clearly allow for nonlocal interaction and long term temporal
response to a given signal. From the above discussion, we also see that the basic
response, in which a change in environment implies a change in neurotransmitter level (E →
NT), does not depend on what the signal type is or which neurotransmitter pathway
is activated. Evidence for this includes the following neurotransmitter-signal pairs
which all modulate response to a change in signal via the change in transmitter level:
1. Dopamine releasing neurons regulating motor function.
2. Norepinephrine releasing neurons regulating attention and arousal.
3. Adrenaline releasing neurons regulating stress response.
Here, we see clearly that cellular biochemical organization, not the type of environ-
mental signal or behavior, is the key to how external stimuli are converted into neural
language. We seek to construct a core group of software objects that have similar
function. Their organization and their methods of interaction will determine how
external stimuli are converted into our neural object language.

14.4.1 The Catecholamine Abstraction

We begin by drawing from the wealth of information above, a simplified version
which will serve as the building block for our abstraction of these ideas into a func-
tional software architecture. Consider Fig. 14.7. In it, we try to suggest the units from
which we need to build the abstract neuronal object. Each dendrite object contains a
number of receptor objects, while an axon object contains neurotransmitter objects.

Fig. 14.7 Abstract neuronal process

A dendrite–axon object pair interact via the PSD object which plays a role that is
similar to its biological function; however, we will also see that we will be able to
build extremely rich mathematical numerical processing objects as well using this
paradigm.
The output of the PSD object is sent to the computational body of the neuronal
object, the soma object. The details of how the soma will process the inputs it receives
will be discussed later. For now, we just note that some sort of information process-
ing takes place within this object. The output of the soma is collected into an axon
object containing the previously mentioned neurotransmitter objects. In this simpli-
fied illustration shown in Fig. 14.7, we do not try to indicate that there could be many
different types of neurotransmitters. We indicate the interaction pathways in Fig. 14.7
on the dendritic and axonal side of the soma. We provide further detail of the PSD
in the close up view given by Fig. 14.8. Here, we assume that the axon object
contains five types of neurotransmitters and the dendrite uses the associated receptor
objects for these neurotransmitters. The result of the dendritic–axonal interaction
is then computed by the PSD object via some as yet unspecified means. Now, how
should we handle the intricacies of the neurotransmitter–receptor interactions? We
do not want to model all of the relevant details. For our purposes, we will concentrate
on a few salient characteristics:
1. The probability of neurotransmitter efficacy will be denoted by the scalar para-
meter p. The variable p models neurotransmitter efficacy in a lumped parameter
manner. The amount of transmitter produced via the pathways that access the
genome, the probability of interaction with receptors and so forth are combined
into one number. More neurotransmitter is modeled by increasing p; increased

Fig. 14.8 Dendrite–axonal interaction pathway

numbers of receptors or increased efficacy of receptors can also be handled by
increasing p. We can be even more specific. Let N denote the amount of neu-
rotransmitter units available (a unit is essentially a synaptic vesicle) and R the
number of receptor sites in the PSD. Let ρ be the probability that a neurotrans-
mitter finds a receptor. The amount of transmitter available for binding is then
ρ N ; further, let ξ(R) be the probability that a receptor site is able to bind with
the transmitter. This is of course dependent on the number of receptor sites. Thus
the activity level p of the neurotransmitter is given by

p = ρ N ξ(R)

This gives a transmitter efficacy model that is a function of amount of transmit-
ter N , number of receptors R and probabilities of finding and binding. Hence,
our efficacy model p can be written p(ρ, ξ, N , R) to explicitly indicate these
dependencies; a small numerical sketch of this efficacy model is given right after this list.
2. Each neurotransmitter will have its own reabsorption rate denoted by q.
3. Each neurotransmitter has its own intrinsic time interval of action: we will not
model this explicitly at this time. Instead, we can mimic this effect by controlling
the (p, q) interactions for a given neurotransmitter.
4. A neurotransmitter has an associated locality that sets the scope of its interaction
with dendrites and so forth. For example, a neurotransmitter might only have
a local interaction via the PSD structure between one axon and one dendrite.
On the other hand, the neurotransmitter might permeate the intracellular fluid
surrounding the dendrite and axonal trees of our artificial neural system. Hence,
in software, we might want a given neurotransmitter to effect PSD structures on
other dendrites. We might also want other axons contributing this neurotransmitter
to contribute to the net dendrite–PSD–axon interaction. These effects will be
modeled with a locality object, say LOCAL. The equations that we will define
for various sorts of dendritic–axonal interactions (Eqs. 14.1–14.4) are all written
at this point in terms of sets of active neuronal indices for our artificial neural
system.
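As promised in item 1, here is a small numerical sketch of the lumped efficacy (Python; the saturating form chosen for ξ(R) and all of the numbers are our own illustration, not fitted to anything):

def xi(R, R_half=50.0):
    # Probability that a receptor site can bind; saturates as the receptor count R grows.
    return R / (R + R_half)

def efficacy(rho, N, R):
    # Lumped neurotransmitter efficacy p(rho, xi, N, R) = rho * N * xi(R).
    return rho * N * xi(R)

print(efficacy(rho=0.02, N=40, R=100))   # more transmitter or more receptors raises p
print(efficacy(rho=0.02, N=40, R=25))    # pruning receptors lowers p

Learning laws that grow or shrink the number of receptor sites then act on p simply by moving R.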

14.4.2 PSD Computation

Let’s denote the PSD computation by the symbol •. We need to model how to obtain
the value of a prototypical interaction, I. The simplest case is that of a One Axon–One
Dendrite Interaction. We will let the term A^i_j denote a collection of things organized
using standard vector notation. We have

\[
A^i_j = \begin{cases} A^i_j[0] = v, & \text{the value attached to the axon,} \\ A^i_j[1] = p_{\sigma}, & \text{the efficacy for an associated neurotransmitter } \sigma, \\ A^i_j[2] = q_{\sigma}, & \text{the reabsorption rate for an associated neurotransmitter } \sigma. \end{cases}
\]

We use a similar notation for the dendrite, D^i_j. We allow for more fields in the dendrite
than the value field, so we show this as a dotted entry:

\[
D^i_j = \begin{cases} D^i_j[0] = v, & \text{the value attached to the dendrite,} \\ \;\vdots \end{cases}
\]

Thus, the value of the interaction is given by

\[
I = \Bigl(A^i_j[1] - A^i_j[2]\Bigr)\, A^i_j[0] \bullet D^{j}_{\ell}[0] \qquad (14.1)
\]

where A^i_j is the jth axon connection from neuron i. In this case, it connects to the
ℓth dendrite of neuron j, D^j_ℓ. Another situation occurs when the total efficacy and
reabsorption rate for a given axon are determined by efficacies and reabsorption
rates from a pool of surrounding axons also. However, there is still just one dendrite
involved. We can call this a One Axon (Local Domain)–One Dendrite interaction
whose interaction value is now given by
 
\[
I = \sum_{k \in L_i(\sigma)} \Bigl(A^k_j[1] - A^k_j[2]\Bigr)\, A^i_j[0] \bullet D^{j}_{\ell}[0] \qquad (14.2)
\]

where the new symbol Li (σ) denotes the axonal locality of the neurotransmitter σ
in the ith neuron’s axon. It is easy to extend to a situation where in addition to the
pool of local axons that contribute to the efficacy and reabsorption rate of the axon,
we now add the possibility of local effects on the dendritic values. This is the One
Axon (Local Domain)–K Dendrites case and is the most complicated situation. The
interaction value is now
 
\[
I = \sum_{m \in M_j(\sigma)} \sum_{k \in L_i(\sigma)} \Bigl(A^k_j[1] - A^k_j[2]\Bigr)\, A^i_j[0] \bullet D^{j}_{m}[0] \qquad (14.3)
\]

where the new symbol Mj (σ) denotes the dendritic locality of the neurotransmit-
ter σ in the ith neuron’s axon. Even this simple model thus allows for long term
temporal effects of a given neurotransmitter by using appropriate values of p and q.
Indeed, this model exhibits the capability for imbuing our architectures with great
plasticity. For example, we can model the effects of T + 1 different transmitters
{σ0 , σ1 , . . . , σT } at one PSD junction by using the notation (A^k_j)_t and (D^j_m)_t to
indicate which neurotransmitter we are looking at and then summing the interactions
as follows:
\[
I = \sum_{t=0}^{T} \sum_{m \in M_j(\sigma_t)} \sum_{k \in L_i(\sigma_t)} \Bigl((A^k_j)_t[1] - (A^k_j)_t[2]\Bigr)\, (A^i_j)_t[0] \bullet (D^j_m)_t[0] \qquad (14.4)
\]

Let’s stop and reflect on all of this. These complicated chains of calculation can be
avoided by moving to graphs of computational nodes which communicate with each
other by passing messages. The local and more global portions of these computations
are simply folded into a given node acting on its own input queue of messages using
protocols we set up for the processing. This is what we will be doing with chains of
neural objects.
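Before we move on, it may help to see Eqs. 14.1 and 14.2 as code. In the sketch below (Python; the data layout and the choice of • as ordinary multiplication are our own simplifications), an axon is stored as the triple [value, p, q] in the spirit of the vectors above:

def interaction_one_one(axon, dendrite_value):
    # Eq. 14.1: one axon, one dendrite; the bullet operation is plain multiplication here.
    v, p, q = axon
    return (p - q) * v * dendrite_value

def interaction_local_domain(axons, j, dendrite_value, locality):
    # Eq. 14.2: efficacy and reabsorption pooled over the axonal locality set L_i(sigma),
    # while the axon value used is still that of axon j.
    pooled = sum(axons[k][1] - axons[k][2] for k in locality)
    return pooled * axons[j][0] * dendrite_value

axons = {0: [0.8, 0.6, 0.1], 1: [0.5, 0.7, 0.2], 2: [0.3, 0.4, 0.1]}
print(interaction_one_one(axons[0], dendrite_value=1.0))
print(interaction_local_domain(axons, j=0, dendrite_value=1.0, locality=[0, 1, 2]))

Adding the dendritic locality sum of Eq. 14.3 and the transmitter index of Eq. 14.4 is just two more loops around the same core, which is precisely the bookkeeping that the message-passing formulation lets us push into each node's input queue.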

14.5 Networks of Neural Objects

We will discuss what is called a chained network in Chap. 17, but it is easy enough to
jump in a bit early to see how our neural computations could be handled. We can connect
a standard feedforward architecture of neurons as a chain as shown in Fig. 14.9. Note
we can also easily add self feedback loops and general feedback pathways as shown
in Fig. 14.10. Note that in Figs. 14.9 and 14.10, the interaction between dendritic and
axonal objects has not been made explicit. The usual simplistic approximation to
this interaction transforms a summed collection of weighted inputs via a saturating
transfer function with bounded output range. The dendritic and axonal interaction is
modeled by a scalar Wij as described above. It measures the intensity of the interaction
between the input neuron i and the output neuron j as a simple scalar weight. Since
we have developed the ballstick neural processing model, we will eventually try to
do better than this.

14.5.1 Chained Architecture Details

Let’s look at the Chained Feed Forward Network (CFFN) in more detail. In this
network model, all the information is propagated forward. Hence, Fig. 14.10 is
not of this type because it has a feedback connection as well as a self loop and
so is not strictly feedforward. The CFFN is quite well-known and our first expo-
sure to it occurred in the classic paper of Werbos (1987). Consider the function

Fig. 14.9 A feedforward architecture shown as a chain of processing elements, with two inputs and two outputs. The strength of the neural interactions is labeled as Wij, where i is the index of the neuron providing input and j is the index of the neuron accepting the input

Fig. 14.10 A chained neural architecture with self feedback loops and general feedback: a feedback and feedforward architecture drawn as a chain with two inputs and two outputs. Note the self feedback interaction W22 and the feedback strength W42

H : R^{n_I} → R^{n_O} that has a very special nonlinear structure which consists of a chain
of computational elements, generally referred to as neurons since a very simple model
of neural processing models the action potential spike as a sigmoid function which
transitions rapidly from a binary 0 (no spike) to 1 (spike). This sigmoid is called
a transfer function and since it can not exceed 1 as 1 is a horizontal asymptote, it
is called a saturating transfer function also. This model is known as a lumped sum
model of post-synaptic potential; now that we know about the ball—stick model
of neural processing, we can see the lumped sum model is indeed simplistic. Each
neuron thus processes a summed collection of weighted inputs via a saturating trans-

Fig. 14.11 Nodal computations. a Schematic. b Processing

fer function with bounded output range (i.e. [0, 1]). The neurons whose outputs
connect to a given target or postsynaptic neuron are called presynaptic neurons.
Each presynaptic neuron has an output Y which is modified by the synaptic weight
Wpre,post connecting the presynaptic neuron to the postsynaptic neuron. This gives
a contribution Wpre,post Y to the input of the postsynaptic neuron. A typical satu-
rating transfer function model is shown in the Fig. 14.11a, b. Figure 14.11a shows
a postsynaptic neuron with four weighted inputs which are summed and fed into
the transfer function which then processes the input into a bounded scalar output.
Figure 14.11b illustrates more details of the processing. A typical transfer function
could be modeled as

\[
\sigma(x, o, g) = 0.5\left(1.0 + \tanh\left(\frac{x - o}{\phi(g)}\right)\right)
\]

with the usual transfer function derivative given by

\[
\frac{\partial \sigma}{\partial x}(x, o, g) = \frac{1}{2\,\phi(g)}\, \mathrm{sech}^{2}\left(\frac{x - o}{\phi(g)}\right)
\]

where o denotes the offset indicated in the drawing and φ(g) is a function controlling
slope. The slope controlling parameter is usually called the gain of the transfer
function as it represents amplification of the incoming signal. Note that at the offset
point, ∂σ/∂x = 1/(2φ(g)); hence, g effectively controls the slope of the most sensitive region
of the transfer function's domain. The function φ(g) is for convenience only; it is
awkward to allow the denominator of the transfer function model to be zero and to
change sign. A typical function to control the range of the gain parameter might be
φ(g) = g_m + ((g_M − g_m)/2)(1.0 + tanh(g)), where g_m and g_M denote the lower and upper
saturation values of the gain parameter’s range. The chained model then consists of
a string of N neurons, labeled from 0 to N − 1. Some of these neurons can accept
external input and some have their outputs compared to external targets. We let

\[
U = \{\, i : \text{neuron } i \text{ is an input} \,\} = \{u_0, \ldots, u_{n_I - 1}\} \qquad (14.5)
\]
\[
V = \{\, i : \text{neuron } i \text{ is an output} \,\} = \{v_0, \ldots, v_{n_O - 1}\} \qquad (14.6)
\]

Fig. 14.12 Local networks. a Chained feedforward. b Local expert

We will let nI and nO denote the size of U and V respectively. The remaining neurons
in the chain which have no external role are sometimes called hidden neurons with
dimension nH . Note in a chain, it is also possible for an input neuron to be an
output neuron; hence U and V need not be disjoint sets. The chain is thus divided
by function into three possibly overlapping types of processing elements: nI input
neurons, nO output neurons and nH internal or hidden neurons. In Fig. 14.12a, we see
a prototypical chain of eleven neurons. For clarity only a few synaptic links from pre
to post neurons are shown. We see three input neurons (neurons 0, 1 and 4) and four
output neurons (neurons 3, 7, 9 and 10). Note input neuron 0 feeds its output forward
to input neuron 1 in addition to feeding forward to other postsynaptic neurons. The
set of postsynaptic neurons for neuron 0 can be denoted by the symbol F(0) which
here is the set F(0) = {1, 2, 4}. Similarly, we see F(4) = {5, 6, 8, 9}.
We will let the set of postsynaptic neurons for neuron i be denoted by F(i), the
set of forward links for neuron i. Note also that each neuron can be viewed as a
postsynaptic neuron with a set of presynaptic neurons feeding into it: thus, each
neuron i has associated with it a set of backward links which will be denoted by
B(i). In our example, B(0) = {} and B(4) = {0}, where in general, the backward
link sets will be much richer in connections than these simple examples indicate.
For example, the chained architecture could easily instantiate what is called a local
expert model as shown in Fig. 14.12b. The weight of the synaptic link connecting
the presynaptic neuron i to the postsynaptic neuron j is denoted by Wi→j . For a
feedforward architecture, we will have j > i, however, as you can see in Fig. 14.10,
this is not true in more general chain architectures. The input of a typical postsynaptic
neuron therefore requires summing over the backward link set of the postsynaptic
neuron in the following way:

\[
y_{post} = x + \sum_{pre \in B(post)} W_{pre \to post}\, Y^{pre}
\]

Fig. 14.13 Postsynaptic output calculation

where the term x is the external input term which is only used if the post neuron is
an input neuron. This is illustrated in Fig. 14.13. We will use the following notation
(some of which has been previously defined) to describe the various elements of the
chained architecture.

xi External input to the ith input neuron


yi Summed input to the ith neuron
oi Offset of the ith neuron
gi Gain of the ith neuron
σi Transfer function of the ith neuron
Yi Output of the ith neuron
Ti→j Synaptic efficacy of the link between neurons i and j
F (i) Forward link set for neuron i
B(i) Backward link set for neuron i

The chain FFN then processes an arbitrary input vector x ∈ R^{n_I} via an iterative
process as shown below.

for (i = 0; i < N; i++) {
    if (i ∈ U)
        y_i = x_i + Σ_{j ∈ B(i)} T_{j→i} Y_j
    else
        y_i = Σ_{j ∈ B(i)} T_{j→i} Y_j
    Y_i = σ_i(y_i, o_i, g_i)
}
The output of the CFFN is therefore a vector in R^{n_O} defined by H(x) = { Y_i | i ∈ V }; that is,

\[
H(x) = \begin{bmatrix} Y^{v_0} \\ \vdots \\ Y^{v_{n_O - 1}} \end{bmatrix}
\]

and we see that H : R^{n_I} → R^{n_O} is a highly nonlinear function that is built out of
chains of nonlinearities. The parameters that control the value of H(x) are the link
values, the offsets and the gains for each neuron. Note computations are performed
on each node's input queue!
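The iterative process above is short enough to state as running code. The sketch below (Python; the tiny three neuron example and all of its numbers are ours, and we use one common transfer function for every neuron purely for brevity) builds the output vector H(x) from the tap neurons in V:

import math

def phi(g, gm=0.5, gM=5.0):
    # Gain shaping: phi(g) = gm + (gM - gm)/2 * (1 + tanh(g)).
    return gm + 0.5 * (gM - gm) * (1.0 + math.tanh(g))

def sigma(y, o, g):
    # Saturating transfer function sigma(y, o, g).
    return 0.5 * (1.0 + math.tanh((y - o) / phi(g)))

def cffn_forward(x, N, U, V, B, T, o, g):
    # x: external inputs keyed by input neuron index; B: backward link sets;
    # T: link weights keyed by (pre, post); o, g: per neuron offsets and gains.
    Y = [0.0] * N
    for i in range(N):
        y = sum(T[(j, i)] * Y[j] for j in B[i])
        if i in U:
            y += x[i]
        Y[i] = sigma(y, o[i], g[i])
    return [Y[v] for v in V]          # H(x): the outputs of the tap neurons

B = {0: [], 1: [0], 2: [0, 1]}        # neuron 0 is an input, neuron 2 is an output
T = {(0, 1): 1.2, (0, 2): -0.4, (1, 2): 0.9}
print(cffn_forward({0: 0.7}, N=3, U={0}, V=[2], B=B,
                   T=T, o=[0.0, 0.0, 0.0], g=[0.0, 0.0, 0.0]))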

14.5.1.1 The Traditional Structure of the Chain Model

The CFFN model consists of a string of N neurons. Since each neuron is a general
input/ output processing module, we can think of these neurons as type IOMAP.
The neurons are labeled from 0 to N − 1. A dendrite of any neuron can accept
external input and an axon of any neuron can have an external tap for comparison to
target information. In Fig. 14.10, we see a chain of six neurons. We see two neurons
that accept external input (neurons 0 and 1) and two neurons having external taps
(neurons 4 and 5). The set of postsynaptic neurons for a given neuron i is denoted
by the symbols F(i); the letter F denotes the feedforward links, of course. Note the
self and feedback connection in the set F(2). Also each neuron can be viewed as
a postsynaptic neuron with a set of presynaptic neurons feeding into it: thus, each
neuron i has associated with it a set of backward links which will be denoted by B(i).
In our example, these are the sets given below

F(0) = {1, 2, 3, 4, 5}   B(0) = {}
F(1) = {3, 4}            B(1) = {}
F(2) = {2, 5}            B(2) = {0, 2, 4}
F(3) = {5}               B(3) = {0, 1}
F(4) = {5}               B(4) = {0, 1}
F(5) = {}                B(5) = {0, 2, 3, 4}

where in general, the backward link sets can be much richer in connections than this
simple example indicates. This model thus uses the computational scheme
   
\[
Y^{post} = \sigma\bigl(y_{post}, o, g\bigr) = \sigma\Bigl(x + \sum_{pre \in B(post)} W_{pre \to post}\, Y^{pre},\; o,\; g\Bigr)
\]

which can be rewritten

\[
a^{post} = S\Bigl(x + \sum_{pre \in B(post)} a^{pre} \bullet d^{post},\; p\Bigr)
\]

where a indicates the value of a given axon, d, the value of a given dendrite and S,
the soma computational engine, which depends on a vector of parameters, p. In the
simple sigmoidal transfer function used in this chain, the parameters are the gain,
offset and the minimum and maximum shaping parameters for the sigmoid; hence, the
parameter vector p would have values p[0] = o, p[1] = g, p[2] = gm and p[3] = gM .

Fig. 14.14 A chained neural architecture with self feedback loops and general feedback, with the PSD computational sites shown

Finally, we let the symbol apre • d post indicate the computation performed by the PSD
between the pre and post neuronal elements. Note that the PSD • operation in the
chain model therefore corresponds to a simple multiplication of the pre-axon value
a and the post-dendrite value d.
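One payoff of writing the update as a^post = S(x + Σ a^pre • d^post, p) is that the • operation becomes a pluggable piece of the node. The sketch below (Python; the structure and parameter layout are ours) recovers the classical chain model by passing plain multiplication for •, and a richer PSD computation could be substituted without touching the soma engine:

import math

def soma_S(u, p):
    # Simple sigmoidal soma engine with parameters p = [offset, gain].
    return 0.5 * (1.0 + math.tanh((u - p[0]) / p[1]))

def node_update(x, pre_axons, post_dendrite, psd, p):
    # pre_axons: presynaptic axon values; psd: the bullet operation to use.
    u = x + sum(psd(a, post_dendrite) for a in pre_axons)
    return soma_S(u, p)

classical = lambda a, d: a * d        # the chain model's simple multiplication
print(node_update(0.2, [0.5, -0.1], 0.8, classical, p=[0.0, 1.0]))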

14.5.2 Modeling Neurotransmitter Interactions

The use of neurotransmitters in our axon models and receptors in our dendrite models
requires more discussion. The original Figs. 14.9 and 14.10 did not show the PSD
computational structures, while the larger illustration Fig. 14.14 does. This gives
some insight into our new modeling process, but it will be much more illuminating
to look at careful representations of this six neuron system in the case of two receptors
on all dendrites and two neurotransmitters on all possible axons. Note that this is just
for convenience and the number of neurotransmitters and receptors does not need to
be constant in this fashion. Further, the number of neurotransmitters and receptors
at each dendrite and axon need not match. Nevertheless, the illustrations presented

Fig. 14.15 The dendritic tree details for Neuron N0 showing two neurotransmitters per pre axon connection to the dendrite

in Fig. 14.15 give a taste of the plasticity of this approach. We will use a common
notation for these illustrations given by

D0, D1: The dendritic objects associated with a neuron.
I: The external input to a dendrite.
R0, R1: The receptor objects associated with a dendrite.
NT0, NT1: The neurotransmitter objects associated with an axon.
N0, N1, N2, N3, N4, N5: The six neurons in our example.
A0, A1, ...: The axons associated with a neuron.
A^i_j • D^p_q: The PSD object handling interaction between the jth axon of neuron i and
the qth dendrite of neuron p.
EO: The external tap on an axon.

With all this said, the extended picture of neuron 0 gives a very detailed look
at the network previously shown as Fig. 14.14. We could also specify axonal and
dendritic locality sets (although we do not illustrate such things in these pictures to
avoid excessive clutter), but we don't really have to as long as we think of all
of this calculation as taking place in a given node's input queue asynchronously. But

we could set up this sort of careful linking as follows. We might specify the axonal
locality set for axon 2 of neuron 1 for neurotransmitter t to consist of two axons:
axon 1 of neuron 1 and axon 2 of neuron 1 (itself). This information would then be
stored in an appropriate linking description, but for convenience of exposition, we
list it here in set form. The name AL_t(i) denotes the axon link set for neuron i for
neurotransmitter t. So, this example would give AL(A^1_2)_t = {A^1_1, A^1_2}. In a similar
way, we could define the dendritic locality set, DL, for dendrite 1 of neuron 2 for
receptor r in set form (using the standard A^i_j and D^p_q notation): DL(D^2_1)_r = {D^2_0, D^2_1}.
Look carefully at neuron 0. Figure 14.15 shows its full structure. It is the first neu-
ron in the chain and the computational capability of the neuron and its dendritic and
axonal tree is highly nonlinear due to the dependence on the two neurotransmitters
and two receptors present at each of the five PSD calculations. There are also time
delay possibilities that are controlled by gate opening and closings for the receptors.
The other neurons would have similar diagrams and we will not show them here.

14.6 Neuron Calculations

The discussion above shows us how to define various types of neurons that all have
a similar structure. A neuron type is typically called a class. We will go back to our
ball stick model to organize our thoughts. A neuron class must compute the axon
hillock voltage, of course. This computation explicitly depends on
• L, the length of the cable,
• λc , the electrotonic length,
• The value of ρ = G_D/G_S, the ratio of the dendritic conductance to soma conductance,
• The solution to the eigenvalue equation

\[
\tan(\alpha L) = -\frac{\tanh(L)}{\rho L}\,(\alpha L),
\]

• Q, the number of eigenfunctions to use in our expansions,


• The vectors cj we obtain for voltage impulses at location j on the dendrite. There
are versions of this data vector for each choice of j; hence, there are L data vectors.
• The M matrix which is only computed once.
• The solution to the matrix system MA = cj for each j.
Thus, we can define a neuron class for each possible choice of this parameter set. The
solutions of the matrix equation and the eigenvalue problem need only be computed
once for each neuron class. Now, for a given neuron of a particular class, consider the model
of Fig. 14.16 in which the details of the soma cable are hidden within the circle that
is drawn to represent the cell body. Each neuron’s soma is actually a Rall cable of
length LS with electrotonic distance LSE and has an attached dendrite with length LD
with electrotonic distance LDE . The axon hillock is identified with the z = 0 position

Fig. 14.16 A network model

on the soma cable. In the figure, the dendritic and soma cables of the presynaptic
neurons are not shown in detail and, for convenience, the cable length is shown as
LD = 4. We can now write down in organized form how computations will take place
in this network.
• At τ = 0, calculate the voltage, v, at the axon hillock via

\[
v_D(0) = \sum_{n=0}^{Q_D} \sum_{j=0}^{L_D} \sum_{i=0}^{N_j^{D,0}} V_{0,D,i}^{\,j}\, \hat{A}_n^{\,j} \cos[\alpha_n (L_D - j)]
\]
\[
v_S(0) = \sum_{n=0}^{Q_S} \sum_{k=0}^{L_S} \sum_{i=0}^{N_k^{S,0}} V_{0,S,i}^{\,k}\, \hat{B}_n^{\,k} \cos[\beta_n (L_S - k)]
\]
\[
v(0) = v_D(0) + v_S(0)
\]

where N_j^{D,0} and N_k^{S,0} denote the number of pulses arriving at node j and node k
of the dendrite and soma, respectively, at time τ = 0. We note the scalars such
as V_{0,D,i}^{j} are actually the attenuated values V_{0,D,i}^{j} e^{-ε(1+α_n^2)} where ε is a small
amount of time—say 0.05 time constants. The voltage at the axon hillock is then
used to initialize the Hodgkin–Huxley action potential response as we discussed in
Peterson (2015). If an axon potential is generated, it can be sent back as a negative
impulse to the dendrites to model the refractory period of the neuron.
• Advance the time clock one unit.
• At τ = 1, at the axon hillock, the voltages applied at τ = 0 have now decayed to

\[
v_D(0,1) = \sum_{n=0}^{Q_D} \sum_{j=0}^{L_D} \sum_{i=0}^{N_j^{D,0}} V_{0,D,i}^{\,j}\, \hat{A}_n^{\,j} \cos[\alpha_n (L_D - j)]\, e^{-(1+\alpha_n^2)}
\]
\[
v_S(0,1) = \sum_{n=0}^{Q_S} \sum_{k=0}^{L_S} \sum_{i=0}^{N_k^{S,0}} V_{0,S,i}^{\,k}\, \hat{B}_n^{\,k} \cos[\beta_n (L_S - k)]\, e^{-(1+\beta_n^2)}
\]

The new pulses that have arrived at τ = 1 are given by

\[
v_D(1) = \sum_{n=0}^{Q_D} \sum_{j=0}^{L_D} \sum_{i=0}^{Q_j^{D,1}} V_{1,D,i}^{\,j}\, \hat{A}_n^{\,j} \cos[\alpha_n (L_D - j)]
\]
\[
v_S(1) = \sum_{n=0}^{Q_S} \sum_{k=0}^{L_S} \sum_{i=0}^{Q_k^{S,1}} V_{1,S,i}^{\,k}\, \hat{B}_n^{\,k} \cos[\beta_n (L_S - k)]
\]
\[
z(0,1) = v_D(1) + v_S(1)
\]

and the total synaptic potential arriving at the axon hillock at time τ = 1 is thus

$$v(1) = \bigl(v_D(1) + v_D(0,1)\bigr) + \bigl(v_S(1) + v_S(0,1)\bigr) \tag{14.7}$$

Again, the voltage at the axon hillock is then used to initialize the Hodgkin–Huxley
action potential response.
• At time τ = 2, we would find

$$v_D(0,2) = \sum_{n=0}^{Q_D}\sum_{j=0}^{L_D}\sum_{i=0}^{N_j^{D,0}} V_{0,D,i}^{\,j}\,\hat{A}_n^{\,j}\cos[\alpha_n(L_D - j)]\;e^{-2(1+\alpha_n^2)}$$

$$v_S(0,2) = \sum_{n=0}^{Q_S}\sum_{k=0}^{L_S}\sum_{i=0}^{N_k^{S,0}} V_{0,S,i}^{\,k}\,\hat{B}_n^{\,k}\cos[\beta_n(L_S - k)]\;e^{-2(1+\beta_n^2)}$$

$$v_D(1,1) = \sum_{n=0}^{Q_D}\sum_{j=0}^{L_D}\sum_{i=0}^{N_j^{D,1}} V_{1,D,i}^{\,j}\,\hat{A}_n^{\,j}\cos[\alpha_n(L_D - j)]\;e^{-(1+\alpha_n^2)}$$

$$v_S(1,1) = \sum_{n=0}^{Q_S}\sum_{k=0}^{L_S}\sum_{i=0}^{N_k^{S,1}} V_{1,S,i}^{\,k}\,\hat{B}_n^{\,k}\cos[\beta_n(L_S - k)]\;e^{-(1+\beta_n^2)}$$

The axon hillock voltage at τ = 2 would then be

$$v(2) = \bigl(v_D(2) + v_D(1,1) + v_D(0,2)\bigr) + \bigl(v_S(2) + v_S(1,1) + v_S(0,2)\bigr) \tag{14.8}$$

where the notation v_D(0, i) means the input voltage at time 0 after i time units of attenuation. Hence, v_S(1, 1) means the voltage that came in at time 1 after one time unit of attenuation.
• Continuing in this fashion, at time τ = T , the total synaptic potential seen at the
axon hillock is
$$v(T) = \left(v_D(T) + \sum_{j=1}^{T} v_D(T-j,\,j)\right) + \left(v_S(T) + \sum_{j=1}^{T} v_S(T-j,\,j)\right) \tag{14.9}$$

and an action potential is generated with possible refractory feedback as before.


The voltage at the axon hillock is then used to initialize the Hodgkin–Huxley action
potential response as we discussed in Peterson (2015).
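To make the time-stepping concrete, here is a minimal MatLab sketch of the accumulation in Eq. 14.9. It collapses the full eigenfunction sums into a single precomputed contribution per time tick and uses one representative decay exponent for the dendrite and one for the soma, so every variable name and numerical value below is an illustrative stand-in rather than output of the cable model.

T = 5;                          % number of time ticks to simulate
vDnew = [0.8 0.3 0.0 0.5 0.1];  % hypothetical new dendritic input per tick
vSnew = [0.2 0.0 0.4 0.1 0.0];  % hypothetical new soma input per tick
alpha = 0.7; beta = 0.9;        % hypothetical surrogate eigenvalues
v = zeros(1,T);
for t = 1:T
  total = vDnew(t) + vSnew(t);  % pulses arriving at this tick
  for j = 1:t-1
    % contributions that arrived j ticks ago, attenuated over j ticks
    total = total + vDnew(t-j)*exp(-j*(1+alpha^2)) ...
                  + vSnew(t-j)*exp(-j*(1+beta^2));
  end
  v(t) = total;                 % axon hillock voltage handed to the Hodgkin-Huxley model
end

In a full implementation, vDnew(t) and vSnew(t) would be the triple sums over eigenfunctions, cable locations and arriving pulses described above, and each eigenfunction term would carry its own decay rate.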
It should be clear to you that this is a very complex process even for a single neuron. It is quite complicated to understand how to model a network of such computational modules efficiently. We have spent a lot of time in this text detailing how to test the models using snippets of MatLab code and it has been very instructive. However, the interpreted nature of the MatLab development environment begins to hamper us if we want to build larger interactions. From the discussions in this section,
we can also see that differential equation models that are based on continuous time
rather than discrete time have difficulties too. In order to build even simple software
architecture implementations for networks of neurons, it is therefore clear we must
develop an abstract framework for the implementation of neural processing models.
To do this, we have developed abstractions of the action potential we call biological
feature vectors or BFVs which have been discussed already. For example, at each
time tick in the computational chain above, we can compute the BFV associated with the change in membrane voltage. We can then trigger an action potential if the axon hillock voltage exceeds a threshold voltage. Thus, if V(T) exceeds the threshold, the output of the cell, Y, is the BFV ξ:

$$Y = \begin{cases} 0, & V(T) \text{ below the threshold},\\ \xi, & \text{otherwise}, \end{cases} \tag{14.10}$$

For these calculations, we need to decide on an active time period. For example, the active period here could be the five time ticks t_0 to t_4. Hence, any incoming signals into the cell prior to time t_4 are ignored. Once the time period is over, the cell can possibly generate a new BFV.
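As a small illustration, the following MatLab fragment sketches the output rule of Eq. 14.10 for one active period; the threshold value and the BFV entries are made up for the example.

Vthresh = 0.45;                 % hypothetical firing threshold
xi = [1.2 0.8 3.5 0.2 10.0];    % hypothetical biological feature vector
VT = 0.62;                      % axon hillock voltage at the end of the active period
if VT > Vthresh
  Y = xi;                       % the cell emits its BFV
else
  Y = zeros(size(xi));          % below threshold: no output this active period
end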
Now step back again. All of this intense timing with its need to pay attention to
locality sets and so forth, can be removed by simply focusing on the chain architecture
and the backward and forward link sets of the graph of nodes. It is easy to implement in a truly asynchronous way and, since computation is on whatever is in a node's input queue, much of this messy notation is unnecessary! We will explore this approach
in later chapters.

References

A. Arnsten, Catecholamine influences on dorsolateral prefrontal cortical networks. Biol. Psychiatry


69, e89–e99 (2011)
I. Black, Information in the Brain: A Molecular Perspective (A Bradford Book, MIT Press, Cam-
bridge, 1991)
P. Dayan, Q. Huys, Serotonin in affective control. Annu. Rev. Neurosci. 32, 95–126 (2009). (Annual
Reviews)
K. Friston, P. Schwartenbeck, T. FitzGerald, M. Moutoussis, T. Behrens, R. Dolan, The anatomy of
choice: dopamine and decision-making. Philos. Trans. R. Soc. B 369(20130481), 1–12 (2014)
J. Peterson, Calculus for Cognitive Scientists: Partial Differential Equation Models. Springer Series
on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte Ltd.,
Singapore, 2015 in press)
A. Roberts, The importance of serotonin for orbitofrontal function. Biol. Psychiatry 69, 1185–1191
(2011)
P. Werbos, Learning how the world works, in Proceedings of the 1987 IEEE International Confer-
ence on Systems, Man and Cybernetics, vol. 1. IEEE (1987), pp. 302–310
Chapter 15
Training the Model

As we have seen in the previous chapters, it is going to be difficult to model the


information processing that we feel is present in neural computation. To get started,
let’s take what we have sketched about our networks being graphs of computational
nodes and see if we can figure out a way to make such a graph map a given set of
inputs into a desired set of objects. We are going to use what are called Directed
Graphs as the edges between nodes are directional; we do not assume information
flows both ways between two nodes. A specified edge tells us information goes
from the input or pre node to the output or post node as we have shown in our earlier
discussions on chained objects. We will remain focused on how we can do this within
MatLab although it will quickly become clear we eventually want to move to other
computer languages. So to illustrate how we can train or imprint clusters of com-
putational objects, let’s consider a typical On-Center, Off-Surround cortical circuit
and the associated Folded Feedback Pathway which connects two stacked cortical
blocks. These circuits have been taken from Raizada and Grossberg (2003). We show
their respective directed graph (DG) representations in Fig. 5.22a, b. Associated with
any such DG structure, there is a notion of flow given by the Laplacian of the graph.
It is easiest to show how this is done for the OCOS and FFP DG. It will then be
evident how to extend to the DG’s associated with other computational modules in
the simple brain model. Although not shown in the implementation source code,
there are methods for the Laplacian calculations and the resulting flows as well.

15.1 The OCOS DAG

Let’s redo the OCOS graph as we did in Chap. 18. In Fig. 5.22a, there are 8 neural
nodes and 9 edges. We relabel the node numbers seen in the figure to match our
usual feedforward orientation. The OCOS DG is a graph G which is made up of the
vertices V and edges E given by


V = {N1 , N2 , N3 , N4 , N5 , N6 , N7 , N8 }
E = {E1 , E2 , E3 , E4 , E5 , E6 , E7 , E8 , E9 }

where you can see the nodes (we identify N1 in the graph model with N8 in the figure
and so forth). We will use the identifications
E1 = edge from node N1 to N2,  E2 = edge from node N2 to N3,
E3 = edge from node N2 to N4,  E4 = edge from node N2 to N5,
E5 = edge from node N3 to N6,  E6 = edge from node N4 to N7,
E7 = edge from node N5 to N8,  E8 = edge from node N2 to N7,
E9 = edge from node N1 to N7.

We then denote the OCOS DG by G(V, E). Recall, its incidence matrix is the matrix K whose K_ne entry is defined to be +1 if edge e goes out of node n, −1 if edge e goes into node n, and 0 otherwise. For the OCOS DG, K is the 8 × 9 matrix given by
$$K = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1\\
-1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 & 0\\
0 & -1 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & -1 & 0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & -1 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & -1 & 0 & -1 & -1\\
0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0
\end{bmatrix}$$
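In MatLab, K is easy to assemble directly from the edge list above; the short sketch below rebuilds it (the variable names are ours and not part of any toolbox) and forms the Laplacian used in the MatLab comments later in this section.

E = [1 2; 2 3; 2 4; 2 5; 3 6; 4 7; 5 8; 2 7; 1 7];  % [pre post] node pair for E1..E9
K = zeros(8, size(E,1));
for e = 1:size(E,1)
  K(E(e,1), e) =  1;     % edge e leaves its pre node
  K(E(e,2), e) = -1;     % edge e enters its post node
end
L = K*K';                % the graph Laplacian of the OCOS DG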

Now assume that there is a vector function f which assigns a scalar value to each
node. Then, the gradient of f on the DG is defined to be $\nabla f = K^T f$, given by Eq. 15.1.

$$\nabla f = \begin{bmatrix}
f(N1) - f(N2)\\ f(N2) - f(N3)\\ f(N2) - f(N4)\\ f(N2) - f(N5)\\ f(N3) - f(N6)\\ f(N4) - f(N7)\\ f(N5) - f(N8)\\ f(N2) - f(N7)\\ f(N1) - f(N7)
\end{bmatrix} = \begin{bmatrix}
f_1 - f_2\\ f_2 - f_3\\ f_2 - f_4\\ f_2 - f_5\\ f_3 - f_6\\ f_4 - f_7\\ f_5 - f_8\\ f_2 - f_7\\ f_1 - f_7
\end{bmatrix} \tag{15.1}$$

where we identify $f(N1) \equiv f_1$ and so forth. It is easy to see that this is a measure of flow through the graph. We then define the Laplacian $\nabla^2 f = K K^T f$ for this DG as shown in Eq. 15.2.

$$\nabla^2 f = \begin{bmatrix}
2f_1 - f_2 - f_7\\
-f_1 + 5f_2 - f_3 - f_4 - f_5 - f_7\\
-f_2 + 2f_3 - f_6\\
-f_2 + 2f_4 - f_7\\
-f_2 + 2f_5 - f_8\\
-f_3 + f_6\\
-f_1 - f_2 - f_4 + 3f_7\\
-f_5 + f_8
\end{bmatrix} \tag{15.2}$$

Since $K K^T$ is symmetric, the eigenvalues of this Laplacian are nonnegative and there is a nice eigenvector structure. The standard cable equation for the voltage v across a membrane is

$$\lambda^2 \nabla^2 v - \tau \frac{\partial v}{\partial t} - v = -r \lambda^2 k$$

where the constant λ is the space constant, τ is the time constant and r is a geometry independent constant. The variable k is an input source. We will assume the flow of information through our OCOS DG is given by the graph based partial differential equation

$$\nabla^2_{OCOS} f - \frac{\tau_{OCOS}}{\lambda^2_{OCOS}}\,\frac{\partial f}{\partial t} - \frac{1}{\lambda^2_{OCOS}}\, f = -r k.$$

Now, we relabel the fraction $\tau_{OCOS}/\lambda^2_{OCOS}$ by $\mu_1$ and the constant $1/\lambda^2_{OCOS}$ by $\mu_2$, where we drop the OCOS label as it is understood from context. Each computational DG will have constants $\mu_1$ and $\mu_2$ associated with it and, if it is important to distinguish them, we can always add the labellings at that time. This equation gives us the Laplacian graph based information flow model. Using a finite difference for the $\partial f/\partial t$ term, this gives Eq. 15.3, where we define the finite difference $\Delta f_n(t)$ as $f_n(t+1) - f_n(t)$.
$$\begin{bmatrix}
2f_1 - f_2 - f_7\\
-f_1 + 5f_2 - f_3 - f_4 - f_5 - f_7\\
-f_2 + 2f_3 - f_6\\
-f_2 + 2f_4 - f_7\\
-f_2 + 2f_5 - f_8\\
-f_3 + f_6\\
-f_1 - f_2 - f_4 + 3f_7\\
-f_5 + f_8
\end{bmatrix}
- \mu_1 \begin{bmatrix} \Delta f_1(t)\\ \Delta f_2(t)\\ \Delta f_3(t)\\ \Delta f_4(t)\\ \Delta f_5(t)\\ \Delta f_6(t)\\ \Delta f_7(t)\\ \Delta f_8(t) \end{bmatrix}
= \mu_2 \begin{bmatrix} f_1(t)\\ f_2(t)\\ f_3(t)\\ f_4(t)\\ f_5(t)\\ f_6(t)\\ f_7(t)\\ f_8(t) \end{bmatrix}
- r \begin{bmatrix} k_1(t)\\ k_2(t)\\ k_3(t)\\ k_4(t)\\ k_5(t)\\ k_6(t)\\ k_7(t)\\ k_8(t) \end{bmatrix} \tag{15.3}$$

Then, if we assume that we want the node values f(N1), f(N2) and f(N3) to be clamped to A, B and C, respectively, for a given input k, then after substitution into Eq. 15.3 we obtain Eq. 15.4.
$$\begin{bmatrix}
2A - B - f_7\\
-A + 5B - C - f_4 - f_5 - f_7\\
-B + 2C - f_6\\
-B + 2f_4 - f_7\\
-B + 2f_5 - f_8\\
-C + f_6\\
-A - B - f_4 + 3f_7\\
-f_5 + f_8
\end{bmatrix}
- \mu_1 \begin{bmatrix} 0\\ 0\\ 0\\ \Delta f_4(t)\\ \Delta f_5(t)\\ \Delta f_6(t)\\ \Delta f_7(t)\\ \Delta f_8(t) \end{bmatrix}
= \mu_2 \begin{bmatrix} A\\ B\\ C\\ f_4(t)\\ f_5(t)\\ f_6(t)\\ f_7(t)\\ f_8(t) \end{bmatrix}
- r \begin{bmatrix} k_1(t)\\ k_2(t)\\ k_3(t)\\ k_4(t)\\ k_5(t)\\ k_6(t)\\ k_7(t)\\ k_8(t) \end{bmatrix} \tag{15.4}$$

This gives us an iterative equation to solve for f4 through f8 for each input k. We adjust
parameters in the OCOS DG using Hebbian ideas. We can perform a similar Lapla-
cian flow and Hebbian update strategy on any DG. In particular, you can work out
the details of this process applied to the DG of Fig. 5.22b and the Layer Six–Four
Connections to Layer Two–Three in Fig. 5.23a and the Combined OCOS/ FFP model
of Fig. 5.24.
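A minimal MatLab sketch of this clamped iteration is given below. It uses the Laplacian built from the incidence matrix K earlier, and the values of μ1, μ2, r, the input k, the clamp values A, B, C and the number of steps are illustrative only; whether the iteration settles down or grows depends on how μ1 and μ2 compare with the spectrum of K K^T.

A = 1.0; B = 0.5; C = 0.25;        % clamped values for nodes 1, 2, 3
mu1 = 20.0; mu2 = 0.1; r = 0.05;   % illustrative flow constants
k = 0.3*ones(8,1);                 % hypothetical constant input source
f = zeros(8,1); f(1:3) = [A; B; C];
for t = 1:10
  lapf = (K*K')*f;                       % graph Laplacian of the current state
  fnew = f + (lapf - mu2*f + r*k)/mu1;   % finite difference step from Eq. 15.3
  fnew(1:3) = [A; B; C];                 % re-clamp the driven nodes
  f = fnew;
end
f                                        % approximate node values f_4 through f_8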

15.1.1 Some MatLab Comments

For the OCOS DG we developed in Chap. 18, we found

Listing 15.1: The OCOS Laplacian


L = laplacian(OCOS)
L =
     2    -1     0     0     0     0    -1     0
    -1     5    -1    -1    -1     0    -1     0
     0    -1     2     0     0    -1     0     0
     0    -1     0     2     0     0    -1     0
     0    -1     0     0     2     0     0    -1
     0     0    -1     0     0     1     0     0
    -1    -1     0    -1     0     0     3     0
     0     0     0     0    -1     0     0     1

and so for a given function vector

Listing 15.2: Sample Node function vector


f = [0.2; 3.0; 1.0; -3.4; 4.6; 7.8; 1.2; 0.4];

we can easily compute ∇ f and ∇ 2 f .



Listing 15.3: Computing the graph gradient and Laplacian


gradf = K'*f
gradf =
   -2.8000
    2.0000
    6.4000
   -1.6000
   -6.8000
   -4.6000
    4.2000
    1.8000
   -1.0000
laplacianf = K*K'*f
laplacianf =
   -3.8000
   11.4000
   -8.8000
  -11.0000
    5.8000
    6.8000
    3.8000
   -4.2000

15.2 Homework

Exercise 15.2.1 Develop the Laplacian Training method equations for the The
Folded Feedback Pathway DG as shown in Fig. 5.22b.

Exercise 15.2.2 Develop the Laplacian Training method equations for the Layer
Six–Four Connections to Layer Two–Three in Fig. 5.23a.

Exercise 15.2.3 Develop the Laplacian Training method equations for the Com-
bined OCOS/FFP model of Fig. 5.24.

Exercise 15.2.4 Develop the Laplacian Training method equations for the general
OCOS/FFP Cortical Model as shown in Fig. 5.24. Compare the eigenvalue and eigen-
vector structure of the component Combined OCOS/FFP models and Layer Six–Four
Connections to Layer Two–Three DAGs that comprise this model to the eigenvalues
and eigenvectors of the whole model.

15.3 Final Comments

In this chapter, we begin the development of a software architecture that would be


capable of subserving autonomous decision making. It is clear an interesting small
brain architecture would have the following components.
• At least two primary sensory cortex modules. When we combine the OCOS, FFP
and 2/3 building blocks into a cortical stack such as shown in Fig. 5.24, this will

require about 30 computational nodes to implement. If we adjoin cortical stacks


into an n × m grid, we have a model of sensory cortex that requires 30nm nodes.
Let’s look at a few specific architectures.
1. a 3 × 3 grid implying 270 nodes per sensory cortex model.
2. a 3 × 2 grid implying 180 nodes per sensory cortex model.
3. a 3 × 1 grid implying 90 nodes per sensory cortex model.
4. a 2 × 1 grid implying 60 nodes per sensory cortex model.
5. a 1 × 1 grid implying 30 nodes per sensory cortex model.
Now each of these cortex stack models has modulatory connections to thalamus, midbrain and cerebellum components.
• A cortical stack model for motor cortex which will provide our output commands.
• A cortical stack model for associative cortex which will give us the sensor fusion
we need to drive the motor cortex.
• Simple thalamus, midbrain and cerebellum models which we will discuss in later
books.
The small brain model we build then has the fixed directed graph architecture we
obtain by building the graph using various neural circuit components such as the
OCOS and FFP. There are similar circuit modules we can use to build the thalamus,
midbrain and cerebellum models. For example, there are a number of books that
discuss theoretical models of brain function carefully. We have used these extensively
in our development work. In the subsequent volumes of this multi-volume work we
will be discussing this in detail, but for now, we want to mention this source material.
• General theories of brain evolution give many clues to help us design small brain
models. A great reference is Striedter’s book (Striedter 2004).
• Another source of information on the development of the nervous system in general
is the four volume series by Kaas and Bullock (2007a, b), Kaas and Krubitzer
(2007) and Kaas and Preuss (2007).
• A general theory of how the thalamus might work is given by Sherman and Guillery
(2006).
• The brain in a higher level organism such as ourselves is fundamentally asymmet-
ric which has profound software and wetware architectural ramifications. This is
discussed in Hellige (1993).
• There are vast modulatory effects due to hormone modulation which has a big effect
on the plasticity or rewiring of our architectures. There is a good introduction to
this material in Garcia-Segura (2009).
• The principles of frontal lobe function are discussed in the Stuss and Knight edited
volume (Stuss and Knight 2002).
In Table 15.1, we show various computational node counts for the small brain
models we are proposing.
We know that C. elegans has approximately 300 neurons and 9000 connections so the last two proposed architectures are similar in size. Many small brain animals function quite well without long term memory storage, so we don't include

Table 15.1 Computational node count for small brain models

Sense I      | Sense II     | Associative  | Motor        | Thalamus | Midbrain | Cerebellum | Total
(270) 3 × 3  | (270) 3 × 3  | (270) 3 × 3  | (270) 3 × 3  | 100      | 100      | 100        | 1380
(180) 3 × 2  | (180) 3 × 2  | (180) 3 × 2  | (180) 3 × 2  | 100      | 100      | 100        | 1020
(90) 3 × 1   | (90) 3 × 1   | (90) 3 × 1   | (90) 3 × 1   | 100      | 100      | 100        | 660
(60) 2 × 1   | (60) 2 × 1   | (60) 2 × 1   | (60) 2 × 1   | 100      | 100      | 100        | 540
(30) 1 × 1   | (30) 1 × 1   | (30) 1 × 1   | (30) 1 × 1   | 100      | 100      | 100        | 320
(30) 1 × 1   | (30) 1 × 1   | (30) 1 × 1   | (30) 1 × 1   | 30       | 30       | 30         | 210

a memory module at this time. We could test the ability of an architecture like this
to be autonomous within a virtual world. In this book, we program in the interpreted
language MatLab, but for models of this size we will need to recode in a language
such as C or C++. It is easy to see that organizing our graph architectures into
objects would be very useful. Hence, in subsequent volumes, we introduce object-oriented programming so that we can rewrite our ideas into that kind of code. Our
design criteria at this stage are to see what kind of autonomy we can achieve in
our small brain models of 300 neurons or less. The power requirements for such an
architecture are reasonable for on board loading into an autonomous robot such as
those fielded for Mars exploration. In addition, we want our architectures to support
second messenger activity through a subset of neurotransmitter modulation using the
thalamus–midbrain–cerebellum loop. Hence, the small brain model will be useful for
software based lesion studies for depression and schizophrenia. To make it easier to
see how our models are performing, the first step builds an avatar which is a character
in a 3D virtual world such as we can build using Crystal Space, (Tybergheim 2003).
The avatar then interacts with the virtual world using commands generated in our
motor cortex module due to associative cortex processing of environmental inputs.
A reasonable visualization would require a 15 frame per second (fps) rate; hence, the upper bound on computation time in our model is 1/15 s, which is well within our computational capabilities if we abstract the bioinformation processing details discussed in this volume into simplified computational strategies.
We can assume the edge and node structure of our model is static—certainly,
plasticity is possible but we will ignore that for now. We have a model of how to
process information flow through the graph using the graph Laplacian based cable
equation. It is then clear that we have to decide how to process information flowing on
the edges and in the nodes themselves. Hence, each graph model will have associated
edge and node functions which can capture as little or as much of the underlying
biology as we want. The desired 15 fps rate will help us decide on our approximation
strategies. In the chapters to follow, we will finish this book with an abstraction
of second messenger systems, Chap. 6, and an approximate way to model the Ca++
current injections into the cytosol which form the background of a second messenger
trigger event, Chap. 7.

References

L. Garcia-Segura, Hormones and Brain Plasticity (Oxford University Press, Oxford, 2009)
J. Hellige, Hemispheric Asymmetry: What's Right and What's Left (Harvard University Press,
Cambridge, 1993)
J. Kaas, T. Bullock (eds.), Evolution of Nervous Systems: A Comprehensive Reference Editor
J. Kaas (Volume 1: Theories, Development, Invertebrates) (Academic Press Elsevier, Amsterdam,
2007a)
J. Kaas, T. Bullock (eds.), Evolution of Nervous Systems: A Comprehensive Reference Editor
J. Kaas (Volume 2: Non-Mammalian Vertebrates) (Academic Press Elsevier, Amsterdam, 2007b)
J. Kaas, L. Krubitzer (eds.), Evolution of Nervous Systems: A Comprehensive Reference Editor
J. Kaas (Volume 3: Mammals) (Academic Press Elsevier, Amsterdam, 2007)
J. Kaas, T. Preuss, (eds.), Evolution of Nervous Systems: A Comprehensive Reference Editor
J. Kaas (Volume 4: Primates) (Academic Press Elsevier, Amsterdam, 2007)
R. Raizada, S. Grossberg, Towards a theory of the laminar architecture of cerebral cortex: Compu-
tational clues from the visual system. Cereb. Cortex 13, 100–113 (2003)
S. Murray Sherman, R. Guillery, Exploring The Thalamus and Its Role in Cortical Function (The
MIT Press, Cambridge, 2006)
G. Striedter, Principles of Brain Evolution (Sinauer Associates, Sunderland, 2004)
D. Stuss, R. Knight (eds.), Principles of Frontal Lobe Function (Oxford University Press, Oxford,
2002)
J. Tybergheim Crystal Space 3D Game Development Kit. (Open Source, 2003), http://www.
sourceforge.net
Part V
Simple Abstract Neurons
Chapter 16
Matrix Feed Forward Networks

We now begin to explore strategies for updating the graph models based on experience. Let's begin with a very simple graph model called the Matrix Feed Forward Network or MFFN model. Instead of using graph based code, we do everything using matrices and vectors. This graph model is based on information moving forward through the network—hence, the adjective feed forward.

16.1 Introduction

Assuming we have a collection of inputs we want our model to map to desired


outputs, each input is fed forward through the model to an output. The discrepancy
between the output the model gives and the desired output is the error. Since the
model has no feedback in it, we can calculate (albeit with some pain!) the partial
derivatives of the resulting error with respect to all the free parameters in the model
and use gradient descent to try to minimize the collection of errors we obtain from
using all the inputs. The gradient descent technique used here is historically called
backpropagation in the literature. Later, we will rewrite this architecture using more
general notation which is called the chained feed forward architecture which itself is
simply a special case of our general directed graph architectures for brain modeling.
However, it is very instructive to look at the MFFN as it shows us in detail the
difficulties we will have mapping inputs to outputs in a general graph. We consider
a function

$$F : \mathbb{R}^{n_0} \rightarrow \mathbb{R}^{n_{M+1}} \tag{16.1}$$

that has a special nonlinear structure. The nonlinearities in the FFN are contained in
the neurons which are typically modeled by sigmoid functions,
 
$$\sigma(y) = 0.5\bigl(1 + \tanh(y)\bigr), \tag{16.2}$$


typically evaluated at

$$y = \frac{x - o}{g}, \tag{16.3}$$

where x is the input, o is the offset and g is the gain, respectively, of each neuron.
We can also write the transfer functions in a more general fashion as follows:

$$\sigma(x, o, g) = 0.5\left(1.0 + \tanh\left(\frac{x - o}{g}\right)\right)$$
g

The feed forward network consists of M + 1 layers of neurons connected together


with connection coefficients. For each i, 0 ≤ i ≤ M, the ith layer of $n_i$ neurons is connected to the (i+1)st layer of $n_{i+1}$ neurons by a connection matrix, $T^i$. Schematically, we will use the following notation for an MFFN:

$$n_0 \xrightarrow{T^0} n_1 \rightarrow \cdots \rightarrow n_M \xrightarrow{T^M} n_{M+1} \tag{16.4}$$

We can make this notation even more succinct by simply using

$$n_0 \rightarrow n_1 \rightarrow \cdots \rightarrow n_M \rightarrow n_{M+1}; \tag{16.5}$$

or even more simply, we can just refer to an $n_0$-$n_1$-⋯-$n_{M+1}$ MFFN. We will use either one of these notations from this point on. The feed forward network processes an arbitrary $I \in \mathbb{R}^{n_0}$ in the following way. First, for $0 \le \ell \le M+1$ and $1 \le i \le n_\ell$ at layer $\ell$, we define some additional terms to help us organize our work. We let

$$\begin{aligned}
&\sigma_i^\ell \text{ denote the transfer function}, \quad Y_i^\ell \text{ denote the output},\\
&y_i^\ell \text{ denote the input}, \quad O_i^\ell \text{ denote the offset},\\
&g_i^\ell \text{ denote the gain for the } i\text{th neuron, and } X_i^\ell \text{ denote } \frac{y_i^\ell - O_i^\ell}{g_i^\ell}.
\end{aligned} \tag{16.6}$$

Then, given $x \in \mathbb{R}^{n_0}$, for $0 \le i \le M$, the inputs are processed by the zeroth layer as follows.

$$y_i^0 = x_i \tag{16.7}$$
$$X_i^0 = (y_i^0 - O_i^0)/g_i^0 \tag{16.8}$$
$$Y_i^0 = \sigma_i^0(X_i^0) \tag{16.9}$$

For the next layer, more interesting things happen.



$$y_i^1 = \sum_{j=0}^{n_0-1} T_{ji}^0\, Y_j^0 \tag{16.10}$$
$$X_i^1 = (y_i^1 - O_i^1)/g_i^1 \tag{16.11}$$
$$Y_i^1 = \sigma_i^1(X_i^1) \tag{16.12}$$

We continue in this way. At layer $\ell + 1$, we have

$$y_i^{\ell+1} = \sum_{j=0}^{n_\ell-1} T_{ji}^\ell\, Y_j^\ell \tag{16.13}$$
$$X_i^{\ell+1} = (y_i^{\ell+1} - O_i^{\ell+1})/g_i^{\ell+1} \tag{16.14}$$
$$Y_i^{\ell+1} = \sigma_i^{\ell+1}(X_i^{\ell+1}) \tag{16.15}$$

So the output from the MFFN at layer M + 1 is given by

$$y_i^{M+1} = \sum_{j=0}^{n_M-1} T_{ji}^M\, Y_j^M \tag{16.16}$$
$$X_i^{M+1} = (y_i^{M+1} - O_i^{M+1})/g_i^{M+1} \tag{16.17}$$
$$Y_i^{M+1} = \sigma_i^{M+1}(X_i^{M+1}) \tag{16.18}$$

With this notation, for any $I \in \mathbb{R}^{n_0}$, we know how to calculate the output from the MFFN. It is clearly a process that flows forward from the input layer and it is very nonlinear in general even though it is built out of relatively simple layers of nonlinearities. The parameters that control the value of $F(I)$ are the $n_i n_{i+1}$ coefficients of $T^i$, $0 \le i \le M$; the offsets $O^i$, $0 \le i \le M+1$; and the gains $g^i$, $0 \le i \le M+1$. This gives the number of parameters,

$$N = \sum_{i=0}^{M} n_i n_{i+1} + 2\sum_{i=0}^{M+1} n_i. \tag{16.19}$$

Now let

$$I = \bigl\{ I^\alpha \in \mathbb{R}^{n_0} : 0 \le \alpha \le S-1 \bigr\}$$

and

$$D = \bigl\{ D^\alpha \in \mathbb{R}^{n_{M+1}} : 0 \le \alpha \le S-1 \bigr\}$$

be two given sets of data of size S > 0. The set I is referred to as the set of exemplars
and the set D is the set of outputs that are associated with exemplars. Together, the

sets I and D comprise what is known as the training set. The training problem is
to choose the N network parameters, T 0 , . . . , T M , O 0 , . . . , O M+1 , g 0 , . . . , g M+1 ,
such that we minimize,


$$\begin{aligned}
E &= 0.5\sum_{\alpha=0}^{S-1} \bigl\| F(I^\alpha) - D^\alpha \bigr\|_2^2,\\
  &= 0.5\sum_{\alpha=0}^{S-1}\sum_{i=0}^{n_{M+1}-1}\bigl(Y_{\alpha i}^{M+1} - D_{\alpha i}\bigr)^2,
\end{aligned} \tag{16.20}$$

where the subscript notation α indicates that the terms defined by (16.6) correspond
to the αth exemplar in the sets I and D.
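As a quick check of Eq. 16.19, the MatLab lines below count the parameters of a small MFFN; the layer sizes chosen are just an illustrative example.

LayerSizes = [2 3 1];                              % a 2-3-1 MFFN
W = sum(LayerSizes(1:end-1).*LayerSizes(2:end));   % edge weight count
N = W + 2*sum(LayerSizes)                          % plus one offset and one gain per node

For the 2-3-1 network this gives N = 2·3 + 3·1 + 2(2 + 3 + 1) = 21.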

16.2 Minimizing the MFFN Energy

We will use the notation established in Sect. 16.1. If all the partial derivatives of
the feed forward energy function (16.20) with respect to the network parameters
were known, we could minimize (16.20) using a standard gradient descent scheme.
This is known as the MFFN Training Problem. Relabeling the N MFFN variables
temporarily as $w_i$, $0 \le i \le N-1$, the optimization problem we face is

$$\min_{w \in \mathbb{R}^N} E\bigl(I, D, w\bigr) \tag{16.21}$$

where we now explicitly indicate that the value of E depends on I, D and the
parameters w. In component form, the equations for gradient descent are then

$$w_i^{new} = w_i^{old} - \lambda \left.\frac{\partial E}{\partial w_i}\right|_{w^{old}} \tag{16.22}$$

where λ is a scalar parameter that must be chosen efficiently in order for the gradient
descent method to work. The above technique is the heart of the back propagation
technique which is currently used in many cases to solve the MFFN training problem.
The MFFN back propagation equations will now be derived. For further information,
see the discussions in Rumelhart and McClelland (1986).
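Before deriving those equations, it may help to see the descent step of Eq. 16.22 in isolation. The MatLab sketch below applies it to a toy quadratic energy using a forward-difference estimate of the gradient; the energy, starting point and step size are illustrative stand-ins, since for the MFFN we will instead compute the exact partials derived in the next sections.

E = @(w) 0.5*sum((w - [1; -2]).^2);    % toy quadratic energy with minimum at [1; -2]
w = [0; 0]; lambda = 0.1; h = 1e-6;
for iter = 1:200
  g = zeros(size(w));
  for i = 1:length(w)                  % forward-difference gradient estimate
    wp = w; wp(i) = wp(i) + h;
    g(i) = (E(wp) - E(w))/h;
  end
  w = w - lambda*g;                    % the update of Eq. 16.22
end
w                                      % approaches [1; -2]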

16.3 Partial Calculations for the MFFN

We need an algorithm to find the gradient of the MFFN error function so that we can
apply gradient descent.

16.3.1 The Last Hidden Layer

First, we’ll look at the weight update equations for the weights between the last
hidden layer and the output layer. Using the chain rule, we have

$$\frac{\partial E}{\partial T_{pq}^M} = \sum_{\beta=0}^{S-1}\left(\frac{\partial E}{\partial X_{\beta q}^{M+1}}\right)\left(\frac{\partial X_{\beta q}^{M+1}}{\partial T_{pq}^M}\right) \tag{16.23}$$

Now let $e_{\alpha j}^{M+1} = Y_{\alpha j}^{M+1} - D_{\alpha j}$. Using our rules for computing the output of a node, we then obtain

$$\begin{aligned}
\frac{\partial E}{\partial X_{\beta q}^{M+1}} &= \sum_{\alpha=0}^{S-1}\sum_{j=0}^{n_{M+1}-1} e_{\alpha j}^{M+1}\left(\frac{\partial Y_{\alpha j}^{M+1}}{\partial X_{\beta q}^{M+1}}\right),\\
&= \sum_{\alpha=0}^{S-1}\sum_{j=0}^{n_{M+1}-1} e_{\alpha j}^{M+1}\bigl(\sigma_j^{M+1}\bigr)'\bigl(X_{\alpha j}^{M+1}\bigr)\left(\frac{\partial X_{\alpha j}^{M+1}}{\partial X_{\beta q}^{M+1}}\right),\\
&= \sum_{\alpha=0}^{S-1}\sum_{j=0}^{n_{M+1}-1} e_{\alpha j}^{M+1}\bigl(\sigma_j^{M+1}\bigr)'\bigl(X_{\alpha j}^{M+1}\bigr)\,\delta_{\beta\alpha}\,\delta_{qj},\\
&= e_{\beta q}^{M+1}\bigl(\sigma_q^{M+1}\bigr)'\bigl(X_{\beta q}^{M+1}\bigr).
\end{aligned} \tag{16.24}$$

 
where $\delta_{\beta\alpha}$ is the standard Kronecker delta function. Letting $\xi_{\beta q}^{M+1} = \frac{\partial E}{\partial X_{\beta q}^{M+1}}$, we have

$$\begin{aligned}
\xi_{\beta q}^{M+1} &= e_{\beta q}^{M+1}\bigl(\sigma_q^{M+1}\bigr)'\bigl(X_{\beta q}^{M+1}\bigr),\\
\frac{\partial E}{\partial T_{pq}^M} &= \sum_{\beta=0}^{S-1}\xi_{\beta q}^{M+1}\left(\frac{\partial X_{\beta q}^{M+1}}{\partial T_{pq}^M}\right),\\
X_{\beta q}^{M+1} &= \frac{\sum_{l=0}^{n_M-1} T_{lq}^M Y_{\beta l}^M - O_q^{M+1}}{g_q^{M+1}}.
\end{aligned} \tag{16.25}$$

Since the offset and gain terms are independent of the choice of the input set index β, the subscripts β are unnecessary on those terms and are not shown. Taking the indicated partial, we have

$$\frac{\partial X_{\beta q}^{M+1}}{\partial T_{pq}^M} = \frac{\sum_{l=0}^{n_M-1}\frac{\partial T_{lq}^M}{\partial T_{pq}^M}\, Y_{\beta l}^M}{g_q^{M+1}} = \frac{Y_{\beta p}^M}{g_q^{M+1}}. \tag{16.26}$$

Using (16.26), we can then rewrite (16.25) as

$$\xi_{\beta q}^{M+1} = e_{\beta q}^{M+1}\bigl(\sigma_q^{M+1}\bigr)'\bigl(X_{\beta q}^{M+1}\bigr), \qquad
\frac{\partial E}{\partial T_{pq}^M} = \frac{\sum_{\beta=0}^{S-1}\xi_{\beta q}^{M+1}\, Y_{\beta p}^M}{g_q^{M+1}}. \tag{16.27}$$

The gain and offset terms can also be updated. The appropriate equations follow below.

$$\begin{aligned}
\frac{\partial E}{\partial O_q^{M+1}} &= \sum_{\beta=0}^{S-1}\left(\frac{\partial E}{\partial X_{\beta q}^{M+1}}\right)\left(\frac{\partial X_{\beta q}^{M+1}}{\partial O_q^{M+1}}\right) = -\frac{\sum_{\beta=0}^{S-1}\xi_{\beta q}^{M+1}}{g_q^{M+1}}\\
\frac{\partial E}{\partial g_q^{M+1}} &= \sum_{\beta=0}^{S-1}\left(\frac{\partial E}{\partial X_{\beta q}^{M+1}}\right)\left(\frac{\partial X_{\beta q}^{M+1}}{\partial g_q^{M+1}}\right) = -\frac{\sum_{\beta=0}^{S-1}\xi_{\beta q}^{M+1}\bigl(y_{\beta q}^{M+1} - O_q^{M+1}\bigr)}{\bigl(g_q^{M+1}\bigr)^2}
\end{aligned} \tag{16.28}$$

The back propagation equations for the last hidden layer to the output layer are thus
given by Eqs. (16.27) and (16.28).

16.3.2 The Remaining Hidden Layers

We now derive the back propagation equations for the remaining layers. Consider

$$\frac{\partial E}{\partial T_{pq}^{M-1}} = \sum_{\beta=0}^{S-1}\left(\frac{\partial E}{\partial X_{\beta q}^M}\right)\left(\frac{\partial X_{\beta q}^M}{\partial T_{pq}^{M-1}}\right) \tag{16.29}$$

Defining $\xi_{\beta q}^M = \frac{\partial E}{\partial X_{\beta q}^M}$, we obtain

$$\begin{aligned}
\xi_{\beta q}^M &= \sum_{\alpha=0}^{S-1}\sum_{j=0}^{n_{M+1}-1} e_{\alpha j}^{M+1}\left(\frac{\partial Y_{\alpha j}^{M+1}}{\partial X_{\beta q}^M}\right)\\
&= \sum_{\alpha=0}^{S-1}\sum_{j=0}^{n_{M+1}-1} e_{\alpha j}^{M+1}\bigl(\sigma_j^{M+1}\bigr)'\bigl(X_{\alpha j}^{M+1}\bigr)\left(\frac{\partial X_{\alpha j}^{M+1}}{\partial X_{\beta q}^M}\right)
\end{aligned} \tag{16.30}$$
we then calculate

$$\begin{aligned}
\frac{\partial X_{\alpha j}^{M+1}}{\partial X_{\beta q}^M}
&= \frac{\sum_{l=0}^{n_M-1} T_{lj}^M\left(\frac{\partial Y_{\alpha l}^M}{\partial X_{\beta q}^M}\right)}{g_j^{M+1}},\\
&= \frac{\sum_{l=0}^{n_M-1} T_{lj}^M\bigl(\sigma_l^M\bigr)'\bigl(X_{\alpha l}^M\bigr)\left(\frac{\partial X_{\alpha l}^M}{\partial X_{\beta q}^M}\right)}{g_j^{M+1}},\\
&= \frac{\sum_{l=0}^{n_M-1} T_{lj}^M\bigl(\sigma_l^M\bigr)'\bigl(X_{\alpha l}^M\bigr)\,\delta_{\beta\alpha}\,\delta_{ql}}{g_j^{M+1}},\\
&= \frac{T_{qj}^M\bigl(\sigma_q^M\bigr)'\bigl(X_{\beta q}^M\bigr)\,\delta_{\alpha\beta}}{g_j^{M+1}},
\end{aligned} \tag{16.31}$$

Hence,

$$\begin{aligned}
\xi_{\beta q}^M &= \sum_{\alpha=0}^{S-1}\sum_{j=0}^{n_{M+1}-1} e_{\alpha j}^{M+1}\bigl(\sigma_j^{M+1}\bigr)'\bigl(X_{\alpha j}^{M+1}\bigr)\left(\frac{\partial X_{\alpha j}^{M+1}}{\partial X_{\beta q}^M}\right)\\
&= \sum_{\alpha=0}^{S-1}\sum_{j=0}^{n_{M+1}-1} e_{\alpha j}^{M+1}\bigl(\sigma_j^{M+1}\bigr)'\bigl(X_{\alpha j}^{M+1}\bigr)\,\frac{T_{qj}^M\bigl(\sigma_q^M\bigr)'\bigl(X_{\beta q}^M\bigr)\,\delta_{\alpha\beta}}{g_j^{M+1}}\\
&= \sum_{j=0}^{n_{M+1}-1} e_{\beta j}^{M+1}\bigl(\sigma_j^{M+1}\bigr)'\bigl(X_{\beta j}^{M+1}\bigr)\,\frac{T_{qj}^M\bigl(\sigma_q^M\bigr)'\bigl(X_{\beta q}^M\bigr)}{g_j^{M+1}}
\end{aligned} \tag{16.32}$$

From (16.28), it then follows that

$$\xi_{\beta q}^M = \left(\sum_{j=0}^{n_{M+1}-1}\frac{\xi_{\beta j}^{M+1}\, T_{qj}^M}{g_j^{M+1}}\right)\bigl(\sigma_q^M\bigr)'\bigl(X_{\beta q}^M\bigr) \tag{16.33}$$

Equation (16.33) defines a recursive method of computing the $\xi_{\beta q}^M$ in terms of the previous layer's $\xi_{\beta j}^{M+1}$. The calculation for the next layer of partial derivatives is similar:
 
$$\frac{\partial X_{\beta q}^M}{\partial T_{pq}^{M-1}} = \frac{\frac{\partial y_{\beta q}^M}{\partial T_{pq}^{M-1}}}{g_q^M} = \frac{\sum_{l=0}^{n_{M-1}-1}\frac{\partial T_{lq}^{M-1}}{\partial T_{pq}^{M-1}}\, Y_{\beta l}^{M-1}}{g_q^M} = \frac{Y_{\beta p}^{M-1}}{g_q^M}. \tag{16.34}$$

Substituting (16.34) into (16.29), we find

$$\begin{aligned}
\xi_{\beta q}^M &= \left(\sum_{j=0}^{n_{M+1}-1}\frac{\xi_{\beta j}^{M+1}\, T_{qj}^M}{g_j^{M+1}}\right)\bigl(\sigma_q^M\bigr)'\bigl(X_{\beta q}^M\bigr)\\
\frac{\partial E}{\partial T_{pq}^{M-1}} &= \frac{\sum_{\beta=0}^{S-1}\xi_{\beta q}^M\, Y_{\beta p}^{M-1}}{g_q^M}
\end{aligned} \tag{16.35}$$

The update equations for the gain and offset terms are given below.

$$\begin{aligned}
\frac{\partial E}{\partial O_q^M} &= \sum_{\beta=0}^{S-1}\left(\frac{\partial E}{\partial X_{\beta q}^M}\right)\left(\frac{\partial X_{\beta q}^M}{\partial O_q^M}\right) = -\frac{\sum_{\beta=0}^{S-1}\xi_{\beta q}^M}{g_q^M}\\
\frac{\partial E}{\partial g_q^M} &= \sum_{\beta=0}^{S-1}\left(\frac{\partial E}{\partial X_{\beta q}^M}\right)\left(\frac{\partial X_{\beta q}^M}{\partial g_q^M}\right) = -\frac{\sum_{\beta=0}^{S-1}\xi_{\beta q}^M\bigl(y_{\beta q}^M - O_q^M\bigr)}{\bigl(g_q^M\bigr)^2}
\end{aligned} \tag{16.36}$$

16.4 The Full Backpropagation Equations for the MFFN

The update equations for the next back propagation step are thus given by Eqs. (16.35)
and (16.36). Equations (16.27), (16.28), (16.35) and (16.36) give the back propaga-
tion equations for the last two layers. The back propagation equations between any
two interior layers, including the equations governing the back propagation between
the first hidden layer and the input layer, can now be derived using induction. We
obtain for all layer indices l, 0 ≤ l ≤ M:
   
$$\begin{aligned}
\xi_{\beta q}^{M+1} &= e_{\beta q}^{M+1}\bigl(\sigma_q^{M+1}\bigr)'\bigl(X_{\beta q}^{M+1}\bigr),\\
\xi_{\beta q}^{M-l} &= \left(\sum_{j=0}^{n_{M-l+1}-1}\frac{\xi_{\beta j}^{M-l+1}\, T_{qj}^{M-l}}{g_j^{M-l+1}}\right)\bigl(\sigma_q^{M-l}\bigr)'\bigl(X_{\beta q}^{M-l}\bigr),\\
\frac{\partial E}{\partial T_{pq}^{M-l}} &= \frac{\sum_{\beta=0}^{S-1}\xi_{\beta q}^{M-l+1}\, Y_{\beta p}^{M-l}}{g_q^{M-l+1}},\\
\frac{\partial E}{\partial O_q^{M-l+1}} &= -\frac{\sum_{\beta=0}^{S-1}\xi_{\beta q}^{M-l+1}}{g_q^{M-l+1}},\\
\frac{\partial E}{\partial g_q^{M-l+1}} &= -\frac{\sum_{\beta=0}^{S-1}\xi_{\beta q}^{M-l+1}\bigl(y_{\beta q}^{M-l+1} - O_q^{M-l+1}\bigr)}{\bigl(g_q^{M-l+1}\bigr)^2}.
\end{aligned} \tag{16.37}$$

Note that at l = M, we need to evaluate $X_{\beta q}^0$. This evaluation will depend on the choice of transfer function that is used for the input layer. It is actually common to use a linear transfer function for the input layer with a zero offset and unit gain. In other words, for a linear transfer function, $O_q^0 = 0.0$, $g_q^0 = 1.0$, $Y_{\beta q}^0 = y_{\beta q}^0$ and $\bigl(\sigma_q^0\bigr)'\bigl(X_{\beta q}^0\bigr) = 1.0$. For this case, l = M, we have

$$\begin{aligned}
\xi_{\beta q}^1 &= \left(\sum_{j=0}^{n_2-1}\frac{\xi_{\beta j}^2\, T_{qj}^1}{g_j^2}\right)\bigl(\sigma_q^1\bigr)'\bigl(X_{\beta q}^1\bigr),\\
\frac{\partial E}{\partial T_{pq}^0} &= \frac{\sum_{\beta=0}^{S-1}\xi_{\beta q}^1\, Y_{\beta p}^0}{g_q^1},\\
\frac{\partial E}{\partial O_q^1} &= -\frac{\sum_{\beta=0}^{S-1}\xi_{\beta q}^1}{g_q^1},\\
\frac{\partial E}{\partial g_q^1} &= -\frac{\sum_{\beta=0}^{S-1}\xi_{\beta q}^1\bigl(y_{\beta q}^1 - O_q^1\bigr)}{\bigl(g_q^1\bigr)^2}.
\end{aligned} \tag{16.38}$$

There are no connection coefficients before $T^0$ and so to derive the update equations for the gains and offsets for the input layer, we need to step back a bit. From our derivations, we know

$$\frac{\partial E}{\partial O_q^0} = \sum_{\beta=0}^{S-1}\xi_{\beta q}^0\,\frac{\partial X_{\beta q}^0}{\partial O_q^0} = -\frac{\sum_{\beta=0}^{S-1}\xi_{\beta q}^0}{g_q^0} \tag{16.39}$$

$$\frac{\partial E}{\partial g_q^0} = \sum_{\beta=0}^{S-1}\xi_{\beta q}^0\,\frac{\partial X_{\beta q}^0}{\partial g_q^0} = -\frac{\sum_{\beta=0}^{S-1}\xi_{\beta q}^0\bigl(y_{\beta q}^0 - O_q^0\bigr)}{\bigl(g_q^0\bigr)^2}. \tag{16.40}$$

Hence, if we want to have parameters in the input layer tuned, we can do so with Eqs. (16.39) and (16.40). Equations (16.37), (16.39) and (16.40) give the complete set of back propagation equations for this type of neural network architecture. If the parameters of the MFFN's input layer are not mutable, we only need to use Eq. (16.37), of course.
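Analytic partials such as those in Eq. (16.37) are easy to get wrong, so it is worth checking them numerically. The MatLab sketch below applies a central-difference check to the derivative of the standard node function σ used throughout this chapter; the same idea can be pointed at any single MFFN parameter by perturbing it and re-evaluating the energy. The test point values are arbitrary.

sigma     = @(y,o,g) 0.5*(1 + tanh((y - o)/g));
dsigma_dy = @(y,o,g) (1/(2*g))*sech((y - o)/g).^2;   % analytic partial with respect to y
y = 0.3; o = -0.2; g = 1.1; h = 1e-6;
numeric  = (sigma(y+h,o,g) - sigma(y-h,o,g))/(2*h);  % central difference estimate
analytic = dsigma_dy(y,o,g);
fprintf('numeric %.8f  analytic %.8f\n', numeric, analytic);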

16.5 A Three Layer Example

Let’s go through all of the forbidding equations above for a three layer network
which uses the same sigmoidal transfer function for all neurons in the hidden and
output layer and uses some other common transfer function for the input layer. We
assume all the transfer functions are of type σ. In the notation we used before, we
have M = 1 so that the output layer is M + 1 = 2 and the input layer is M − 1 = 0.
Let’s also assume we have n 0 neurons in the input layer, n 1 neurons in the hidden
layer and n 2 neurons in the output layer. Our equations become:

16.5.1 The Output Layer


$$\begin{aligned}
X_{\beta q}^2 &= \frac{y_{\beta q}^2 - O_q^2}{g_q^2}\\
e_{\beta q}^2 &= Y_{\beta q}^2 - D_{\beta q}\\
\xi_{\beta q}^2 &= e_{\beta q}^2\,\sigma'\bigl(X_{\beta q}^2\bigr),\\
\frac{\partial E}{\partial T_{pq}^1} &= \frac{\sum_{\beta=0}^{S-1}\xi_{\beta q}^2\, Y_{\beta p}^1}{g_q^2},\\
\frac{\partial E}{\partial O_q^2} &= -\frac{\sum_{\beta=0}^{S-1}\xi_{\beta q}^2}{g_q^2}\\
\frac{\partial E}{\partial g_q^2} &= -\frac{\sum_{\beta=0}^{S-1}\xi_{\beta q}^2\bigl(y_{\beta q}^2 - O_q^2\bigr)}{\bigl(g_q^2\bigr)^2}
\end{aligned} \tag{16.41}$$

16.5.2 The Hidden Layer


$$\begin{aligned}
X_{\beta q}^1 &= \frac{y_{\beta q}^1 - O_q^1}{g_q^1}\\
\xi_{\beta q}^1 &= \left(\sum_{j=0}^{n_2-1}\frac{\xi_{\beta j}^2\, T_{qj}^1}{g_j^2}\right)\bigl(\sigma_q^1\bigr)'\bigl(X_{\beta q}^1\bigr)\\
\frac{\partial E}{\partial T_{pq}^0} &= \frac{\sum_{\beta=0}^{S-1}\xi_{\beta q}^1\, Y_{\beta p}^0}{g_q^1}\\
\frac{\partial E}{\partial O_q^1} &= -\frac{\sum_{\beta=0}^{S-1}\xi_{\beta q}^1}{g_q^1}\\
\frac{\partial E}{\partial g_q^1} &= -\frac{\sum_{\beta=0}^{S-1}\xi_{\beta q}^1\bigl(y_{\beta q}^1 - O_q^1\bigr)}{\bigl(g_q^1\bigr)^2}
\end{aligned} \tag{16.42}$$

16.5.3 The Input Layer


$$\begin{aligned}
\xi_{\beta q}^0 &= \left(\sum_{j=0}^{n_1-1}\frac{\xi_{\beta j}^1\, T_{qj}^0}{g_j^1}\right)\bigl(\sigma_q^0\bigr)'\bigl(X_{\beta q}^0\bigr),\\
\frac{\partial E}{\partial O_q^0} &= -\frac{\sum_{\beta=0}^{S-1}\xi_{\beta q}^0}{g_q^0},\\
\frac{\partial E}{\partial g_q^0} &= -\frac{\sum_{\beta=0}^{S-1}\xi_{\beta q}^0\bigl(x_{\beta q} - O_q^0\bigr)}{\bigl(g_q^0\bigr)^2}.
\end{aligned} \tag{16.43}$$

where $x_{\beta q}$ is the input into the qth input neuron for sample β.

16.6 A MatLab Beginning

Let’s get started at coding a MatLab implementation of the MFFN. We will not be
clever yet and instead code up a quick and dirty example for a specific 2-3-1 MFFN.
Let’s begin with some initializations and the code for calculating the summed squared
error over the data set. We will assume each piece of data is sent through the 2-3-1
MFFN in a function myffneval which will be written later. Using the evaluation
function and the initializations, we can then calculate the summed squared error over
the data. Consider the following code for the 2-3-1 initialization part. The function
MyFFN assumes there is some data stored in Data which consists of triples of
the form (x, y, z). The built in function rand is very easy to use. The command
rand(2,3) returns random numbers in (0, 1) for all the entries of a 2 × 3 matrix
and so the line -1+2*rand(2,3) sets up a random matrix with entries in (−1, 1).
We use this approach to set up a random matrix for our T 0 and T 1 matrices which
will be called T1 and T2 in MatLab because the numbering in MatLab starts at 1
instead of the 0 we use in our derivations. We also set up random vectors of the
offsets and gains which we call Off and Gain.
We also set up our nodal processing functions. We need 6 of them here, so we set them up in a cell data structure nodefunc{i} as follows:

Listing 16.1: Initializing the nodefunctions for the 2-3-1 MFFN


sigmainput = @(x,o,g) (x-o)/g;
sigma = @(x,o,g) 0.5*(1 + tanh((x-o)/g));
% assign the node functions
nodefunc{1} = sigmainput;
nodefunc{2} = sigmainput;
nodefunc{3} = sigma;
nodefunc{4} = sigma;
nodefunc{5} = sigma;
nodefunc{6} = sigma;

This makes it easy for us to refer to nodal processing functions by their node number.
Next, we set up the loop over the data triples which enables us to calculate the summed
squared error. We store the outputs of the nodes as Y, the raw errors as e = Y - Z
and the squared errors as e^2. We then use this information to sum the squared
errors over the data set and compute the error function sumE.

Listing 16.2: The error loop for the 2-3-1 MFFN


% setup the MFFN evaluation loop for this data
Y = [];
e = [];
E = [];
for i = 1:N
  Y(i) = myffneval(X1(i),X2(i),Z(i),T1,T2,Off,Gain,nodefunc);
  e(i) = Y(i) - Z(i);
  E(i) = e(i)^2;
end
sumE = 0.5*sum(E);
end

All of this is put together in the function MyFFN below.


Listing 16.3: Initializing and finding the error for a 2-3-1 MFFN

f u n c t i o n [ sumE , e , E , Y, Z ] = MyFFN( Data )


%
% Sample FFN c o d e f o r a s p e c i f i c 2−3−1 MFFN
%
5 % Data i s t r i p l e s ( X1 , X2 , Z )
%
N = l e n g t h ( Data ) ;
X1 = Data ( : , 1 ) ;
X2 = Data ( : , 2 ) ;
10 Z = Data ( : , 3 ) ;
%
% S e t up w e i g h t m a t r i c e s
% T1 and T2 a r e r a n d o m l y i n i t i a l i z e d i n ( − 1 , 1 )
T1 = −1 + 2∗ rand ( 2 , 3 ) ;
15 T2 = −1 + 2∗ rand ( 3 , 1 ) ;
%
% S e t u p o f f s e t s and g a i n s
% O f f i s 6 x1 and Gain i s 6 x1 . I n i t i a l i z e r a n d o m l y
O f f = rand ( 6 , 1 ) ;
20 Gain = rand ( 6 , 1 ) ;
% Setup the p r o c e s s i n g f u n c t i o n s
s i g m a i n p u t = @( x , o , g ) ( x−o ) / g ;
s i g m a = @( x , o , g ) 0 . 5 ∗ ( 1 + tanh ( ( x−o ) / g ) ) ;
% a s s i g n t h e node f u n c t i o n s
25 n o d e f u n c {1} = s i g m a i n p u t ;
n o d e f u n c {2} = s i g m a i n p u t ;
n o d e f u n c {3} = s i gma ;
n o d e f u n c {4} = s i gma ;
n o d e f u n c {5} = s i gma ;
30 n o d e f u n c {6} = s i gma ;
%
% s e t u p t h e MFFN e v a l u a t i o n l o o p f o r t h i s d a t a
Y = [];
e = [];
35 E = [];
f o r i = 1 :N
Y( i ) = m y f f n e v a l ( X1 ( i ) , X2 ( i ) , Z ( i ) , T1 , T2 , O f f , Gain , n o d e f u n c ) ;
e ( i ) = Y( i ) − Z ( i ) ;
E( i ) = e ( i ) ˆ 2 ;
40 end
sumE = 0 . 5 ∗ sum (E) ;
end

To finish we need to write the 2-3-1 evaluation code. This is given in myffneval.
This code follows our algorithm closely and you should look at it carefully to make
sure you see that.

Listing 16.4: The 2-3-1 MFFN evaluation


f u n c t i o n [ Y6 ] = m y f f n e v a l ( x1 , x2 , z , T1 , T2 , O f f , Gain , n o d e f u n c )
%
% x1 , x2 , z i s t h e d a t a t r i p l e from Data
% T1 i s t h e 2 x3 w e i g h t m a t r i x b e t w e e n l a y e r 1 ( i n p u t l a y e r )
5 % and l a y e r 2 ( m i d d l e l a y e r )
% T2 i s t h e 3 x1 w e i g h t m a t r i x b e t w e e n l a y e r 2 ( m i d d l e l a y e r )
% and l a y e r 3 ( o u t p u t l a y e r )
% Off i s t h e o f f s e t v e c t o r
% Gain i n t h e g a i n v e c t o r
10 % nodefunc i s a c e l l of f u n c t i o n handles used f o r
% t h e node p r o c e s s i n g
%
% input layer processing
y1 = x1 ; Y1 = n o d e f u n c {1}( y1 , O f f ( 1 ) , Gain ( 1 ) ) ;
15 y2 = x2 ; Y2 = n o d e f u n c {2}( y2 , O f f ( 2 ) , Gain ( 2 ) ) ;
% middle l a y e r processing
y3 = T1 ( 1 , 1 ) ∗Y1 + T1 ( 2 , 1 ) ∗Y2 ; Y3 = n o d e f u n c {3}( y3 , O f f ( 3 ) , Gain ( 3 ) ) ;
y4 = T1 ( 1 , 2 ) ∗Y1 + T1 ( 2 , 2 ) ∗Y2 ; Y4 = n o d e f u n c {4}( y4 , O f f ( 4 ) , Gain ( 4 ) ) ;
y5 = T1 ( 1 , 3 ) ∗Y1 + T1 ( 2 , 3 ) ∗Y2 ; Y5 = n o d e f u n c {5}( y5 , O f f ( 5 ) , Gain ( 5 ) ) ;
20 % output layer processing
y6 = T2 ( 1 , 1 ) ∗Y3+T2 ( 2 , 1 ) ∗Y4+T2 ( 3 , 1 ) ∗Y5 ; Y6 = n o d e f u n c {6}( y6 , O f f ( 6 ) ,
Gain ( 6 ) ) ;

end

We can test this on some simple data. There are two inputs x1 and x2 and if 2x1 /x2 > 1,
the target value is 1 and otherwise, it is zero. We set up some data and run our codes
as follows:

Listing 16.5: Testing Our 2-3-1 Code


X1 = [ 0 . 3 ; 0 . 7 ; 1 . 4 ; 2 . 7 ] ;
2 X2 = [ 0 . 1 ; 1 . 3 ; 0 . 6 ; 7 . 2 ] ;
C = [ X1 , X2 , ( 2 ∗ X1 . / X2 ) ]
C =
0.3000 0.1000 6.0000
0.7000 1.3000 1.0769
7 1.4000 0.6000 4.6667
2.7000 7.2000 0.7500
% I t i s e a s y t o s e t t h e t a r g e t s now : t h e f i r s t t h r e e are 1
% and t h e l a s t one i s 0 .
Target = [ 1 ; 1 ; 1 ; 0 ] ;
12 Data = [ X1 , X2 , T a r g e t ]
Data =
0.3000 0.1000 1.0000
0.7000 1.3000 1.0000
1.4000 0.6000 1.0000
17 2.7000 7.2000 0
% now f i n d t h e summed s q u a r e d e r r o r
[ sumE , e , E , Y, Z ] = MyFFN( Data ) ;
% The Y( 6 ) node v a l u e s a r e r e t u r n e d f o r e a c h p i e c e
% o f d a t a and s t o r e d i n Y
22 Y
Y =
0.1128 0.0539 0.0449 0.0291
% The t a r g e t s a r e
Z
27 Z =
1
1
1
0

32 The s q u a r e d e r r o r s a r e
E
E =
0.7872 0.8951 0.9123 0.0008
% and t h e e r r o r i s
37 sumE
sumE =
1.2977

16.7 MatLab Implementations

Let’s write our first attempt at implementing a standard MFFN in MatLab. We are
going to be quite general in this implementation; the previous 2-3-1 example was just
a warmup. From studying our code there, we can see more clearly the places where we should write more generic code. But make no mistake; there are always
tradeoffs to any software implementation decision we make.
We will break our implementation into fairly standard pieces: an initialization
function, an evaluation function, an update function and finally a training function.

16.7.1 Initialization

Listing 16.6: Initialization code mffninit.m


1 f u n c t i o n [ G, O, T ] = m f f n i n i t (GL,GH, OL ,OH, TL , TH, L a y e r S i z e s )
%
% T h i s i s a P l a y e r MFFN w h e r e P i s t h e number
% of layers
% number o f n o d e s i n t h e i n p u t l a y e r 0 = L a y e r S i z e s ( 1 )
6 % number o f n o d e s i n t h e m i d d l e l a y e r 1 = L a y e r S i z e s ( 2 )
% ...
% number o f n o d e s i n t h e o u t p u t l a y e r = L a y e r S i z e s (P)
%
% GL i s t h e l o w e r bound f o r t h e g a i n s
11 % GH i s t h e u p p e r bound f o r t h e g a i n s
% OL i s t h e l o w e r bound f o r t h e o f f s e t s
% OH i s t h e u p p e r bound f o r t h e o f f s e t s
% TL i s t h e l o w e r bound f o r t h e e d g e w e i g h t s
% TH i s t h e u p p e r bound f o r t h e e d g e w e i g h t s
16 T = {};
O = {};
G = {};

[ P ,m] = s i z e ( L a y e r S i z e s ) ;
21 % i n i t i a l i z e parameters
f o r p=1:P−1
rows = L a y e r S i z e s ( p ) ;
c o l s = L a y e r S i z e s ( p+1) ;
T{p} = TL+(TH−TL) ∗ rand ( rows , c o l s ) ;
26 O{p} = OL+(OH−OL) ∗ rand ( 1 , rows ) ;
G{p} = GL+(GH−GL) ∗ rand ( 1 , rows ) ;
end
rows = L a y e r S i z e s (P) ;
O{P} = OL+(OH−OL) ∗ rand ( 1 , rows ) ;
31 G{P} = GL+(GH−GL) ∗ rand ( 1 , rows ) ;

end

16.7.2 Evaluation

Listing 16.7: Evaluation: mffneval2.m


f u n c t i o n [ y , Y, RE, E ] = m f f n e v a l 2 ( Input , Target , n o d e f u n c t i o n , Gain , O f f s e t
, T, La y e r S i z e s )
2 %
% This i s a P l a y e r MFFN w h e r e P i s t h e number
% of layers
% number o f nodes in the input layer 0 = LayerSizes (1)
% number o f nodes in the middle l a y e r 1 = LayerSizes (2)
7 % ...
% number o f nodes in the output l a y e r = L a y e r S i z e s (P)
%
% sigma i s t h e node t r a n s f e r f u n c t i o n
% Input i s i n p u t m a t r i x o f S rows o f i n p u t d a t a
12 % S x LayerSizes (1)
% t a r g e t i s d e s i r e d t a r g e t v a l u e m a t r i x o f S rows o f t a r g e t d a t a
% S x L a y e r S i z e s (P)
% n o d e f u n c t i o n i s t h e node t r a n s f e r f u n c t i o n c e l l d a t a
% n o d e f u n c t i o n p r i m e i s t h e node t r a n s f e r f u n c t i o n d e r i v a t i v e c e l l
data
17 sig ma = n o d e f u n c t i o n { 1 } ;
sigmainput = nodefunction {2};
sigmaoutput = nodefunction {3};

% E x t r a c t number o f l a y e r s
22 % so
% output layer = P
% input layer = 1
[ P ,m] = s i z e ( L a y e r S i z e s ) ;

27 % Assume I n p u t S i z e = T a r g e t S i z e ! !
%
% E x t r a c t t h e number o f t r a i n i n g s a m p l e s S
[ S , I n p u t S i z e ] = s i z e ( Input ) ;

32 % s e t up s t o r a g e f o r a l l MFFN v a l u e s
y = {};
Y = {};
f o r p = 1 :P
cols = LayerSizes (p) ;
37 y{p} = z e r o s ( S , c o l s ) ;
Y{p} = z e r o s ( S , c o l s ) ;
end

% s e t up s t o r a g e f o r b o t h raw e r r o r RE and l o c a l squared e r r o r LE


42 c o l s = L a y e r S i z e s (P) ;
LE = z e r o s ( S , c o l s ) ;
RE = z e r o s ( S , c o l s ) ;
Output = z e r o s ( S , c o l s ) ;

47 energy = 0 ;
f o r s =1:S
% l o o p on t r a i n i n g e x a m p l a r s
f o r p = 1 :P
i f p ==1
52 y{p } ( s , : ) = I n p u t ( s , : ) ;
f o r i =1: L a y e r S i z e s ( 1 )
i n p u t = y{p } ( s , i ) ;
o f f s e t = O f f s e t {p } ( i ) ;
g a i n = Gain{p } ( i ) ;

57 Y{p } ( s , i ) = s i g m a i n p u t ( i n p u t , o f f s e t , g a i n ) ;
end
else
f o r i =1: L a y e r S i z e s ( p )
y{p } ( s , i ) = d o t (T{p − 1 } ( : , i ) ,Y{p −1}( s , : ) ) ;
62 i n p u t = y{p } ( s , i ) ;
o f f s e t = O f f s e t {p } ( i ) ;
g a i n = Gain{p } ( i ) ;
i f p ==P
Y{p } ( s , i ) = s i g m a o u t p u t ( i n p u t , o f f s e t , g a i n ) ;
67 else
Y{p } ( s , i ) = s ig ma ( i n p u t , o f f s e t , g a i n ) ;
end
end
end
72 end % p l o o p
Output ( s , : ) = Y{P} ( s , : ) ;
RE( s , : ) = Output ( s , : ) − T a r g e t ( s , : ) ;
LE ( s , : ) = RE( s , : ) . ˆ 2 ;
e n e r g y = e n e r g y + sum ( LE ( s , : ) ) ;
77 end % s l o o p

E = 0.5∗ energy ;

end

16.7.3 Updating

Now let’s consider the update process. We will follow the update algorithms we have
presented. Our node evaluation functions have the form of the σ listed below for
various values of L and H where we assume L < H , of course.
  
$$\sigma(y, o, g) = \frac{1}{2}\left((H + L) + (H - L)\tanh\left(\frac{y - o}{g}\right)\right)$$

The three partials are

$$\begin{aligned}
\frac{\partial \sigma}{\partial y} &= \frac{H-L}{2g}\,\mathrm{sech}^2\!\left(\frac{y-o}{g}\right)\\
\frac{\partial \sigma}{\partial o} &= -\frac{H-L}{2g}\,\mathrm{sech}^2\!\left(\frac{y-o}{g}\right) = -\frac{\partial \sigma}{\partial y}\\
\frac{\partial \sigma}{\partial g} &= -\frac{(H-L)(y-o)}{2g^2}\,\mathrm{sech}^2\!\left(\frac{y-o}{g}\right) = -\frac{y-o}{g}\,\frac{\partial \sigma}{\partial y}.
\end{aligned}$$

In the code for our updates, we will assume we have coded all of our σ′ functions as if the argument to the node function was simply u instead of the ratio (y − o)/g. Hence, we need the following derivative for all our node functions.

$$\sigma'(u) = \frac{H-L}{2}\,\bigl(\mathrm{sech}(u)\bigr)^2.$$
In code we would have

Listing 16.8: Node function initialization


OL = − 1 . 4 ;
OH = 1 . 4 ;
GL = 0 . 8 ;
4 GH = 1 . 2 ;
TL = − 2 . 1 ;
TH = 2 . 1 ;
nodefunction = {};
nodefunctionprime = {};
9 sigma = @( y , o f f s e t , g a i n ) 0 . 5 ∗ ( 1 + tanh ( ( y − o f f s e t ) / g a i n ) ) ;
n o d e f u n c t i o n {1} = si g m a ;
s i g m a i n p u t = @( y , o f f s e t , g a i n ) ( y − o f f s e t ) / g a i n ;
n o d e f u n c t i o n {2} = s i g m a i n p u t ;
SL = − . 5 ;
14 SH = 1 . 5 ;
% o u p u t n o d e s r a n g e from SL t o SH
s i g m a o u t p u t = @( y , o f f s e t , g a i n ) 0 . 5 ∗ ( (SH+SL ) + (SH−SL ) ∗ tanh ( ( y −
o f f s e t ) / gain ) ) ;
n o d e f u n c t i o n {3} = s i g m a o u t p u t ;
% t h e s e d e r i v a t i v e s are r e a l l y p a r t i a l sigma / p a r t i a l y
19 % s o a l l h a v e a 1/ g a i n b u i l t i n
s i g m a p r i m e = @( y , o f f s e t , g a i n ) ( 1 / ( 2 ) ) ∗ s e c h ( ( y − o f f s e t ) / g a i n ) . ˆ 2 ;
n o d e f u n c t i o n p r i m e {1} = s i g m a p r i m e ;
s i g m a i n p u t p r i m e = @( y , o f f s e t , g a i n ) 1 ;
n o d e f u n c t i o n p r i m e {2} = s i g m a i n p u t p r i m e ;
24 s i g m a o u t p u t p r i m e = @( y , o f f s e t , g a i n ) ( (SH−SL ) / ( 2 ) ) ∗ s e c h ( ( y − o f f s e t
) / gain ) . ˆ 2 ;
n o d e f u n c t i o n p r i m e {3} = s i g m a o u t p u t p r i m e ;

Note nodefunction{1} is the main node processing function,


nodefunction{2} is the input node function and nodefunction{3} is the
output node function.
You see this used in the update code as follows

Listing 16.9: Node function assignment in the update code

% main sigma
sigma = nodefunction{1};
% input sigma
sigmainput = nodefunction{2};
% output sigma
sigmaoutput = nodefunction{3};
% main sigma prime
sigmaprime = nodefunctionprime{1};
% input sigma prime
sigmainputprime = nodefunctionprime{2};
% output sigma prime
sigmaoutputprime = nodefunctionprime{3};

Listing 16.10: Updating: mffnupdate.m


f u n c t i o n [ Gain , O f f s e t , T , l e n g t h ] = m f f n u p d a t e ( Input , Target ,
n o d e f u n c t i o n , n o d e f u n c t i o n p r i m e , Gain , O f f s e t , T , L a y e r S i z e s , y , Y, lambda
)
%
3 % We t r a i n a P l a y e r MFFN t o math i n p u t t o t a r g e t s w h e r e P i s t h e
number
% of layers
% number o f n o d e s i n t h e i n p u t l a y e r 0 = L a y e r S i z e s ( 1 )
% number o f n o d e s i n t h e m i d d l e l a y e r 1 = L a y e r S i z e s ( 2 )
% ...
8 % number o f n o d e s i n t h e o u t p u t l a y e r = L a y e r S i z e s (P)
%
% I n p u t i s i n p u t m a t r i x o f S rows o f i n p u t d a t a
% S x LayerSizes (1)
% t a r g e t i s d e s i r e d t a r g e t v a l u e m a t r i x o f S rows o f t a r g e t d a t a
13 % S x L a y e r S i z e s (P)
%
% Gain i s t h e MFFN g a i n p a r a m e t e r s
% O f f s e t i s t h e MFFN o f f s e t p a r a m e t e r s
% T i s t h e MFFN e d g e c o e f f i c i e n t s
18 % y i s t h e node i n p u t d a t a
% Y i s t h e node o u t p u t d a t a
%
% n o d e f u n c t i o n i s t h e node t r a n s f e r f u n c t i o n c e l l d a t a
% n o d e f u n c t i o n p r i m e i s t h e node t r a n s f e r f u n c t i o n d e r i v a t i v e c e l l
data
23 sig ma = n o d e f u n c t i o n { 1 } ;
sigmainput = nodefunction {2};
sigmaoutput = nodefunction {3};
sigmaprime = nodefunctionprime {1};
sigmainputprime = nodefunctionprime {2};
28 sigmaoutputprime = nodefunctionprime {3};
%
% t u r n g a i n u p d a t e s on o r o f f : DOGAIN = 0
% t u r n s them o f f
DOGAIN = 0 ;

33
% E x t r a c t number o f l a y e r s
% so
% output layer = P
% input layer = 1
38 [ P ,m] = s i z e ( L a y e r S i z e s ) ;

%
% E x t r a c t t h e number o f t r a i n i n g samples S
[ S , I n p u t S i z e ] = s i z e ( Input ) ;
43
% setup storage for generalized errors
xi = {};
f o r p=P: − 1 : 1
cols = LayerSizes (p) ;
48 x i {p} = z e r o s ( S , c o l s ) ;
end

% setup storage for p a r t i a l d e r i v a t i v e s


DT = { } ;
53 DO = { } ;
DG = { } ;
f o r p=1:P−1
rows = L a y e r S i z e s ( p ) ;
c o l s = L a y e r S i z e s ( p+1) ;
58 DT{p} = z e r o s ( rows , c o l s ) ;
DO{p} = z e r o s ( 1 , rows ) ;
DG{p} = z e r o s ( 1 , rows ) ;
end
rows = L a y e r S i z e s (P) ;
63 DO{P} = z e r o s ( 1 , rows ) ;
DG{P} = z e r o s ( 1 , rows ) ;

% find output layer generalized error


f o r s =1:S
68 f o r i = 1 : L a y e r S i z e s (P)
e r r o r = Y{P} ( s , i ) − T a r g e t ( s , i ) ;
x i {P} ( s , i ) = e r r o r ∗ s i g m a o u t p u t p r i m e ( y{P} ( s , i ) , O f f s e t {P} ( i ) , Gain{
P} ( i ) ) ;
end
end
73
f o r p=P− 1 : − 1 : 1
for s = 1:S
% f i n d x i {p }( s , : )
rows = L a y e r S i z e s ( p ) ;
78 c o l s = L a y e r S i z e s ( p+1) ;
f o r q = 1 : rows
f o r j =1: c o l s
i f p == 1
f = s i g m a i n p u t p r i m e ( y{p } ( s , q ) , O f f s e t {p } ( q ) , Gain{p } ( q ) ) ;
83 else
f = s i g m a p r i m e ( y{p } ( s , q ) , O f f s e t {p } ( q ) , Gain{p } ( q ) ) ;
end
x i {p } ( s , q ) = x i {p } ( s , q ) + x i {p+1}( s , j ) ∗T{p } ( q , j ) ∗ f / Gain{
p+1}( j ) ;
end % i n n e r j node l o o p

88 end % o u t e r q node l o o p
% f i n d DT{p } , DO{p +1} , DG{p+1}
f o r u = 1 : rows
for v = 1: cols
DT{p } ( u , v ) = DT{p } ( u , v ) + x i {p+1}( s , v ) ∗Y{p } ( s , u ) / Gain{p
+1}(v ) ;
93 end%v
end%u
f o r v =1: c o l s
DO{p+1}(v ) = DO{p+1}(v ) − x i {p+1}( s , v ) / Gain{p+1}(v ) ;
%DG{p +1}( v ) = DG{p +1}( v ) − x i {p +1}( s , v ) ∗( y {p +1}( s , v ) −
O f f s e t {p +1}( v ) ) / ( Gain{p +1}( v ) ˆ 2 ) ;
98 end% v l o o p
end % s l o o p
end % l a y e r l o o p
% now do i n p u t l a y e r
for s = 1:S
103 % f i n d DO{1} , DG{1}
cols = LayerSizes (1) ;
for v = 1: cols
DO{1}( v ) = DO{1}( v ) − x i {1}( s , v ) / Gain {1}( v ) ;
%DG{1}( v ) = DG{1}( v ) − x i {1}( s , v ) ∗( y {1}( s , v ) − O f f s e t {1}( v ) )
/ ( Gain {1}( v ) ˆ 2 ) ;
108 end % v l o o p
end % s l o o p
% f i n d t h e norm o f t h e g r a d i e n t
squarelength = 0 . 0 ;
f o r p = 1 : P−1
113 s q u a r e l e n g t h = s q u a r e l e n g t h + ( norm (DT{p } , ) ) ˆ2;
i f DOGAIN == 1
s q u a r e l e n g t h = s q u a r e l e n g t h + ( norm (DG{p } , ) ) ˆ2;
end
s q u a r e l e n g t h = s q u a r e l e n g t h + ( norm (DO{p } , ) ) ˆ2;
118 end
i f DOGAIN == 1
s q u a r e l e n g t h = s q u a r e l e n g t h + ( norm (DG{P} , ) ) ˆ2 + ( norm (DO{P} ,
) ) ˆ2;
else
s q u a r e l e n g t h = s q u a r e l e n g t h + ( norm (DO{P} , ) ) ˆ2;
123 end
length = sqrt ( squarelength ) ;
Normgrad = l e n g t h ;
NormGradMFFN = Normgrad
lengthtol = 0.05;
128 i f length < lengthtol
s c a l e = lambda ;
else
s c a l e = lambda / l e n g t h ;
end
133 fscale = scale ;
% now do t r a i n i n g
f o r p = 1 : P−1
rows = L a y e r S i z e s ( p ) ;
c o l s = L a y e r S i z e s ( p+1) ;

138 f o r u = 1 : rows
for v = 1: cols
T{p } ( u , v ) = T{p } ( u , v ) − s c a l e ∗DT{p } ( u , v ) ;
end% v l o o p
end% u l o o p
143 f o r u = 1 : rows
O f f s e t {p } ( u ) = O f f s e t {p } ( u ) − s c a l e ∗DO{p } ( u ) ;
%Gain{p }( u ) = Gain{p }( u ) − s c a l e ∗DG{p }( u ) ;
end%u l o o p
end% p l o o p
148 rows = L a y e r S i z e s (P) ;
f o r u = 1 : rows
O f f s e t {P} ( u ) = O f f s e t {P} ( u ) − s c a l e ∗DO{P} ( u ) ;
%Gain{P}( u ) = Gain{P}( u ) − s c a l e ∗DG{P}( u ) ;
end
153
end

16.7.4 Training

Listing 16.11: Training: mffntrain.m


1 f u n c t i o n [ G, O, T , Energy ] = m f f n t r a i n ( Input , Target , n o d e f u n c t i o n ,
n o d e f u n c t i o n p r i m e , G, O, T , L a y e r S i z e s , y , Y, lambda , NumIters )
%
% We t r a i n a P l a y e r MFFN t o math i n p u t t o t a r g e t s w h e r e P i s t h e
number
% of layers
% number o f n o d e s i n t h e i n p u t l a y e r 0 = L a y e r S i z e s ( 1 )
6 % number o f n o d e s i n t h e m i d d l e l a y e r 1 = L a y e r S i z e s ( 2 )
% ...
% number o f n o d e s i n t h e o u t p u t l a y e r = L a y e r S i z e s (P)
%
% I n p u t i s i n p u t m a t r i x o f S rows o f i n p u t d a t a
11 % S x LayerSizes (1)
% t a r g e t i s d e s i r e d t a r g e t v a l u e m a t r i x o f S rows o f t a r g e t d a t a
% S x L a y e r S i z e s (P)
%
% Gain i s t h e MFFN g a i n p a r a m e t e r s
16 % O f f s e t i s t h e MFFN o f f s e t p a r a m e t e r s
% T i s t h e MFFN e d g e c o e f f i c i e n t s
% y i s t h e node i n p u t d a t a
% Y i s t h e node o u t p u t d a t a
%
21 Energy = [ ] ;
[ y , Y, RE, E ] = m f f n e v a l 2 ( Input , Target , n o d e f u n c t i o n , G, O, T , L a y e r S i z e s ) ;
Energy = [ Energy E ] ;
f o r t = 1 : NumIters
[ G, O, T ] = m f f n u p d a t e ( Input , Target , n o d e f u n c t i o n , n o d e f u n c t i o n p r i m e , G,
O, T , L a y e r S i z e s , y , Y, lambda ) ;
26 [ y , Y, RE, E ] = m f f n e v a l 2 ( Input , Target , n o d e f u n c t i o n , G, O, T , L a y e r S i z e s ) ;
Energy = [ Energy E ] ;
end
p l o t ( Energy ) ;
end

16.8 Sample Training Sessions

16.8.1 Approximating a Step Function

Now we can test the code. We use 31 exemplars to train a 1-5-1 MFFN with the gain
updates turned off to approximate a simple step function. So we have 10 + 7 = 17
tunable parameters and 31 constraints. We have to set up inputs and outputs. Here we want every input less than 1 to map to the target 1 and every input greater than or equal to 1 to map to the target 0. The traditional if conditional in MatLab is not suited for vectors such as we create below for X. Hence we use the function iff below, applied elementwise via arrayfun, to do the testing. Inside iff, the built-in MatLab function nargchk simply checks that exactly three arguments have been passed. When the test x < 1 is true, the return value is 1 and otherwise it is 0. This enables us to use conditional testing on a vector.

Listing 16.12: A vector conditional test function


function result = iff(condition, trueResult, falseResult)
  error(nargchk(3, 3, nargin));
  if condition
    result = trueResult;
  else
    result = falseResult;
  end
end

Listing 16.13: Testing the Code


X = linspace(0,3,31);
Input = X';
U = arrayfun(@(x) iff(x<1,1,0),X);
Target = U';
LayerSizes = [1;5;1];
OL = -1.4;
OH = 1.4;
GL = 0.8;
GH = 1.2;
TL = -2.1;
TH = 2.1;
[G,O,T] = mffninit(GL,GH,OL,OH,TL,TH,LayerSizes);
nodefunction = {};
nodefunctionprime = {};
sigma = @(y,offset,gain) 0.5*(1 + tanh((y-offset)/gain));
nodefunction{1} = sigma;
sigmainput = @(y,offset,gain) (y-offset)/gain;
nodefunction{2} = sigmainput;
SL = -.5;
SH = 1.5;
% output nodes range from SL to SH
sigmaoutput = @(y,offset,gain) 0.5*((SH+SL) + (SH-SL)*tanh((y-offset)/gain));
nodefunction{3} = sigmaoutput;
% these derivatives are really partial sigma / partial y
% so all have a 1/gain built in
sigmaprime = @(y,offset,gain) (1/(2))*sech((y-offset)/gain).^2;
nodefunctionprime{1} = sigmaprime;
sigmainputprime = @(y,offset,gain) 1;
nodefunctionprime{2} = sigmainputprime;
sigmaoutputprime = @(y,offset,gain) ((SH-SL)/(2))*sech((y-offset)/gain).^2;
nodefunctionprime{3} = sigmaoutputprime;
%
[y,Y,RE,E] = mffneval2(Input,Target,nodefunction,G,O,T,LayerSizes);
NumIters = 10;
lambda = .005;
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,...
   G,O,T,LayerSizes,y,Y,lambda,NumIters);
NumIters = 10;
lambda = .005;
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,...
   G,O,T,LayerSizes,y,Y,lambda,NumIters);
...
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,...
   G,O,T,LayerSizes,y,Y,lambda,200);
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,...
   G,O,T,LayerSizes,y,Y,lambda,1000);
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,...
   G,O,T,LayerSizes,y,Y,0.006,1000);

After all this training, we still haven't quite got it. What we do have is the approxi-
mation to this step function shown in Fig. 16.1. To generate this graph, we evaluate
the FFN at new points and plot the result along with the graph of the true function.

Fig. 16.1 Approximating a step function using a 1-5-1 MFFN



Fig. 16.2 Approximating sin² using a 1-5-1 MFFN

Listing 16.14: Generating the plot that test our approximation


X = linspace(0,3,21);
Input = X';
U = arrayfun(@(x) iff(x<1,1,0),X);
Target = U';
[y,Y,RE,E] = mffneval2(Input,Target,nodefunction,G,O,T,LayerSizes);
Z = Y{3};
plot(X,U,X,Z);

16.8.2 Approximating sin²

We are going to approximate sin²(x) on the interval [0, π]. We use 21 exemplars
with a 1-5-1 MFFN, which gives 17 parameters that can be adjusted and 21 constraints.

Listing 16.15: Approximating sin²


X = linspace(0,3,21);
Input = X';
U = sin(X).^2;
Target = U';
LayerSizes = [1;5;1];
OL = -1.4;
OH = 1.4;
GL = 0.8;
GH = 1.2;
TL = -2.1;
TH = 2.1;
[G,O,T] = mffninit(GL,GH,OL,OH,TL,TH,LayerSizes);
nodefunction = {};
nodefunctionprime = {};
sigma = @(y,offset,gain) 0.5*(1 + tanh((y-offset)/gain));
nodefunction{1} = sigma;
sigmainput = @(y,offset,gain) (y-offset)/gain;
nodefunction{2} = sigmainput;
SL = -.5;
SH = 1.5;
% output nodes range from SL to SH
sigmaoutput = @(y,offset,gain) 0.5*((SH+SL) + (SH-SL)*tanh((y-offset)/gain));
nodefunction{3} = sigmaoutput;

sigmaprime = @(y,offset,gain) (1/(2))*sech((y-offset)/gain).^2;
nodefunctionprime{1} = sigmaprime;
sigmainputprime = @(y,offset,gain) 1;
nodefunctionprime{2} = sigmainputprime;
sigmaoutputprime = @(y,offset,gain) ((SH-SL)/(2))*sech((y-offset)/gain).^2;
nodefunctionprime{3} = sigmaoutputprime;
%
[y,Y,RE,E] = mffneval2(Input,Target,nodefunction,G,O,T,LayerSizes);
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,...
   G,O,T,LayerSizes,y,Y,0.005,100);
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,...
   G,O,T,LayerSizes,y,Y,lambda,NumIters);
...
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,...
   G,O,T,LayerSizes,y,Y,lambda,50);
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,...
   G,O,T,LayerSizes,y,Y,lambda,50);
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,...
   G,O,T,LayerSizes,y,Y,lambda,50);

X = linspace(0,3.14,101);
Input = X';
U = sin(X).^2;
Target = U';
[y,Y,RE,E] = mffneval2(Input,Target,nodefunction,G,O,T,LayerSizes);
Z = Y{3};
plot(X,U,X,Z);

After all this training, we are doing pretty well. The approximation is shown in
Fig. 16.2 and it is clearly phase shifted a bit.
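One simple way to quantify the remaining error (not done in the text) is the root mean
square difference between the network output Z and the true values U over the test grid
built in the last few lines of the listing above:

% rough quality measure for the fit; Z and U are the variables from the evaluation above
rmsErr = sqrt(mean((Z - U').^2))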

16.8.3 Approximating sin² Again: Linear Outputs

The choice of linear output node functions does not do as well. Even after significant
training, the approximation is not as good. The resulting approximation is shown in Fig. 16.3.
Fig. 16.3 Approximating sin² using a 1-5-1 MFFN with linear outputs

Listing 16.16: Approximating sin² Again: Linear Outputs

X = linspace(0,3.14,31);
Input = X';
U = sin(X).^2;
Target = U';
LayerSizes = [1;4;1];
OL = -1.4;
OH = 1.4;
GL = 0.8;
GH = 1.2;
TL = -2.1;
TH = 2.1;
[G,O,T] = mffninit(GL,GH,OL,OH,TL,TH,LayerSizes);
nodefunction = {};
nodefunctionprime = {};
sigma = @(y,offset,gain) 0.5*(1 + tanh((y-offset)/gain));
nodefunction{1} = sigma;
sigmainput = @(y,offset,gain) (y-offset)/gain;
nodefunction{2} = sigmainput;
sigmaoutput = @(y,offset,gain) (y-offset)/gain;
nodefunction{3} = sigmaoutput;

sigmaprime = @(y,offset,gain) (1/(2))*sech((y-offset)/gain).^2;
nodefunctionprime{1} = sigmaprime;
sigmainputprime = @(y,offset,gain) 1;
nodefunctionprime{2} = sigmainputprime;
sigmaoutputprime = @(y,offset,gain) 1;
nodefunctionprime{3} = sigmaoutputprime;
%
[y,Y,RE,E] = mffneval2(Input,Target,nodefunction,G,O,T,LayerSizes);
NumIters = 10;
lambda = .0005;
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,...
   G,O,T,LayerSizes,y,Y,lambda,NumIters);
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,...
   G,O,T,LayerSizes,y,Y,lambda,100);
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,...
   G,O,T,LayerSizes,y,Y,lambda,100);
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,...
   G,O,T,LayerSizes,y,Y,lambda,400);
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,...
   G,O,T,LayerSizes,y,Y,lambda,400);
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,...
   G,O,T,LayerSizes,y,Y,lambda,1000);
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,...
   G,O,T,LayerSizes,y,Y,lambda,2000);
...
% more iterations
...
X = linspace(0,3.14,101);
Input = X';
U = sin(X).^2;
Target = U';
[y,Y,RE,E] = mffneval2(Input,Target,nodefunction,G,O,T,LayerSizes);
Z = Y{3};
plot(X,U,X,Z);

Chapter 17
Chained Feed Forward Architectures

We now introduce a version of the feed forward architecture known as the chained
feedforward network or CFFN and discuss the backpropagation method of training in
this context. Here, we will implement the FFN as a chain of computational elements
which will provide a very general and flexible backbone for future discussion and
expansion. The CFFN is quite well-known and our first exposure to it occurred in
the classic papers of Werbos (1987a, b, 1988, 1990a, b). We will apply the methods
derived in Sect. 17.3 below to calculate the needed partial derivatives.

17.1 Introduction

We consider a function H : R^{n_I} → R^{n_O} that has a very special nonlinear
structure. This structure consists of a string or chain of computational elements, gener-
ally referred to as neurons in deference to a somewhat tenuous link to a lumped
sum model of post-synaptic potential. Each neuron processes a summed collection
of weighted inputs via a saturating transfer function with bounded output range.
The neurons whose outputs connect to a given target or postsynaptic neuron are
called presynaptic neurons. Each presynaptic neuron has an output Y which is
modified by the synaptic weight T pr e, post connecting the presynaptic neuron to the
postsynaptic neuron. This gives a contribution T pr e, post Y to the input of the post-
synaptic neuron. A typical saturating transfer model was shown in the Fig. 14.11a
in Chap. 14. Figure 14.11a shows a postsynaptic neuron with four weighted inputs
which are summed and fed into the transfer function which then processes the input
into a bounded scalar output. Figure 14.11b in Chap. 14 illustrates in more detail a
typical sigmoid function processing node. As discussed in Chap. 14, a typical transfer
function could be modeled as

σ(x, o, g) = 0.5 ( 1.0 + tanh( (x − o)/φ(g) ) )

with the usual transfer function derivative given by

∂σ/∂x (x, o, g) = (1/(2φ(g))) sech²( (x − o)/φ(g) )

where o denotes the offset indicated in the drawing and φ(g) is a function controlling slope,
which is usually called the gain of the transfer function. The function φ(g) is for con-
venience only; it is awkward to allow the denominator of the transfer function model
to be zero and to change sign. A typical function we use to control the range of the
gain parameter is

φ(g) = g_m + ((g_M − g_m)/2) ( 1.0 + tanh(g) )

where g_m and g_M denote the
lower and upper saturation values of the gain parameter's range. The CFFN model
consists of a string of N neurons, labeled from 0 to N − 1. Some of these neurons
can accept external input and some have their outputs compared to external targets.
We let

U = {i ∈ {0, . . . , N − 1} | neuron i is an input neuron} = {u_0, . . . , u_{n_I −1}}     (17.1)
V = {i ∈ {0, . . . , N − 1} | neuron i is an output neuron} = {v_0, . . . , v_{n_O −1}}     (17.2)

We will let n I and n O denote the cardinality of U and V respectively. The remaining
neurons in the chain which have no external role will be called hidden neurons with
dimension n H . Note that n H + | U | = N . Note that it is possible for an input neuron
to be an output neuron; hence U and V need not be disjoint sets. The chain is thus
divided by function into three possibly overlapping types of processing elements: n I
input neurons, n O output neurons and n H internal or hidden neurons. In Fig. 14.12a,
we see a prototypical chain of eleven neurons. For clarity only a few synaptic links
from pre to post neurons are shown. We see three input neurons (neurons 0, 1 and 4)
and four output neurons (neurons 3, 7, 9 and 10). Note input neuron 0 feeds its
output forward to input neuron 1 in addition to feeding forward to other postsynaptic
neurons. The set of postsysnaptic neurons for neuron 0 can be denoted by the symbol
F(0) which here is the set

F(0) = {1, 2, 4}

Similarly, we see

F(4) = {5, 6, 8, 9}

We will let the set of postsynaptic neurons for neuron i be denoted by F(i), the
set of forward links for neuron i. Note also that each neuron can be viewed as a
postsynaptic neuron with a set of presynaptic neurons feeding into it: thus, each
neuron i has associated with it a set of backward links which will be denoted by
B(i). In our example,

B(0) = {}
B(4) = {0}

where in general, the backward link sets will be much richer in connections than these
simple examples indicate. The weight of the synaptic link connecting the presynaptic
17.1 Introduction 317

Table 17.1 FFN evaluation

for(i = 0; i < N; i++) {
  if (i ∈ U)
    y^i = x^i + Σ_{j∈B(i)} T_{j→i} Y^j
  else
    y^i = Σ_{j∈B(i)} T_{j→i} Y^j
  Y^i = σ^i(y^i, o^i, g^i)
}
neuron i to the postsynaptic neuron j (it is assumed j > i) will be denoted by Ti→ j .
The input of a typical postsynaptic neuron therefore requires summing over the
backward link set of the postsynaptic neuron in the following way:

y^{post} = x + Σ_{pre∈B(post)} T_{pre→post} Y^{pre}

where the term x is the external input term which is only used if the post neuron is
an input neuron. This is illustrated in Fig. 14.13. We will use the following notation
(some of which has been previously defined) to describe the various elements of the
chained FFN:

xi : The external input to the ith input neuron


yi : The summed input to the ith neuron
oi : The offset of the ith neuron
gi : The gain of the ith neuron
σi : The transfer function of the ith neuron
Yi: The output of the ith neuron
Ti→ j : The synaptic efficacy of the link between neuron i and neuron j. This link is
only defined for i < j. However, for convenience, we can set up a connection
between any neuron i and neuron j and simply set Ti→ j = 0 for both
infeasible connections (such as j ≤ i) and connections not realized in the
architecture.
F(i): The forward link set for the ith neuron
B(i): The backward link set for the ith neuron

The chain FFN processes an arbitrary x ∈ R^{n_I} via an iterative process as shown in
Table 17.1. The output of the CFFN is therefore a vector in R^{n_O} defined by

H(x) = [ Y^i | i ∈ V ]     (17.3)

that is,

H(x) = [ Y^{v_0}, . . . , Y^{v_{n_O −1}} ]^T

and we see that H : R^{n_I} → R^{n_O} is a highly nonlinear function that is built out of


chains of feedforward nonlinearities. The parameters that control the value of H (x)
are the forward links, the offsets and the gains for each neuron. The cardinality of
these parameter sets are given by

n_s = Σ_{i=0}^{N−1} | F(i) |
n_o = N
n_g = N

where | F(i) | denotes the size of the forward link set for the ith neuron; n s denotes
the number of synaptic links; n o , the number of offsets; and n g , the number of gains.
The number of parameters for a CFFN is then written as N p :

N_p = n_s + n_o + n_g     (17.4)

Now let I = {x_α ∈ R^{n_I} : 0 ≤ α ≤ S − 1} and D = {D_α ∈ R^{n_O} : 0 ≤ α ≤ S − 1}


be two given sets of data of size S > 0. The set I is referred to as the set of input
exemplars and the set D is the set of outputs that are associated with exemplars.
Together, the sets I and D comprise what is known as the training set. Also, from
now on, we will use the subscript notation α to indicate the dependence of various
variables on the αth exemplar in the sets I and D.
Now, the indices which denote components of a given target D_α and the compo-
nents of the CFFN output H(I_α) are embarrassingly mismatched. We will write the
target D_α as:

D_α = [ D_α^{d(v_0)}, . . . , D_α^{d(v_{n_O −1})} ]^T

Now this notation is quite convoluted; however, we must be clear which component
of the target is to be matched with which CFFN output neuron. The mapping
d : V → {0, . . . , n O − 1} is, of course, the trivial correspondence d(vi ) = i. For
notational cleanness, we will simply denote d(vi ) by di . The CFFN training Problem
is then to choose the N p chain FFN parameters such that we minimize an energy
function, E, given by

E = 0.5 Σ_{α=0}^{S−1} Σ_{i=0}^{n_O −1} f( Y_α^{v_i} − D_α^i )
  = 0.5 Σ_{α=0}^{S−1} Σ_{i∈V} f( Y_α^i − D_α^{d_i} )     (17.5)
where f is a nonnegative function of the error term Y_α^i − D_α^{d_i}; e.g., using the function
f(x) = x² gives the standard L² or least squares energy function.
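For instance, a minimal MatLab sketch of this energy for the least squares choice is shown
below; the names Yout and D are our own assumptions for S × n_O arrays holding the
CFFN outputs and targets, not identifiers from the code developed elsewhere in the text.

% a sketch only: least squares energy over S samples with f(x) = x^2
% Yout and D are assumed S x nO matrices of CFFN outputs and targets
f = @(x) x.^2;
E = 0.5*sum(sum(f(Yout - D)));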

17.2 Minimizing the CFFN Energy

We will use the notation established in Sect. 17.1. If all the partial derivatives of
the energy function (17.5) with respect to the chain FFN parameters were known,
we could minimize (17.5) using a standard gradient descent scheme which we call
CFFN back propagation. Relabeling the N p CFFN variables temporarily as wi , 0 ≤
i ≤ N p − 1, the optimization problem we face is

min_{w ∈ R^{N_p}}  E(I, D, w)     (17.6)

where we now explicitly indicate that the value of E depends on I, D and the
parameters w. In component form, the equations for gradient descent are then

w_i^{new} = w_i^{old} − λ (∂E/∂w_i)|_{w^{old}}     (17.7)

where λ is a scalar parameter that determines what percentage of the gradient step
to actually take. We can write (17.7) more compactly using vector notation:

w^{new} = w^{old} − λ ∇E(w^{old})     (17.8)

In practice, we typically replace ∇ E with the descent vector D obtained by normal-


izing ∇ E by its length:
D = ∇E/‖∇E‖   if ‖∇E‖ > 1
D = ∇E          if ‖∇E‖ ≤ 1

and use the optimization scheme:

w^{new} = w^{old} − λ D(w^{old})
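A minimal MatLab sketch of one such step might look like the following; gradE is an
assumed function handle returning ∇E at the current parameter vector w, and lambda is
the step size. This is only an illustration of the update above, not the training code
developed later in the chapter.

% one normalized gradient descent step (sketch); gradE and lambda are assumed
g = gradE(w);
if norm(g) > 1
  D = g/norm(g);
else
  D = g;
end
w = w - lambda*D;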

17.3 Partial Derivative Calculation in Generalized Chains

In any sort of model that is built from tunable parameters with an associated energy
function minimized by choosing the parameters via gradient descent, we must be
able to calculate the needed partial derivatives. We now discuss a recursive method
for the computation of the partial derivatives in general computational chains.

The calculation of the partial derivatives required to implement gradient descent in


the context of backpropagation derivation is a bit intimidating and confusing because
of its recursive nature. To help explain this, let’s look at an abstract example: suppose
we have a function E which is a function of the three variables Y 0 , Y 1 and Y 2 . These
variables are assumed to have the following dependencies:

E = E(Y 0 , Y 1 , Y 2 )
Y 2 = f 2 (Y 0 , Y 1 )
Y 1 = f 1 (Y 0 )

where f i denotes the functional dependencies. Note that these computational ele-
ments are organized in a chain like fashion. Now, consider a particular example of
the chained functions Y 2 and Y 1 given below:

Y^2 = (3Y^0 + 4Y^0 Y^1)^2
Y^1 = 2(Y^0)^3

The function Y^2 depends on Y^0 in two different ways: the first is a direct dependence
which we can denote by the usual partial derivative symbols:

∂Y^2/∂Y^0 = 2(3Y^0 + 4Y^0 Y^1) × (3 + 4Y^1)

However, there are additional hidden dependencies; e.g., Y^1 itself depends on Y^0.
Using the usual partial derivative notation, the complete or total dependence of Y^2
on Y^0 is given by a much more complicated expression:

dY^2/dY^0 = ∂Y^2/∂Y^0 + (∂Y^2/∂Y^1)(∂Y^1/∂Y^0)

where we denote this total dependence using the notation dY^2/dY^0, the total
derivative of Y^2 with respect to Y^0. The situation can get more complicated. If
the function Y^1 had additional hidden dependencies on the variable Y^0, the direct
dependency term ∂Y^1/∂Y^0 would be inadequate. In this case, it should be replaced by the
total dependency term dY^1/dY^0. Thus, to compute the total derivative of Y^2 with respect
to Y^0, we should use:

dY^2/dY^0 = ∂Y^2/∂Y^0 + (∂Y^2/∂Y^1)(dY^1/dY^0)
          = 2(3Y^0 + 4Y^0 Y^1) × (3 + 4Y^1) + 2(3Y^0 + 4Y^0 Y^1) × (4Y^0) × 6(Y^0)^2

We therefore must make a distinction between the total derivative of a function with
respect to a variable u and the direct derivative of the function with respect to u. We
will denote total derivatives with the d notation and direct derivatives with the ∂ notation,
respectively. Of course, in some cases these derivatives are the same and so we can
use either notation interchangeably.
Let's go back to the original example and add a function E which depends on Y^0,
Y^1 and Y^2. We let

E   = (Y^0)^2 + (Y^1)^4 + (Y^2)^8
Y^2 = (3Y^0 + 4Y^0 Y^1)^2
Y^1 = 2(Y^0)^3

Note that the direct dependence of E on Y^0 is:

∂E/∂Y^0 = 2Y^0

However, the total dependence of E on Y^0 must be calculated as follows:

dE/dY^0 = ∂E/∂Y^0 + (∂E/∂Y^1)(dY^1/dY^0) + (∂E/∂Y^2)(dY^2/dY^0)
        = ∂E/∂Y^0 + (∂E/∂Y^1)(∂Y^1/∂Y^0) + (∂E/∂Y^2)[ ∂Y^2/∂Y^0 + (∂Y^2/∂Y^1)(∂Y^1/∂Y^0) ]
        = 2Y^0 + 4(Y^1)^3 × 6(Y^0)^2
          + 8(Y^2)^7 × 2(3Y^0 + 4Y^0 Y^1) × [ (3 + 4Y^1) + (4Y^0) × 6(Y^0)^2 ]     (17.9)

This certainly looks complicated! However, the chained nature of these computations
allows us to reorganize this in a much easier format: consider the recursive scheme
below:

dE/dY^2 = ∂E/∂Y^2                                                        (17.10)
dE/dY^1 = ∂E/∂Y^1 + (dE/dY^2)(∂Y^2/∂Y^1)                                 (17.11)
dE/dY^0 = ∂E/∂Y^0 + (dE/dY^1)(∂Y^1/∂Y^0) + (dE/dY^2)(∂Y^2/∂Y^0)          (17.12)
After calculation, this simplifies to:

dE/dY^2 = 8(Y^2)^7
dE/dY^1 = 4(Y^1)^3 + (dE/dY^2) × 2(3Y^0 + 4Y^0 Y^1) × (4Y^0)
        = 4(Y^1)^3 + (8(Y^2)^7) × 2(3Y^0 + 4Y^0 Y^1) × (4Y^0)
dE/dY^0 = 2Y^0 + (dE/dY^1) × 6(Y^0)^2 + (dE/dY^2) × 2(3Y^0 + 4Y^0 Y^1) × (3 + 4Y^1)
        = 2Y^0 + [ 4(Y^1)^3 + (8(Y^2)^7) × 2(3Y^0 + 4Y^0 Y^1) × (4Y^0) ] × 6(Y^0)^2
          + 8(Y^2)^7 × 2(3Y^0 + 4Y^0 Y^1) × (3 + 4Y^1)
        = 2Y^0 + 4(Y^1)^3 × 6(Y^0)^2
          + 8(Y^2)^7 × 2(3Y^0 + 4Y^0 Y^1) × [ (3 + 4Y^1) + 6(Y^0)^2 × (4Y^0) ]
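Since this arithmetic is easy to get wrong, it is reassuring to check the recursion
numerically. The MatLab fragment below (our own check, not code from the text) evaluates
the recursive total derivative dE/dY^0 for this example at a sample value of Y^0 and compares
it against a centered finite difference of E regarded as a function of Y^0 alone.

% numerical check of the recursive chain rule example (a sketch, not book code)
Y0 = 0.3;
Y1f = @(y) 2*y.^3;
Y2f = @(y) (3*y + 4*y.*Y1f(y)).^2;
Ef  = @(y) y.^2 + Y1f(y).^4 + Y2f(y).^8;
Y1 = Y1f(Y0);  Y2 = Y2f(Y0);
dEdY2 = 8*Y2^7;
dEdY1 = 4*Y1^3 + dEdY2*2*(3*Y0 + 4*Y0*Y1)*(4*Y0);
dEdY0 = 2*Y0 + dEdY1*6*Y0^2 + dEdY2*2*(3*Y0 + 4*Y0*Y1)*(3 + 4*Y1);
h = 1e-6;
fdiff = (Ef(Y0+h) - Ef(Y0-h))/(2*h);
[dEdY0 fdiff]    % the two numbers should agree closely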

The recursive calculation matches the one that was obtained directly via (17.9). The
discussion above motivates a general recursive scheme for computing the rates of
change of a given function E with respect to chained variables. Assume the function
E depends on the variables Y 0 to Y N in the following way:

E       = E(Y^0, Y^1, . . . , Y^N)
Y^N     = f^N(Y^0, Y^1, . . . , Y^{N−1})
Y^{N−1} = f^{N−1}(Y^0, Y^1, . . . , Y^{N−2})
   ...
Y^i     = f^i(Y^0, Y^1, . . . , Y^{i−1})
   ...
Y^1     = f^1(Y^0)

We compute the required partial derivatives recursively as follows:

dE/dY^N     = ∂E/∂Y^N                                                                   (17.13)
dE/dY^{N−1} = ∂E/∂Y^{N−1} + (dE/dY^N)(∂Y^N/∂Y^{N−1})                                    (17.14)
dE/dY^{N−2} = ∂E/∂Y^{N−2} + (dE/dY^N)(∂Y^N/∂Y^{N−2}) + (dE/dY^{N−1})(∂Y^{N−1}/∂Y^{N−2}) (17.15)
   ...
dE/dY^i     = ∂E/∂Y^i + Σ_{j=i+1}^{N} (dE/dY^j)(∂Y^j/∂Y^i)                              (17.16)
   ...
dE/dY^0     = ∂E/∂Y^0 + Σ_{j=1}^{N} (dE/dY^j)(∂Y^j/∂Y^0)                                (17.17)

Note that this can easily be organized into the scheme given in Table 17.2. Also,
we use the notation ∂Y^j/∂Y^i for the more correct form ∂f^j/∂Y^i.

Table 17.2 Recursive chained partial calculation

for(i = N; i ≥ 0; i − −) {
  if (i == N)
    dE/dY^i = ∂E/∂Y^i
  else
    dE/dY^i = ∂E/∂Y^i + Σ_{j=i+1}^{N} (dE/dY^j)(∂Y^j/∂Y^i)
}

Now, let's consider another complication: each function Y^i has an internal structure
of functional parameters w^i = {w_{i0}, . . . , w_{i,n_i −1}} which are used in addition to the
inputs from elements Y^0 to Y^{i−1}. The symbol n_i denotes the number of internal
parameters in function Y^i. This internal structure is not based on a chained architecture.
To be explicit, our previous example can be written in this new context as follows:

Y^0 = f^0(w_{00}, . . . , w_{0,n_0 −1}) = f^0(w^0)
Y^1 = f^1(Y^0, w_{10}, . . . , w_{1,n_1 −1}) = f^1(Y^0, w^1)
Y^2 = f^2(Y^0, Y^1, w_{20}, . . . , w_{2,n_2 −1}) = f^2(Y^0, Y^1, w^2)
E   = E(Y^0, Y^1, Y^2)

To calculate ∂E/∂w_{ij}, we can modify the previous recursive scheme for all appropriate
indices i:
indices i:

dE ∂E
=
dY 2 ∂Y 2
dE d E ∂Y 2
=
dw2i dY 2 ∂w2i
dE ∂E d E ∂Y 2
= +
dY 1 ∂Y 1 dY 2 ∂Y 1
dE d E ∂Y 1
=
dw1i dY 1 ∂w1i
dE ∂E d E ∂Y 1 d E ∂Y 2
= + +
dY 0 ∂Y 0 dY 1 ∂Y 0 dY 2 ∂Y 0
dE d E ∂Y 0
=
dw2i dY 0 ∂w0i

We summarize the recursive partial calculations when internal parameters are present
in Table 17.3. We can use these recursive techniques to derive the backpropagation
algorithm for the case of the Chain Feed Forward Network, CFFN. Indeed, we can
also derive the backpropagation algorithm for many other architectures in a similar
way.

Table 17.3 Recursive chained partial calculation with internal parameters

for(i = N; i ≥ 0; i − −) {
  if (i == N)
    dE/dY^i = ∂E/∂Y^i
  else
    dE/dY^i = ∂E/∂Y^i + Σ_{j=i+1}^{N} (dE/dY^j)(∂Y^j/∂Y^i)
  for(j = 0; j ≤ n_i − 1; j++)
    dE/dw_{ij} = (dE/dY^i)(∂Y^i/∂w_{ij})
}

17.4 Partial Calculations for the CFFN

We now derive the fundamental equations which allow us to recursively compute


the needed partial derivatives starting at the last output neuron and sweeping right
to left toward the first input neuron. This recursive computational scheme is known
as backpropagation and in this form it is inherently serial in nature. We will use the
discussion from Sect. 17.3 and the information in Tables 17.2 and 17.3 heavily in
what follows.
We will begin by laying out some notational ground rules:

Neuron Transfer Functions: These will be modeled somewhat generally. Each


neuron i has an associated transfer function σ i which we will model as a func-
tion of three arguments; i.e. σ i (y i , oi , g i ) ≡ σ i using our previously established
notations for the input, offset and gain of neuron i. There are then three partial
derivatives of interest:
∂σ^i/∂y: This is the rate of change of the transfer function with respect to its
  first argument. At sample α, we will denote this derivative by σ_{0,α}^i ≡ σ_0^i(y_α^i, o^i, g^i).
∂σ^i/∂o: This is the rate of change of the transfer function with respect to its
  second argument. At sample α, we will denote this derivative by σ_{1,α}^i ≡ σ_1^i(y_α^i, o^i, g^i).
∂σ^i/∂g: This is the rate of change of the transfer function with respect to its
  third argument. At sample α, we will denote this derivative by σ_{2,α}^i ≡ σ_2^i(y_α^i, o^i, g^i).

Now, the output of a given transfer function is denoted by

Y^i = σ^i(y^i, o^i, g^i)
    = σ^i( Σ_{j∈B(i)} T_{j→i} Y^j, o^i, g^i )
    ≡ f^i(Y^0, Y^1, . . . , Y^{i−1}, T_{0→i}, . . . , T_{i−1→i}, o^i, g^i)

where the internal parameters of the function Y i are the gain and offset of the
transfer function as well as the link weights connecting to neuron i. Hence, this
particular type of chained architecture can be written in functional form using the
internal parameters
wi = {T0→i , . . . , Ti−1→i , oi , g i }

This internal parameter set has cardinality i + 2. Thus, each transfer function has
the instantiated form
Y i = σ i (y i , oi , g i )
≡ f i (Y 0 , Y 1 , . . . , Y i−1 , wi )

Error Notation: The error between the value from output neuron i due to sample
  α and the desired output component D_α^{d_i} will be denoted by the term e_α^i. For
  notational purposes, we will define the generalized output error as follows:

  e_α^i = Y_α^i − D_α^{d_i},   i ∈ V
        = 0,                   else

Energy Function: The energy function depends explicitly on the outputs of the n_O
  output neurons in the chain FFN. The partial with respect to the ith output neuron
  will be denoted E_i = ∂E/∂Y^i, whose value at a given sample α is given by

  E_{i,α} = Y_α^i − D_α^{d_i} = e_α^i,   i ∈ V
          = 0,                           else

  Note that E_{i,α} is equivalent to the direct partial term ∂E/∂Y^i.
.
Indicial Notation: Our indices derive from a C based notation. Hence, sums typi-
cally run from 0 to an upper bound of the form M − 1. To avoid abusive notation,
we will use the convention M′ ≡ M − 1 to indicate that the variable M has been
decremented by 1. We will also let θ denote n_O. Note that we also have a nice
identity connecting the forward and backward link sets: for a given node i, we
have

Σ_{k=i+1}^{N−1} Σ_{i∈B(k)} = Σ_{k∈F(i)}     (17.18)

This formula is interpreted as follows: since the value i is fixed, the inner sum over i ∈
B(k) is empty unless i is in the set B(k). We will now apply the notions about recur-
sively organized chained partial calculations to the CFFN. Since we have seen how to
interpret the CFFN as a general chained architecture using the notation of Sect. 17.3,
we can apply the recursive partial calculations for chains with internal parameters
directly to obtain (see Table 17.3) the backpropagation equations for the CFFN.

17.4.1 The ∂Y^j/∂Y^i Calculation

We begin by finding the critical partial ∂Y^j/∂Y^i so that we can find the total dependence
of E on Y^i.

∂Y^j/∂Y^i = σ_0^j(y^j, o^j, g^j) ∂/∂Y^i { x^j + Σ_{k∈B(j)} T_{k→j} Y^k }
          = σ_0^j(y^j, o^j, g^j) Σ_{k∈B(j)} T_{k→j} δ_{ik}
          = σ_0^j(y^j, o^j, g^j) Σ_{i∈B(j)} T_{i→j}

where we interpret x^j as zero if j is not in the input set U. Hence, the general
expansion term

dE/dY^i = ∂E/∂Y^i + Σ_{j=i+1}^{N−1} (dE/dY^j)(∂Y^j/∂Y^i)

in this context becomes

dE/dY^i = ∂E/∂Y^i + Σ_{j=i+1}^{N−1} Σ_{i∈B(j)} (dE/dY^j) σ_0^j(y^j, o^j, g^j) T_{i→j}
        = ∂E/∂Y^i + Σ_{j∈F(i)} (dE/dY^j) σ_0^j(y^j, o^j, g^j) T_{i→j}

where we use the identity (17.18) to simplify the equations above. Of course, the direct
term ∂E/∂Y^i is not there unless the energy has a direct dependence on Y^i. This only
happens if the node i is an output node.

17.4.2 The Internal Parameter Partial Calculations

The partials with respect to the internal parameters consist of two separate types:
the first are partials with respect to the offset and gain variables and the second
are partials with respect to linking terms. The appropriate partial calculation from
Table 17.3 for a given internal parameter w is given by

dE/dw = (dE/dY^i)(∂Y^i/∂w)

Offset and Gain Parameter Partials

For w = o^i and w = g^i, we obtain

dE/do^i = (dE/dY^i) σ_1^i    and    dE/dg^i = (dE/dY^i) σ_2^i

Linking Parameter Partials

For w = T_{j→i}, we obtain

dE/dT_{j→i} = (dE/dY^i) σ_0^i(y^i, o^i, g^i) ∂/∂T_{j→i} { x^i + Σ_{k∈B(i)} T_{k→i} Y^k }
            = (dE/dY^i) σ_0^i(y^i, o^i, g^i) { Σ_{k∈B(i)} δ_j^k Y^k }
            = (dE/dY^i) σ_0^i(y^i, o^i, g^i) Y^j
These equations are summarized in the Table 17.4. For simplicity, in this table, we
assume there is only one output node—N − 1. Since we are really trying to compute
the gradient for the energy function E defined in terms of the full sample set, the
calculations outlined above must be done at each sample. This leads to the full
backpropagation equations given in Table 17.5. We observe that the backpropagation
algorithm to compute dE/dY_β^{pre} at a given presynaptic neuron pre requires summing over
the forward link set and a possible external error signal in the following way:

dE/dY_β^{pre} = e_β^{pre} + Σ_{post∈F(pre)} σ_{0,β}^{post} T_{pre→post} (dE/dY_β^{post})

where all variables subscripted with post are associated with a postsynaptic neuron.
An illustration of this process is shown in Fig. 17.1. This is a very convenient way
to abstract this process. Combining our evaluation procedure (Table 17.1) and the

Table 17.4 Backpropagation in the CFFN: one sample

for(i = N − 1; i ≥ 0; i − −) {
  if (i == N − 1)
    dE/dY^i = E_i
  else
    dE/dY^i = E_i + Σ_{j∈F(i)} (dE/dY^j) σ_0^j(y^j, o^j, g^j) T_{i→j}
  for(j ∈ F(i))
    ∂E/∂T_{i→j} = (dE/dY^j) σ_0^j Y^i
  ∂E/∂o^i = (dE/dY^i) σ_1^i
  ∂E/∂g^i = (dE/dY^i) σ_2^i
}

Table 17.5 Backpropagation for the CFFN: multiple samples

for(β = 0; β < S; β + +) {
  for(i = N − 1; i ≥ 0; i − −) {
    dE/dY_β^i = E_{i,β} + Σ_{j∈F(i)} (dE/dY_β^j) σ_{0,β}^j T_{i→j}
  }
}
for(i = N − 1; i ≥ 0; i − −) {
  for(j ∈ F(i))
    ∂E/∂T_{i→j} = Σ_{β=0}^{S−1} (dE/dY_β^j) σ_{0,β}^j Y_β^i
  ∂E/∂o^i = Σ_{β=0}^{S−1} (dE/dY_β^i) σ_{1,β}^i
  ∂E/∂g^i = Σ_{β=0}^{S−1} (dE/dY_β^i) σ_{2,β}^i
}

Fig. 17.1 Presynaptic error calculation

backpropagation procedure (Table 17.5), we see that information about both back-
ward (evaluation) and forward (partial derivative calculation) links is required. Also,
note that as formulated here, this algorithm is inherently sequential! It is not easily
implemented in parallel!
We will discuss how to implement the chain training algorithms in MatLab soon.
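To make the sweep concrete before we develop the real implementation, here is a rough
MatLab sketch of the single sample pass of Table 17.4. The container names are our own
assumptions and not the code we will write later: FE{i} is a cell of forward edge structs
with fields .out and .w, the vectors sigma0, sigma1 and sigma2 hold the stored transfer
function partials at each node, Y holds the node outputs from a forward evaluation, and
err(i) is the generalized output error e^i (zero unless i is an output node).

% rough sketch of the Table 17.4 sweep for one sample; all container names are assumed
dE = zeros(1,Nnodes); dO = zeros(1,Nnodes); dG = zeros(1,Nnodes);
dT = cell(1,Nnodes);
for i = Nnodes:-1:1
  dE(i) = err(i);                          % direct error term, zero unless i is an output node
  for k = 1:length(FE{i})
    j = FE{i}(k).out;                      % postsynaptic node for this forward edge
    dE(i) = dE(i) + dE(j)*sigma0(j)*FE{i}(k).w;
    dT{i}(k) = dE(j)*sigma0(j)*Y(i);       % partial of E with respect to T_{i->j}
  end
  dO(i) = dE(i)*sigma1(i);                 % offset partial
  dG(i) = dE(i)*sigma2(i);                 % gain partial
end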

17.5 Simple MatLab Implementations

Let’s look at how to setup code to implement the CFFN ideas. We won’t do a proper
job yet as there are lots of details that are difficult to code properly. But we will start
the process now by looking at how to code the evaluation portion of the of a 2-3-1
CFFN which is the same network as our usual 2-3-1 MFFN we have discussed earlier.
We begin with the usual initializations we put into the function chaininit. This
function sets up a vector of six node numbers which we only need to know how many
nodes there are. You’ll note we don’t really use the individual entries on the variable
N here. We store the in node of an edge and the out node as well as the weight
value of the edge w in a data structure MatLab uses called a struct. Our struct is
called e and we can access the three fields in e using the notation e.in, e,out
and e.w. We set the edges for the 2-3-1 network architecture in the cell variable a
as a collection of row vectors of the form [in, out]. We use e to construct a cell
E to hold the edge information we need.

Now if you think about the evaluation step in a CFFN, recall we will have to
compute Σ_{j∈B(i)} T_{j→i} Y^j. The backward set B(i) consists of a set of nodes. The
weight values T_{j→i} are stored in the cell E. Suppose we had a CFFN with 100 nodes
and we were looking at the backward set B(36). The edges in the cell E are labeled
starting at 1 and ending with the last edge in the CFFN. Each node listed in B(36)
corresponds to an index j in a T_{j→i} value we need. However, the value T_{j→i} is stored
as a value w in some edge E{k}. We need to find the indices k where E{k}.in = j, which
implies a search over the edges for each index j from the backward set. This is
expensive! It is a lot better to save the search by storing the edges corresponding to
the backward sets in a new cell BE. Exactly what is stored in each BE{k} depends
on how we choose to number our edges, so be careful to keep track of your edge
numbering scheme as you will need to remember it here.
After we set up BE, we initialize the node functions as usual.

Listing 17.1: Setting up a 2-3-1 CFFN


function [N,O,G,E,BE,nodefunction] = chaininit()
%
N = [1;2;3;4;5;6];
e = struct();
E = {};
a = {[1,3];[1,4];[1,5];[2,3];[2,4];[2,5];[3,6];[4,6];[5,6]};
W = -1+2*rand(length(a),1);
O = -1+2*rand(length(a),1);
G = 0.6+0.6*rand(length(a),1);
for i = 1:length(a)
  e.in  = a{i}(1);
  e.out = a{i}(2);
  e.w   = W(i);
  E{i} = e;
end

BE = {};
BE{1} = {};
BE{2} = {};
BE{3} = [E{1};E{4}];
BE{4} = [E{2};E{5}];
BE{5} = [E{3};E{6}];
BE{6} = [E{7};E{8};E{9}];

sigma = @(x,o,g) 0.5*(1 + tanh((x-0)/g));
sigmainput = @(x,o,g) (x-o)/g;
nodefunction{1} = sigmainput;
nodefunction{2} = sigmainput;
nodefunction{3} = sigma;
nodefunction{4} = sigma;
nodefunction{5} = sigma;
nodefunction{6} = sigma;

end

Next, we tackle the evaluation part with the function chaineval. Note we handle
the input nodes separately and the other nodes that implement the Σ_{j∈B(i)} T_{j→i} Y^j
calculation use the BE cell data. For node i, BE{i}(j) gives us an edge connecting
to i. Hence, we can extract the needed weight value easily, without a search, using
BE{i}(j).w. Note we never use the backward sets B here as the backward edge
sets are more useful.

Listing 17.2: The evaluation loop for a 2-3-1 CFFN


function [y,Y] = chaineval(X,N,O,G,BE,nodefunction)
%
%
for i = 1:length(N)
  if (i == 1 || i == 2)
    y(i) = X(i); Y(i) = nodefunction{i}(y(i),O(i),G(i));
  else
    y(i) = 0;
    [row,col] = size(BE{i});
    for j = 1:row
      weight = BE{i}(j).w;
      % add the weighted output of the presynaptic node for this backward edge
      y(i) = y(i) + weight*Y(BE{i}(j).in);
    end
    Y(i) = nodefunction{i}(y(i),O(i),G(i));
  end
end
end

To see this 2-3-1 CFFN in action, we make up some data and push it through the
architecture. We find

Listing 17.3: Testing our 2-3-1 CFFN


X = [1.2;2.5];
[ N, O, G, E , BE, n o d e f u n c t i o n ] = c h a i n i n i t ( ) ;
3 [ y ,Y] = c h a i n e v a l (X, N, O, G, BE, n o d e f u n c t i o n ) ;
Y
Y =
0.2513 2.7088 0.9921 0.9509 0.0138 0.8594

References

P. Werbos, Learning how the world works, in Proceedings of the 1987 IEEE International Confer-
ence on Systems, Man and Cybernetics, vol. 1 (IEEE, 1987a), pp. 302–310
P. Werbos, Building and understanding adaptive systems: A statistical/numerical approach to factory
automation and brain research. IEEE Trans. Syst. Man Cybern. 17, 7–19 (1987b)
P. Werbos, Generalization of backpropagation with application to a recurrent gas market model.
Neural Netw. 1, 339–356 (1988)
P. Werbos, A menu of designs for reinforcement learning over time, in Neural Networks for Control,
eds. by W. Miller, R. Sutton, P. Werbos (1990a), pp. 67–96
P. Werbos, Backpropagation through time: What it does and how to do it. Proc. IEEE 78(10),
1550–1560 (1990b)
Part VI
Graph Based Modeling In Matlab
Chapter 18
Graph Models

Our cognitive model will consist of many artificial neurons interacting on multiple
time scales. Although it is possible to do detailed modeling of neurobiological sys-
tems using GENESIS (Bower and Beeman 1998) and NEURON (Hines 2003), we
do not believe that they are useful tools in modeling the kind of information flow
between cortical modules that is needed for a cognitive model. Progress in build-
ing large scale models that involve many cooperating neurons will certainly involve
making suitable abstractions in the information processing that we see in the neuron.
Neurons transduce and integrate information on the dendritic side into wave form
pulses and there are many models involving filtering and transforms which attempt
to “see” into the action potential and find its informational core so to speak. How-
ever, all of these methods are hugely computationally expensive and even a simple
cognitive model will require ensembles of neurons acting together locally to create
global effects. We believe the simple biological feature vector (BFV) discussed here
is able to discriminate subtle changes in the action potential wave form.Hence, we
will use ensembles of abstract neurons whose outputs are BFVs (11 dimensional say)
to create local cortical computationally phase locked groups. The effects of modu-
latory agents such as neurotransmitters can then be modeled as introducing changes
in the BFVs. The kinds of changes one should use for a given neurotransmitters
modulatory effect can be estimated from the biophysical and toxin literature. An
increase in sodium ion flow, Ca++ gated second messenger activity can be handled
at a high level as a suitable change in one of the 11 parameters of the BFV. Note this
indeed is an implication for realistic neuronal processing. Further, this raises a very
interesting question. How much information processing is possible in a large scale
interacting neuron population given an abstract form of neuronal output? How can
we get insight into the minimal amount of information carried per output wave form
that can subserve cognition?
We believe this type of interaction is inherently asynchronous and hence the
artificial neurons we use for our model must be treated as agents interacting asyn-
chronously. The discussion in the previous chapters gives us powerful clues for the

design of a software architecture that may subserve computational intelligence pur-


poses. The components of this design are a finite collection of abstract neurons N_i. A
given neuron N_{i_0} accepts inputs from a subpopulation {N_j : j ∈ D_i}, where D_i denotes
the input into the dendritic system of the artificial neuron. It then generates an output
we will denote by ζ_i which is of the form

ζ_i = [ t_0^i, V_0^i, t_1^i, V_1^i, t_2^i, V_2^i, t_3^i, V_3^i, g^i, t_4^i, V_4^i ].

This output takes the form of the low dimensional BFV we discussed earlier. Hence,
the input D_i is a sequence

D_i = { ζ_j : j ∈ D_i }.

where each ζ_j is a BFV structure. Event sequences in our neural model are thus
sequences of the 11 dimensional outputs [t_0, V_0, t_1, V_1, t_2, V_2, t_3, V_3, g, t_4, V_4] that
are processed by a given neuron agent.
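As a concrete illustration only, a BFV output could be held in a MatLab struct like the one
below; the field names and the numeric values are our own choices for the sketch and are
not a specification fixed by the model.

% illustrative only: one BFV output as a struct; the values are made up
zeta = struct('t0',1.2,'V0',-65.0,'t1',2.1,'V1',-48.3,'t2',2.6,'V2',30.2, ...
              't3',4.0,'V3',-72.5,'g',0.8,'t4',9.5,'V4',-65.1);
% a dendritic input sequence is then simply a cell array of such structs
Di = {zeta, zeta};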
These neurons need to interact with each other and form temporally and spatially
localized ensembles of activity whose outputs are entrained. For a given compu-
tational intelligence model, there is a global system time clock as well. Thus, the
neurons function in a timed environment and so all variables in all inputs and outputs
are also tagged with time indices. The representation at this level is quite messy and
is usually avoided. We can always do computations at each agent simply in terms of
information that has arrived and is to be processed. However, to be precise the output
of a neuron would be
 
ζ_i(t) = [ t_0^{i,t}, V_0^{i,t}, t_1^{i,t}, V_1^{i,t}, t_2^{i,t}, V_2^{i,t}, t_3^{i,t}, V_3^{i,t}, g^{i,t}, t_4^{i,t}, V_4^{i,t} ].

The FFP and OCOS cortical circuits can be combined into a multi-column model
(Raizada and Grossberg 2003) as seen in Fig. 18.1. This model can be implemented
using the artificial neurons introduced above. The generic software architecture for
our model is shown in Fig. 18.2. In Fig. 18.1, each cortical stack consists of 3 cortical
columns. If each cortical layer consists of 9 neurons, this allows for 3 OCOS/FFP
pathways inside each column. This is roughly 27 neurons per column. In Fig. 18.3,
a more complete architecture is shown. Here, for convenience, each cortical module
consists of 4 cortical stacks, all of which are interconnected. Hence, the 4 column
structure illustrated in Fig. 18.3 can be implemented with approximately 108 neurons
each. Thus, the five cortical modules (Frontal, Parietal, Occipital, Temporal and
Limbic) illustrated require a total of 540 artificial neurons for their implementation.
In general, since we know we start with isocortex, we can assume the number of
stacks is the same in each module. Further, we can allow more OCOS/FFP circuits
per layer. Letting NOCOS/FFP denote the number of OCOS/FFP circuits per layer,
we need 3NOCOS/FFP neurons per stack. Thus, if we use NS cortical stacks for each
cortical module, we will use 3NOCOS/FFP NS neurons for each of the five cortical
modules (Frontal, Parietal, Occipital, Temporal and Limbic). This leads to a total of
15 N_OCOS/FFP N_S neurons for the simulation of the cortical model.

Fig. 18.1 The OCOS/FFP cortical model

Fig. 18.2 Basic cognitive software architecture design



Fig. 18.3 Cognitive model components

We also need a collection of monoamine producing and RF modulatory neurons.


We know the number of monoamine producing neurons is small relative to the total
brain population. Hence, for convenience, we will choose one neuron of each type
for each cortical module in the simulation. This gives 5 neurons of each monoamine
type to modulate each cortical model. We also require one RF neuron for each of the
5 cortical elements as well as an RF neuron to modulate each of the 3 monoamine
types. Thus a general simulation requires 15NOCOS/FFP NS neurons to implement the
cortex and 25 neurons to implement monoamine modulation of cortex and 5 neurons
to modulate the monoamine producing neurons via RF cells in the brain stem.
This model also does not use short or long term memory storage. These elements
can clearly be added, but studies of the small scale nervous systems of the spider
portia fimbriata (Harland and Jackson 2004), the honeybee (Zhang and Srinivasan
2004) and the praying mantis (Kral and Prete 2004) indicate that certain elements of

cognitive function have probably emerged in these animals despite the small number
of neurons in their brains. Estimates of the neuron population of portia fimbriata are
in the 400,000 range. Portia does not have much long term memory so our choice to
ignore long term memory needs seems reasonable for a first model of cognition. Since
our artificial neurons use specific data structures to store local agent information, we
do use short term memory which is probably similar to the use of short term memory
in the spider, honeybee and praying mantis for route planning and decision choices.
Using our simple model of monoamine and RF modulation, 400,000 neurons in
our model would imply 400,000/15 = 26,667 neurons for the cortex models. Thus
we have the constraint NOCOS/FFP NS = 26,667. If we use 200 OCOS/FFP circuits per
layer, this implies 133 cortical stacks per cortical module. Even a cursory examination
of the cortex of an animal shows that such a choice will mimic real cortical structure
nicely. Hence, a model at this level of cortical complexity has a reasonable probability
of subserving cognitive function such as decision making. One way to implement this
software architecture is to use non-blocking system calls that handle all connections
between agents in one thread. A function is called whenever a connection between
agents is ready for processing. Hence, this type of programming is known as event-driven
or callback based in addition to being called asynchronous. Each of our artificial
neurons will function in a networked environment and must accept inputs from other
agents and generate outputs or decisions. A computational element can be forced to
wait and therefore be blocked if it is waiting on the result of data from other elements
which it must have to complete its own calculations. Since a neuron accepts inputs
from other neurons, it may sometimes perform a computation that takes enough
time to keep other neurons from processing. We will choose not to use this type of
architecture and instead implement as much code as possible in a language such as
Erlang in which all objects are immutable and hence the blocking of calls and the
sharing of memory is not an issue.
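The neuron budget discussed above is easy to reproduce in a few lines of MatLab; this is
just arithmetic on the numbers quoted in the text, not new modeling code.

% reproducing the neuron budget discussed above
Ntotal   = 400000;                 % rough neuron count for portia fimbriata
Ncortex  = Ntotal/15;              % the NOCOS/FFP * NS constraint: about 26,667
NOCOSFFP = 200;                    % OCOS/FFP circuits per layer
NS       = round(Ncortex/NOCOSFFP) % cortical stacks per module: about 133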
In order to build even simple brain models, we thus need to write code that
implements graphs that connect our processing nodes. We will start by building
simple graph tools in MatLab/Octave using object oriented techniques as it is a
programming language readily available and easy to learn. This will enable us to
construct the underlying architecture of connections for our model and then we can
do a proof of concept for the edge and nodal processing in MatLab as well. Eventually,
we will discuss how to do this processing in other languages but that will be done in
detail in later volumes. We will now start on the objects of real interest to us: graphs.
We wish to build neural models and to do that we need a way to implement directed
graphs where the nodes are neurons and the edges are the synaptic connections
between neurons.

18.1 Building Global Graph Objects One

The implementation of the class oriented code for an object is done in a special
directory which must begin with the @ symbol. Following the Octave documentation,
we would set up a subdirectory called @CoolObject to use to define our coolobject

class. The code that makes an object for us in the CoolObject class is called a
constructor. It has this general look:

Listing 18.1: CoolObject


function p = CoolObject(a)
%
% default constructor
if (nargin==0)
  p.c = [0];
  p = class(p,'CoolObject');
elseif (nargin==1)
  if (strcmp(class(a),'CoolObject'))
    p = a(:);
  elseif (isvector(a) && isreal(a))
    p.c = a(:)';
    p = class(p,'CoolObject');
  else
    error('Not a valid argument to build a CoolObject');
  end
else
  error('too many arguments to the constructor.')
end
end

The test nargin == 0 checks to see how many arguments are being used
to create our object. If there are no arguments, the line in Octave would be
p = CoolObject(); and we set the field element c of p to be a vector with
0 in it. Note that we are redefining how Octave interprets the statement p.c here
as normally the dot . has its own intrinsic meaning in Octave. In general, as usual
in object oriented programming, a class object, p can have multiple fields associ-
ated with it and we generally must overload the meaning of various built in Octave
commands to access these fields. We have already mentioned we must overload the
meaning of the dot "." and we generally also overload the meaning the evaluation
parenthesis so that the line p() is interpreted in a special way for our class object. If
there is one argument to our object, nargin == 1, we then use that argument to
fill in the field .cc. In this code, we also test to see that the argument we use is correct
for our object and leave an error if it is not. We also check to see that the number of
arguments does not exceed one. In this example, we assume we want our argument,
if present, to be either another CoolObject or a real valued vector. Note without
redefining the meaning of the dot . and the parentheses () for a CoolObject p,
the lines p.c and p(x) would not be allowed. To make these work we have to
overload the subsref function Octave uses to parse these statements. Once this is
done, we can then use the lines p.c and p(x) as we wish. To create a CoolObject
class, we set up a directory called @CoolObject. Note the name of the directory
and the name of class are almost the same. The directory however must start with an
@ symbol. Inside this directory, you find the code needed to build our objects.

Listing 18.2: Inside the CoolObject directory


[petersj@inkapod]$ ls @CoolObject
CoolObject.m  subsref.m

The code that builds a CoolObject object is called the constructor and is in the file
CoolObject.m. The name of this file, CoolObject.m without the extension .m
must match the name of the class subfolder @CoolObject without the @. Finally,
the overloaded subsref function is in subsref.m. Our code listed here is not
functional and instead is an outline of what we have to do.

Listing 18.3: Outline of the subsref function


function out = subsref(p,s)
%
%
if (isempty(s))
  error('CoolObject: missing index');
end
switch s.type
  case '()'
    x = s.subs{1};
    % call code to determine output of p()
    out = 2;
  case '.'
    fld = s.subs;
    if (strcmp(fld,'c'))
      % access the field element c
      out = p.c;
    else
      error('invalid property for CoolObject object');
    end
  otherwise
    error('invalid subscript type for CoolObject');
end
end

If we add the line s at the top of this function, we would see what the input s is for
the various cases. Using the dot, . gives us the following type and subs elements.

Listing 18.4: The subs elements of p.c


b = p.c
s =
  type = '.'
  subs = 'c'

ans = whatever the action on c was

Hence the input s has type . and subs c. Then in the next one

Listing 18.5: The p() elements


z = p(x)
s =
  type = '()'
  subs = {[ whatever x was ]}

ans = whatever () did to x

For example, having added s at the top of subsref, we could run this example:

Listing 18.6: CoolObject Example


a = ’ jim is cool ’ ;
b = [1;2;3];
C = CoolObject ( a ) ;
C. c
5 s =
t y p e : ’. ’
s u b s : ’c ’
ans =
jim i s c o o l
10 D = CoolObject (b) ;
D. c
s =
t y p e : ’. ’
s u b s : ’c ’
15 ans =
1 2 3

This is probably not very clear at this point, so let’s build a graph class in MatLab
(or Octave!). A graph model or graph object will consist of edge and vertices objects
glued together in the right way, so we have to build three classes to make this happen.
We need to do this because in order to build neural models we need a way to imple-
ment directed graphs where the nodes are neurons and the edges are the synaptic
connections between neurons.
Make a directory GraphsGlobal and inside it make the usual class directories
@edges, @vertices and @graphs. We call this directory GraphsGlobal
because here all of our nodes and edges will be globally labeled. This means as
we glue subgraphs together, their original node and edge numbering will be lost as
the new graph is built. Now we will eventually want to keep track of the subgraph
information which we will do using an addressing scheme. However, that is another
story. For now, we will stick to global numbering. In the directory, @vertices, we
set up a class to handle vertices.

18.1.1 Vertices One

We now write the constructor code and the code for the overloaded indexing. The
constructor is as follows:

Listing 18.7: Vertices class


function V = vertices(a)
%
% create nodes from the vector a
%
s = struct();
n = length(a);
for i=1:n
  if isvector(a)
    s.node = a(i);
  end
  if iscell(a)
    s.node = a{i};
  end
  V.v(i) = s;
end
V = class(V,'vertices');
end

We allow the vertices object to be constructed from a node list which is a vec-
tor (isvector(a)) or cell data (iscell(a)). The overloaded indexing is in
subsref.m.

Listing 18.8: The vertices subsref overloading


function out = subsref(V,s)
%
% give access to private elements
% of p
%
switch s.type
  case "."
    fld = s.subs;
    if (strcmp(fld,"v"))
      n = length(V.v);
      out = [V.v(1).node];
      for i=2:n
        out = [out V.v(i).node];
      end
    else
      error("invalid property for vertices object");
    end
  otherwise
    error("@vertices/subsref: invalid subscript type for vertices");
end
end

We can then have an session like these. First, use a column vector to construct the
vertices object.

Listing 18.9: Simple vertices session: column vector input


b = [1;2;3;4;5];
N2 = vertices(b);
N2.v
ans =

   1   2   3   4   5

We can also use a row vector or cell data.

Listing 18.10: Simple vertices session: row vector or cell input


b = [1,2,3,4,5];
N2 = vertices(b);
N2.v
ans =
   1   2   3   4   5
b = {1,2,3,4,5};
N2 = vertices(b);
N2.v
ans =
   1   2   3   4   5

18.1.2 Edges One

In the subdirectory @edges, we then write the code both the constructor and for
overloading the subsref command so we can access field elements of an edge
object. Here is the edge constructor code. It takes a vector of ordered pairs and sets
up a collection of edge structures with fields in and out.

Listing 18.11: Edges


function E = edges(a)
%
% create edges from the vector of pairs a
% of the form [in;out]
%
s = struct();
n = length(a);
for i=1:n
  s.in  = a{i}(1);
  s.out = a{i}(2);
  E.e(i) = s;
end
E = class(E,'edges');
end

and the overloaded indexing operators are in subsref.m.



Listing 18.12: The edge subsref overloading


function out = subsref(E,s)
%
% give access to private elements
% of edges
%
switch s.type
  case "."
    fld = s.subs;
    if (strcmp(fld,"e"))
      n = length(E.e);
      out = [E.e(1).in; E.e(1).out];
      for i=2:n
        out = [out [E.e(i).in; E.e(i).out]];
      end
    else
      error("invalid property for edges object");
    end
  otherwise
    error("@edges/subsref: invalid subscript type for edges");
end
end

Then in use, we would have a session like this.

Listing 18.13: Simple edges session: column vectors


a = {[1;2];[3;5];[3;7]};
E2 = edges(a);
E2.e
ans =

   1   3   3
   2   5   7

We can also use row vectors in the constructor. Hence, this is fine also.

Listing 18.14: Simple edges session: row vectors


a = {[1,2];[3,5];[3,7]};
E2 = edges(a);
E2.e
ans =

   1   3   3
   2   5   7

Now that we can construct a vertices and edges object, we can build a graph class.

18.1.3 A First Graph Class

Next, in the directory @graphs, we write the constructor and indexing code.

Listing 18.15: The graphs class


function g = graphs(V, E)
%
% constructor for graph g
%
% V = vertices object
% E = edges object
%
  g.v = V;
  g.e = E;
  g = class(g, 'graphs');
end

Here is the indexing code.

Listing 18.16: The graph subsref overloading


function out = subsref(g, s)
  switch s.type
    case '.'
      fld = s.subs;
      if (strcmp(fld, 'v'))
        Nodes = g.v;
        out = Nodes.v;
      elseif (strcmp(fld, 'e'))
        Edges = g.e;
        out = Edges.e;
      else
        error('invalid property for graphs object');
      end
    otherwise
      error('@graphs/subsref: invalid subscript type for graphs');
  end
end

Notice the pair of statements we use, since access via g.v.v is not possible. We do this
pair of steps:

Listing 18.17: Access via g.v.v is impossible


Nodes = g . v ;
o u t = Nodes . v ;

This is necessary because the line out = g.v.v does not parse correctly. So we
do it in two steps. The same comment holds for the way we handle the edges. We
then build a graph in this session.

Listing 18.18: Simple graph session


G2 = graphs(N2, E2);
G2.v
ans =

   1   2   3   4   5
G2.e
ans =

   1   3   3
   2   5   7

Next, we add some methods, such as computing the incidence matrix.

18.2 Adding Class Methods First Pass

We will now add methods to each of our three classes. So far, we haven’t found a
way to force Octave to update its knowledge of the new code as we write it. Note,
we have to save our work in sessions and manually restart and reload code each time
we change the class codes. This is not a problem in MatLab, but you should be aware
of this if you use Octave.

18.2.1 Adding Edge Methods

We add code to add an edge to an existing edges object. This is in the file
@edges/add.m. Here is the code, which needs both the from and to nodes of the new
edge as arguments.

Listing 18.19: Adding an edge to an existing edge


function W = add(E, u, v)
%
% edge is added to edge list
%
  n = length(E.e);
  out = { [E.e(1).in; E.e(1).out] };
  for i = 2:n
    temp = { [E.e(i).in; E.e(i).out] };
    out = [out, temp];
  end
  temp = { [u; v] };
  out = [out, temp];
  W = edges(out);
end

The arguments (u,v) are the in node and out node values of the new edge. To add a
method to a class, we place the function in the class directory and make its first
argument an object of that class; Octave then dispatches add to the @edges version
when it is called with an edges object. We take the existing E.e data and construct a
new data container by concatenation. When we finish the loop, we have copied all the
old data over. We then concatenate the new edge data.

Then, we have the data object we can use to create a new edge object. We do that in
the line W = edges(out); and return W. The reason we use this design is that we
are not allowed to concatenate the incoming new data to the end of the old edge list.
Hence, we must do a complete rebuild. In a language that supports object oriented
programming ideas better, this would not have been necessary.
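As a quick check of the new method, a session along these lines should work once the class directories are on the Octave path (the edge values here are just illustrative):

a = {[1;2],[3;5]};
E2 = edges(a);
% append the edge 3 -> 7; a brand new edges object is returned
E3 = add(E2, 3, 7);
E3.e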

18.2.2 Adding Vertices Methods

We add code to add a node to an existing vertices object. This is in the file
@vertices/add.m. Here is the code:

Listing 18.20: Adding a node to existing vertices object


function W = add(V, j)
%
% node is added to vertex list
%
  n = length(V.v);
  out = V.v(1).node;
  for i = 2:n
    out = [out V.v(i).node];
  end
  out = [out j];
  W = vertices(out);
end

We take the existing V.v data and again construct a new data container by concate-
nation. When we finish the loop, we have copied all the old data over. We then con-
catenate the new vertex node to the data. At this point, we have the data object we can
use to create a new vertex object. We do that in the line W = vertices(out);
and return W. The same comments we made in the edge addition apply here. The
language MatLab or Octave forces us to do the global rebuild.
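A similar quick check for the vertices method (node values again illustrative):

N = vertices([1,2,3]);
% append node 4; a brand new vertices object is returned
N2 = add(N, 4);
N2.v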

18.2.3 Adding Graph Methods

We now write code to add an edge, in the file addedge.m, and to add a node, in the
file addnode.m. First, look at the code for adding an edge. This builds on the add code
in the edges class.

Listing 18.21: Adding an edge in a global graph


function W = addedge(g, i, j)
%
% adjoin an edge to an existing graph
%
  E = add(g.e, i, j);
  W = graphs(g.v, E);
end

We now add a node using the add node method for the vertices class.

Listing 18.22: Adding a node in a global graph


function W = addnode(g, j)
%
% adjoin a node to an existing graph
%
  V = add(g.v, j);
  W = graphs(V, g.e);
end

Once this code is built, we can add methods for calculating the incidence matrix for
the graph and its corresponding Laplacian. To find the incidence matrix, we use a
simple loop.

Listing 18.23: Simple incidence matrix for global graph


function K = incidence(g)
%
% g is a graph having
%   vertices g.v
%   edges    g.e
%
  % get the edge object from g
  E = g.e;
  % get the edges from the edge object
  e2 = E.e;
  % get the vertices object from g
  V = g.v;
  % get the vertices from the vertices object
  v2 = V.v;
  % find out how many vertices and edges there are
  [row, sizeV] = size(v2);
  [row, sizeE] = size(e2);
  K = zeros(sizeV, sizeE);
  %
  % setup incidence matrix
  %
  for i = 1:sizeV
    for j = 1:sizeE
      if e2(1,j) == i
        K(i,j) = 1;
      elseif e2(2,j) == i
        K(i,j) = -1;
      end
    end
  end

end
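As a small concrete example (a sketch; this three node graph is purely for illustration), each column of the incidence matrix gets a +1 at the edge's source node and a -1 at its target node:

v = vertices([1;2;3]);
e = edges({[1;2],[2;3],[1;3]});
g = graphs(v,e);
K = incidence(g)
% expected result:
% K =
%    1   0   1
%   -1   1   0
%    0  -1  -1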

The Laplacian of the graph is then easy to find (recall we discussed the Laplacian
for a graph in Chap. 15!).

Listing 18.24: Laplacian for global graph


function L = laplacian(g)
%
% g is a graph having
%   vertices g.v
%   edges    g.e
%
  K = incidence(g);
  L = K*K';
end
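Continuing the small three node example above, the Laplacian is symmetric, carries the node degrees on its diagonal and each of its rows sums to zero, since every edge contributes one +1 and one -1 to its column of K:

L = laplacian(g)
% expected result:
% L =
%    2  -1  -1
%   -1   2  -1
%   -1  -1   2
sum(L,2)'   % each row sums to zero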

18.2.4 Using the Methods

Here is a session where we build the OCOS and FFP networks and find their Lapla-
cians with eigenvalue structure. We add some annotation here and there. First, we
build an OCOS graph directly. This OCOS circuit includes the thalamus connection
which we will later remove as the thalamus will be implemented separately. We
define the node and edge vectors using integers for each node. We will label the
nodes in ascending order now so the resulting graph is feedforward.

Listing 18.25: Define the OCOS nodes and edges data


1 % d e f i n e t h e OCOS n o d e s
V = [1;2;3;4;5;6;7;8];
% d e f i n e t h e OCOS e d g e s
E = {[1;2] ,[2;3] ,[2;4] ,[2;5] ,[3;6] ,[4;7] ,[5;8] ,[2;7] ,[1;7]};
% c o n s t r u c t t h e OCOS e d g e o b j e c t

We then construct edge and vertices objects.

Listing 18.26: Construct the OCOS edges and vertices object


e = e d g e s (E) ;
% c o n s t r u c t t h e OCOS v e r t i c e s object
v = v e r t i c e s (V) ;

Next, we construct the graph object and check to see if our overloaded subsref.m
code is working correctly.

Listing 18.27: Construct the OCOS graph


% c o n s t r u c t t h e OCOS g r a p h
OCOS=g r a p h s ( v , e ) ;
% v e r i f y t h e OCOS e d g e l i s t
OCOS . v
5 ans =
1 2 3 4 5 6 7 8
% v e r i f y t h e OCOS e d g e s l i s t
OCOS . e
ans =
10 1 2 2 2 3 4 5 2 1
2 3 4 5 6 7 8 7 7

We then explicitly add nodes and edges to build an FFP graph. Note for a small graph
this is not very expensive, but the way nodes and edges are added is actually very
inefficient as we have to rebuild the node and edge lists from scratch with each add.

Listing 18.28: Create FFP by adding nodes and edges to OCOS


% add t h e FFP n o d e s 9 , 10 and 11
FFP = addnode (OCOS, 9 ) ;
FFP = addnode (FFP , 1 0 ) ;
4 FFP = addnode (FFP , 1 1 ) ;
% add t h e FFP e d g e s
FFP = addedge (FFP , 1 1 , 2 ) ;
FFP = addedge (FFP, 1 0 , 1 1 ) ;
FFP = addedge (FFP , 9 , 1 0 ) ;

Now let’s verify we added the nodes and edges.

Listing 18.29: Verify the added nodes and edges


% v e r i f y t h e FFP node list
2 FFP . v
ans =
1 2 3 4 5 6 7 8 9 10 11
% v e r i f y t h e FFP e d g e list
FFP . e
7 ans =
1 2 2 2 3 4 5 2 1 11 10 9
2 3 4 5 6 7 8 7 7 2 11 10

Next, let’s find incidence matrices.

Listing 18.30: Get Incidence Matrix


% find the OCOS incidence matrix
K = incidence(OCOS)
K =
   1   0   0   0   0   0   0   0   1
  -1   1   1   1   0   0   0   1   0
   0  -1   0   0   1   0   0   0   0
   0   0  -1   0   0   1   0   0   0
   0   0   0  -1   0   0   1   0   0
   0   0   0   0  -1   0   0   0   0
   0   0   0   0   0  -1   0  -1  -1
   0   0   0   0   0   0  -1   0   0
% find the FFP incidence matrix
K2 = incidence(FFP)
K2 =
   1   0   0   0   0   0   0   0   1   0   0   0
  -1   1   1   1   0   0   0   1   0  -1   0   0
   0  -1   0   0   1   0   0   0   0   0   0   0
   0   0  -1   0   0   1   0   0   0   0   0   0
   0   0   0  -1   0   0   1   0   0   0   0   0
   0   0   0   0  -1   0   0   0   0   0   0   0
   0   0   0   0   0  -1   0  -1  -1   0   0   0
   0   0   0   0   0   0  -1   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   1
   0   0   0   0   0   0   0   0   0   0   1  -1
   0   0   0   0   0   0   0   0   0   1  -1   0

We can then find the Laplacian of the graphs.

Listing 18.31: Get Laplacian


% find the OCOS Laplacian
L = laplacian(OCOS)
L =
   2  -1   0   0   0   0  -1   0
  -1   5  -1  -1  -1   0  -1   0
   0  -1   2   0   0  -1   0   0
   0  -1   0   2   0   0  -1   0
   0  -1   0   0   2   0   0  -1
   0   0  -1   0   0   1   0   0
  -1  -1   0  -1   0   0   3   0
   0   0   0   0  -1   0   0   1
% find the FFP Laplacian
L2 = laplacian(FFP)
L2 =
   2  -1   0   0   0   0  -1   0   0   0   0
  -1   6  -1  -1  -1   0  -1   0   0   0  -1
   0  -1   2   0   0  -1   0   0   0   0   0
   0  -1   0   2   0   0  -1   0   0   0   0
   0  -1   0   0   2   0   0  -1   0   0   0
   0   0  -1   0   0   1   0   0   0   0   0
  -1  -1   0  -1   0   0   3   0   0   0   0
   0   0   0   0  -1   0   0   1   0   0   0
   0   0   0   0   0   0   0   0   1  -1   0
   0   0   0   0   0   0   0   0  -1   2  -1
   0  -1   0   0   0   0   0   0   0  -1   2

Once we have these matrices, we can use standard tools within Octave to find their
corresponding eigenvalues and eigenvectors which we might want to do now and
then.

Listing 18.32: Get OCOS and FFP Laplacian eigenvalue and eigenvectors
% g e t t h e e i g e n v a l u e and e i g e n v e c t o r s for
% t h e OCOS L a p l a c i a n
[ e i g v e c o c o s , e i g v a l o c o s ]= e i g ( L ) ;
% e x t r a c t t h e OCOS e i g e n v a l u e s
5 vals = diag ( eigvalocos )
vals =
−0.0000
0.3820
0.5607
10 2.0000
2.3389
2.6180
4.0000
6.1004
15 % g e t t h e e i g e n v a l u e and e i g e n v e c t o r s for
% t h e FFP L a p l a c i a n
[ e i g v e l a p , e i g v a l l a p ]= e i g ( L2 ) ;
% e x t r a c t t h e FFP e i g e n v a l u e s
vals2 = diag ( e i g v a l l a p )
20 vals2 =
−0.0000
0.2338
0.3820
0.5786
25 1.4984
2.0000
2.3897
2.6180
3.1986
30 4.0000
7.1010

18.2.5 Adding a Graph to an Existing Graph

If we want to add a second graph to an existing graph, we need to write an addgraph
method for our graph class. Here is one way to do it. It is in the file addgraph.m in
the @graphs directory. Our design for this code is to specify the connecting edges
simply in the form of cell data

Listing 18.33: Links data


{ [from; to], ... };

where we have to remember that the nodes from and to need to be given using the
node numbering from the new larger graph we just constructed. That is hard to do, so
later we will use a vector address strategy for building graphs from subgraphs, but
that is for a later time. Here is the annotated code.

Listing 18.34: Adding a graph to an existing graph


f u n c t i o n W = addgraph ( g , H, l i n k s )
%
% a d j o i n a graph H using a l i s t of edges
% t o c o n n e c t t h e two g r a p h s w h i c h i s g i v e n in a cell data
5 % structure , links
%
% g e t edge o b j e c t of H
eh = H. e ;
% g e t t h e e d g e s o f eh
10 e h e d g e s = eh . e ;
% get v e r t i c e s o b j e c t of H
nh = H. v ;
% g e t t h e n o d e s o f nh
nhnodes = nh . v ;
15 % g e t edge o b j e c t of g
eg = g . e ;
% g e t t h e edges of eg
e g e d g e s = eg . e ;
% get v e r t i c e s o b j e c t of g
20 ng = g . v ;
% g e t t h e n o d e s o f ng
n g n o d e s = ng . v ;

% copy nodes of g t o temporary a


25 a = ngnodes ;

% g e t s i z e s o f a l l n o d e s and e d g e s
[ n , n o d e s i z e g ] = s i z e ( ng n od e s ) ;
[ n , n o d e s i z e h ] = s i z e ( nhnodes ) ;
30 [ n , edgesizeg ] = s i z e ( egedges ) ;
[ n , edgesizeh ] = s i z e ( ehedges ) ;

% c r e a t e a copy of t h e nodes of H
% and add t h e number o f n o d e s o f g
35 % to each e ntry
for i = 1: nodesizeh
b ( i ) = nhnodes ( i ) + n o d e s i z e g ;
end

40 % c r e a t e t h e new a d j o i n e d node l i s t
v = [a, b];
% c r e a t e t h e new v e r t i c e s o b j e c t
V = vertices (v) ;

45 % copy t h e e d g e s of g i n t o c
% t h i s i s d a t a i n v e c t o r form
c = egedges ;
% add t h e number o f n o d e s o f g
% t o e a c h i n and o u t v a l u e i n t h e
50 % edges of H
d = ehedges + nodesizeg ;
% now c r e a t e t h e { [ i ; j ] , [ u ; v ] }
% t y p e d a t a s t r u c t u r e we n e e d f o r
% the edges constructor
55 % i n i t i a l i z e the data e
e = {egedges ( : , 1 ) };
% add more e n t r i e s u n t i l t h e e d g e s
% o f g a r e u s e d up
for i = 2: edgesizeg
60 e{ i } = c ( : , i ) ;
end
% now add t h e e d g e s o f H
% making s u r e we s e t t h e c o u n t e r
% t o s t a r t a f t e r t h e number o f e d g e s
65 % in g
f o r j =1: e d g e s i z e h
e { j+e d g e s i z e g } = d ( : , j ) ;
end

70 % now e i s i n t h e p r o p e r f o r m a t
% and we can c o n s t r u c t t h e new e d g e object
E = edges ( e ) ;
W = g r a p h s (V, E) ;
s i z e l i n k s = length ( l i n k s ) ;
75 f o r i =1: s i z e l i n k s
from = l i n k s { i } ( 1 , 1 ) ;
to = links { i }(2 ,1) ;
W = addedge (W, from , t o ) ;
end

It is now straightforward to build more complicated graphs. Consider the following
session. Note we still have to manually set up the links between the two graphs using
the new global node numbering; that is irritating, but we will work on that later when
we do address graphs.

Listing 18.35: Building a graph from other graphs


V = [1;2;3;4;5;6;7;8];
E = {[1;2],[2;3],[2;4],[2;5],[3;6],[4;7],[5;8],[2;7],[1;7]};
e = edges(E);
v = vertices(V);
OCOS = graphs(v,e);
OCOS2 = graphs(v,e);
% adjoin the second OCOS graph by adding an edge from
% node 7 of the original OCOS graph to node 1 of the new OCOS2 graph
W = addgraph(OCOS, OCOS2, {[7;9]});
W.e
ans =
 Columns 1 through 16
    1    2    2    2    3    4    5    2    1    9   10   10   10   11   12   13
    2    3    4    5    6    7    8    7    7   10   11   12   13   14   15   16
 Columns 17 through 19
   10    9    7
   15   15    9
W.v
ans =
 Columns 1 through 14:
    1    2    3    4    5    6    7    8    9   10   11   12   13   14
 Columns 15 and 16:
   15   16

Notice how awkward this code is. The design here removes all the old node and
edge information from the incoming graph and renumbers and relabels everything
to fit into one new graph. It also assumes there is an edge used to glue the graphs
together. If we want to build a small brain model, we will want to assemble many
modules. Each module will have its own organization and we will want to remember
that. However, to use the Laplacian of the small brain model, we will need to have
an incidence matrix written in terms of global node numbers. Hence, we need a way
to organize nodes in terms of their module structure. This requires an address rather
than a node number. We will explore this in the next section where we build a new
graph class based on these new requirements.

18.2.6 Drawing Graphs

We need a way to visualize our graphs. We will use the visualization software package
Graphviz (North and Ganser 2010), which includes the tool dot for drawing directed
graphs. The documentation for dot tells us how to write a file which the dot
command can use to generate the drawing. Consider the file incToDot.m which
takes the incidence matrix of a graph, inc and uses it to write the needed file. This

file can have any name we want but it uses the extension .dot. Assume the file
name we generate is MyFile.dot. Once the file is generated, outside of MatLab
we then run the dot command to generate the .pdf file MyFile.pdf as follows:

Listing 18.36: Using dot to create a graph visualization


dot -Tpdf -oMyFile.pdf MyFile.dot

We choose the pdf format as it is a vector graphic and hence can be scaled up without
loss of resolution as far as we want. This is very important in very large graphs. Here
is the code.

Listing 18.37: The incidence to dot code: incToDot


function incToDot(inc, width, height, aspect, filename)

  [N, E] = size(inc);
  [fid, msg] = fopen(filename, 'w');
  fprintf(fid, 'digraph G {\n');
  fprintf(fid, 'size="%f,%f";\n', width, height);
  fprintf(fid, 'rankdir=TB;\n');
  fprintf(fid, 'ratio=%f;\n', aspect);
  for i = 1:N
    fprintf(fid, '%d [shape="rectangle" fontsize=20 fontname="Times new Roman" label="%d"];\n', i, i);
  end
  fprintf(fid, '\n');
  for j = 1:E
    a = 0;
    b = 0;
    for i = 1:N
      if inc(i,j) == 1
        a = i;
      elseif inc(i,j) == -1
        b = i;
      end
    end
    if (a ~= 0) && (b ~= 0)
      fprintf(fid, '%d->%d;\n', a, b);
    end
  end
  fprintf(fid, '\n}');
  fclose(fid);
end

Let’s use this on the OCOS circuit. First, we generate the OCOS.dot file.

Listing 18.38: Generate the OCOS dot file


incToDot (KOCOS, 6 , 6 , 1 . 0 , ’ OCOS . dot ’ ) ;

The dot file has the following look. Note the graph is organized simply. Each node is
identified with an integer and then in brackets there is a string which specifies details
about its shape and its label. Here we use the same label as the identifier. The edges

Fig. 18.4 OCOS graph

are as simple as possible, specified with strings like 1->2 and so forth. You’ll need to
look at the dot documentation to see more details about how to do this.

Listing 18.39: A generated dot file


digraph G {
s i z e =”6.000000 ,6.000000”;
r a n k d i r=TB ;
r a t i o =1.000000;
5 1 [ s h a p e=” r e c t a n g l e ” f o n t s i z e =20 fontname=”Times new Roman” l a b e l
=”1”];
2 [ s h a p e=” r e c t a n g l e ” f o n t s i z e =20 fontname=”Times new Roman” l a b e l
=”2”];
3 [ s h a p e=” r e c t a n g l e ” f o n t s i z e =20 fontname=”Times new Roman” l a b e l
=”3”];
4 [ s h a p e=” r e c t a n g l e ” f o n t s i z e =20 fontname=”Times new Roman” l a b e l
=”4”];
5 [ s h a p e=” r e c t a n g l e ” f o n t s i z e =20 fontname=”Times new Roman” l a b e l
=”5”];
10 6 [ s h a p e=” r e c t a n g l e ” f o n t s i z e =20 fontname=”Times new Roman” l a b e l
=”6”];
7 [ s h a p e=” r e c t a n g l e ” f o n t s i z e =20 fontname=”Times new Roman” l a b e l
=”7”];
8 [ s h a p e=” r e c t a n g l e ” f o n t s i z e =20 fontname=”Times new Roman” l a b e l
=”8”];

1−>2;
15 2−>3;
2−>4;
2−>5;
3−>6;
4−>7;
20 5−>8;
2−>7;
1−>7;
}

Then, in a separate terminal window, we generate the OCOS.pdf file which is shown
in Fig. 18.4.

Listing 18.40: Generate OCOS graphic as pdf file


dot -Tpdf -oOCOS.pdf OCOS.dot

18.2.7 Evaluation and Update Strategies

At any node i in our graph, the nodes which interact with it via an edge contact are
in the backward set for node i, B(i). Hence, the input to node i is the sum

\[
\sum_{j \in B(i)} E_{j \to i} \, Y(j)
\]

where E_{j→i} is the value we assign to the edge between j and i in our graph and Y(j) is
the output we assign to node j. We also know the nodes that node i sends its output
signal to are in its forward set F(i). Both of these sets are readily found by looking
at the incidence matrix for the graph. We can do this in the code BFsets0 which is a
new graph method placed in the @graphs directory. In it, we find the backward and
forward sets for each neuron in terms of the global node numbering. In a given row of
the incidence matrix, the positive 1's tell us the edges corresponding to the forward
links and the negative 1's give us the backward edges. So we look at each row of the
incidence matrix and calculate the needed nodes for the backward and forward sets.

Listing 18.41: Finding backward and forward sets: BFsets0


function [BackGlobal, ForwardGlobal, BackEdgeGlobal] = BFsets0(g, Kg)
%
% g   is the graph
% Kg  is the incidence matrix of the graph
% g.e is the edge object of the graph
%

  % get edges
  Eg = g.e;
  % get edge array
  E = Eg.e;

  [Kgrows, Kgcols] = size(Kg);

  BackGlobal = {};
  ForwardGlobal = {};
  BackEdgeGlobal = {};
  for i = 1:Kgrows
    BackGlobal{i} = [];
    ForwardGlobal{i} = [];
    BackEdgeGlobal{i} = [];
    for j = 1:Kgcols
      d = E(:,j);
      u = d(1);
      v = d(2);
      if Kg(i,j) == 1
        ForwardGlobal{i} = [ForwardGlobal{i}, v];
      elseif Kg(i,j) == -1
        BackGlobal{i} = [BackGlobal{i}, u];
        BackEdgeGlobal{i} = [BackEdgeGlobal{i}, j];
      end
    end
  end

end
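For instance, reusing the OCOS graph built in the session above (a sketch; the variable names are ours), we can read off the sets for individual nodes. The values follow directly from the OCOS edge list:

KOCOS = incidence(OCOS);
[B, F, BE] = BFsets0(OCOS, KOCOS);
B{7}    % nodes feeding node 7:   4   2   1
F{2}    % nodes node 2 feeds:     3   4   5   7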

The simplest evaluation strategy then is to let each node have an output value deter-
mined by a simple sigmoid function such as given in the code sigmoid.m.

Listing 18.42: The sigmoid function


function y = sigmoid(x)
%
% x is the input
%
  y = 0.5*(1.0 + tanh(x));
end

We use this sigmoid to do a simple graph evaluation. This is done in the code below.
We simply evaluate the sigmoid functions at each node using the current synaptic
interaction values.

Listing 18.43: The evaluation


function NodeVals = evaluation(g, Y, W, B, BE)
%
% g  is the graph
% Y  is the node vector
% W  is the edge weight vector
% B  is the global backward node set information
% BE is the global backward edge set information
%
  % get size
  sizeV = length(Y);
  for i = 1:sizeV
    % get backward node information for neuron i
    BF = B{i};
    % get backward edge information for neuron i
    BEF = BE{i};
    lenB = length(BF);
    lenBEF = length(BEF);
    sum = 0.0;
    for j = 1:lenBEF
      link = BEF(j);
      pre = BF(j);
      sum = sum + W(link)*Y(pre);
    end
    NodeVals(i) = sigmoid(sum);
  end
end

A simple edge update function is then given in the file below. This uses what is
called a Hebbian update to change the scalar weight associated with each edge. If
the value of the post neuron i is high and the value of the edge E_{j→i} is also high,
the value of this edge is increased by a multiplier such as 1.05 or 1.1. The code to
implement this is below.

Listing 18.44: The Hebbian code: HebbianUpdateSynapticValue


function Wts = HebbianUpdateSynapticValue(Y, W, B, BE)
%
% Y  is the node value vector
% W  is the edge weight vector
% B  is the global backward node set information
% BE is the global backward edge set information
%

  % get size
  sizeV = length(Y);
  sizeE = length(W);

  scale = 1.1;
  for i = 1:sizeV
    % get backward node information for neuron i
    BF = B{i};
    % get backward edge information for neuron i
    BEF = BE{i};
    lenB = length(BF);
    lenBEF = length(BEF);
    for j = 1:lenBEF
      link = BEF(j);
      pre = BF(j);
      post = i;
      hebb = Y(post)*W(link);
      weight = W(link);
      if abs(hebb) > .36
        weight = scale*W(link);
      end
      Wts(link) = weight;
    end
  end
end
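For instance, with illustrative numbers: if the post neuron value is 0.8 and the edge weight is 0.5, the product 0.40 exceeds the 0.36 threshold used above and the weight grows; an edge whose product stays below the threshold is left alone:

Ypost = 0.8;  Wlink = 0.5;
hebb = Ypost*Wlink;      % 0.40 > 0.36, so this edge is strengthened
Wnew = 1.1*Wlink         % the weight becomes 0.55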

A simple example is shown below for the evaluation step for a small OCOS model.
First, we set up a node value vector Y the size of the node list and a weight value vector
W the size of the edge list of the brain model. We fill the weight vector randomly with
numbers from −1 to 1. Then we compute the needed backward set information and
use it to perform an evaluation step. Then we do a synaptic weight Hebbian update.
At this point, we are just showing the basic ideas of this sort of computation. We
still have to restrict certain edges to have only negative values as they correspond to
inhibitory synaptic contacts. Also, we have not yet implemented the cable equation
update step afforded us with the graph’s Laplacian. That is to come. Here is an
example computation.

Listing 18.45: An OCOS Hebbian Update


vOCOS = {1,2,3,4,5,6,7,8};
eOCOS = {[1;2],[2;3],[2;4],[2;5],[3;6],[4;7],[5;8],[2;7],[1;7]};
VOCOS = vertices(vOCOS);
EOCOS = edges(eOCOS);
OCOS = graphs(VOCOS, EOCOS);
KOCOS = incidence(OCOS);
[m, n] = size(KOCOS);
Y = zeros(1,m);
W = -1+2*rand(1,n);
[B, F, BE] = BFsets0(OCOS, KOCOS);
Y = evaluation(OCOS, Y, W, B, BE);
W = HebbianUpdateSynapticValue(Y, W, B, BE);

We initialize all the node values to zero; hence, all of the input values to the nodes
will be zero as well and so the first evaluation will return a constant 0.5.
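This is just the sigmoid evaluated at zero input:

sigmoid(0)    % 0.5*(1 + tanh(0)) = 0.5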

Listing 18.46: First evaluation


Y
Y =
   0.50000   0.50000   0.50000   0.50000   0.50000   0.50000   0.50000   0.50000
W
W =
   0.6294   0.8927  -0.8206   0.9094   0.2647  -0.8854  -0.4430   0.0938   1.0065

If we now evaluate again, we will see the change in Y .

Listing 18.47: Second evaluation


Y = evaluation(OCOS, Y, W, B, BE);
Y
Y =
   0.5000   0.6524   0.7095   0.3056   0.7129   0.5658   0.5535   0.3910

We can also do a simulation loop. Consider this code

Listing 18.48: A simple simulation: DoSim


function [Y, W] = DoSim(NumIters, g, Kg, B, BE, Y, W)
%
  for t = 1:NumIters
    Y = evaluation(g, Y, W, B, BE);
    W = HebbianUpdateSynapticValue(Y, W, B, BE);
  end
end

Let’s evaluate for say 20 steps.



Listing 18.49: A 20 step evaluation


[Y, W] = DoSim(20, OCOS, KOCOS, B, BE, Y, W);

After 20 iterations, we find

Listing 18.50: Second evaluation


Y
Y =
   0.5000   0.9792   1.0000   0.1689   1.0000   0.6293   0.9888   0.2919
W
W =
   4.2346   6.0059  -0.8206   6.1182   0.2647  -5.9566  -0.4430   0.0938   6.7713

If the graph was a feed forward architecture, we could also implement a link update
strategy using the partial derivatives of an error function with respect to the mutable
parameters of the graph architecture. But that is another story!

18.3 Training

We are now ready to attempt to train our networks to match input to output data. We
begin with the code to train a feedforward architecture to do that. We have already
done this in Chap. 16 for the classical matrix-vector feedforward network architecture
using gradient descent. The chain feedforward network architecture is more flexible
and we have discussed it carefully in Chap. 17 and derived the new algorithms for
evaluation and gradient update. So now we will implement them. First, let’s redo the
evaluation code so we can handle node functions with gains and offsets. Also, we
want to be able to have different node functions for input and output nodes. We do
this in the new evaluation code below. First, though, we code the needed derivatives,
as we need a lot of σ′ terms. For this type of σ, we note

\[
\sigma(y, o, g) \;=\; \frac{1}{2}\Big( (H + L) + (H - L)\,\tanh\Big(\frac{y - o}{g}\Big) \Big)
\]

\[
\frac{\partial \sigma}{\partial y} \;=\; \frac{H - L}{2g}\, \mathrm{sech}^2\Big(\frac{y - o}{g}\Big)
\]

and so

\[
\frac{\partial \sigma}{\partial o} \;=\; -\frac{H - L}{2g}\, \mathrm{sech}^2\Big(\frac{y - o}{g}\Big) \;=\; -\frac{\partial \sigma}{\partial y}
\]

and

\[
\frac{\partial \sigma}{\partial g} \;=\; -\frac{(H - L)(y - o)}{2g^2}\, \mathrm{sech}^2\Big(\frac{y - o}{g}\Big) \;=\; -\frac{y - o}{g}\,\frac{\partial \sigma}{\partial y}.
\]

So in the following code, we will define the ∂σ/∂y functions.
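Before wiring these into the class code, a quick numerical check is reassuring. This is only a sketch; the anonymous functions below mirror the ones defined in Listing 18.51 and the test point is arbitrary:

SL = -1.2; SH = 1.2;
sigmaoutput = @(y,o,g) 0.5*((SH+SL) + (SH-SL)*tanh((y-o)/g));
sigmaoutputprime = @(y,o,g) ((SH-SL)/(2*g))*sech((y-o)/g).^2;
y = 0.37; o = 0.1; g = 0.9; h = 1.0e-6;
% centered difference approximation to the y partial derivative
fd = (sigmaoutput(y+h,o,g) - sigmaoutput(y-h,o,g))/(2*h);
abs(fd - sigmaoutputprime(y,o,g))   % should be tiny, on the order of 1e-10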

Listing 18.51: Initializing the Node Functions for the Chain FFN
nodefunction = {};
nodefunctionprime = {};
%
sigma = @(y,offset,gain) 0.5*(1 + tanh((y - offset)/gain));
sigmainput = @(y,offset,gain) (y - offset)/gain;
SL = -1.2;
SH = 1.2;
% output nodes range from SL to SH
sigmaoutput = @(y,offset,gain) 0.5*((SH+SL) + (SH-SL)*tanh((y - offset)/gain));
%
nodefunction{1} = sigma;
nodefunction{2} = sigmainput;
nodefunction{3} = sigmaoutput;
%
sigmaprime = @(y,offset,gain) (1/(2*gain))*sech((y - offset)/gain).^2;
sigmainputprime = @(y,offset,gain) 1/gain;
sigmaoutputprime = @(y,offset,gain) ((SH-SL)/(2*gain))*sech((y - offset)/gain).^2;
%
nodefunctionprime{1} = sigmaprime;
nodefunctionprime{2} = sigmainputprime;
nodefunctionprime{3} = sigmaoutputprime;

Note again nodefunction{1} is the main node processing function,
nodefunction{2} is the input node function and nodefunction{3} is the
output node function. You see this used in the evaluation code as follows.

Listing 18.52: Graph evaluation code: evaluation2.m


f u n c t i o n [ I n V a l s , OutVals ] = e v a l u a t i o n 2 ( n o d e f u n c t i o n , g , I , IN ,OUT,W, O, G
, B , BE)
%
% n o d e f u n c t i o n a r e t h e node f u n c t i o n s a s a c e l l
% g i s the graph
5 % I is the input vector
% IN i s t h e l i s t o f i n p u t n o d e s a s a c e l l
% OUT i s t h e l i s t o f o u t p u t n o d e s a s a c e l l
% Y i s node v e c t o r
% W is the link vector
10 % O is the o f f s e t vector
% G i s the gain vector
% B i s t h e g l o b a l b a c k w a r d node s e t i n f o r m a t i o n
% BE i s t h e g l o b a l b a c k w a r d e d g e s e t i n f o r m a t i o n
%
15 % O u t V a l s i s t h e new node o u t p u t v e c t o r
% I n V a l s i s t h e node i n p u t v e c t o r
%

% get sizes
20 s i z e I N = l e n g t h ( IN ) ;
sizeOUT = l e n g t h (OUT) ;
Nodes = g . v ;
[m, s i z e Y ] = s i z e ( Nodes . v ) ;
InVals = zeros (1 , sizeY ) ;
25
sigmainput = nodefunction {1};
sigmamain = n o d e f u n c t i o n { 2 } ;
sigmaoutput = nodefunction {3};
sigma = { } ;
30 % s e t a l l node f u n c t i o n s t o sigmamain
f o r i = 1: sizeY
sigma { i } = sigmamain ;
end
% r e s e t i n p u t node f u n c t i o n s t o s i g m a i n p u t
35 for j = 1: sizeIN
sigma {IN{ j }} = s i g m a i n p u t ;
end
% r e s e t o u t p u t node f u n c t i o n s t o s i g m a o u t p u t
f o r j = 1 : sizeOUT
40 sigma {OUT{ j }} = s i g m a o u t p u t ;
end
f o r i = 1: sizeY
% g e t b a c k w a r d node i n f o r m a t i o n f o r neuron i
BF = B{ i } ;
BEF = BE{ i } ;
lenB = l e n g t h (BF) ;
lenBEF = l e n g t h (BEF) ;
sum = 0 . 0 ;
50 % add i n p u t i f t h i s i s an i n p u t node
for j = 1: sizeIN
i f i == IN{ j }
sum = sum + I ( j ) ;
end
55 end
f o r j = 1 : lenBEF
l i n k = BEF( j ) ;
p r e = BF( j ) ;
sum = sum + W( l i n k ) ∗Y( p r e ) ;
60 end
I n V a l s ( i ) = sum ;
Y( i ) = s igma { i } ( sum , O( i ) ,G( i ) ) ;
end

65 OutVals = Y;
end

We use code similar to what we did in the matrix FFN work.

Listing 18.53: Testing the new evaluation code


nodefunction = {};
nodefunctionprime = {};
sigma = @( y , o f f s e t , g a i n ) 0 . 5 ∗ ( 1 + tanh ( ( y − o f f s e t ) / g a i n ) ) ;
4 n o d e f u n c t i o n {1} = si gma ;
s i g m a i n p u t = @( y , o f f s e t , g a i n ) ( y − o f f s e t ) / g a i n ;
n o d e f u n c t i o n {2} = s i g m a i n p u t ;
SL = − 1 . 2 ;
SH = 1 . 2 ;
9 % o u p u t n o d e s r a n g e from SL t o SH
s i g m a o u t p u t = @( y , o f f s e t , g a i n ) 0 . 5 ∗ ( (SH+SL ) + (SH−SL ) ∗ tanh ( ( y −
o f f s e t ) / gain ) ) ;
n o d e f u n c t i o n {3} = s i g m a o u t p u t ;

s i g m a p r i m e = @( y , o f f s e t , g a i n ) ( 1 / ( 2 ∗ g a i n ) ) ∗ s e c h ( ( y − o f f s e t ) / g a i n )
.ˆ2;
14 n o d e f u n c t i o n p r i m e {1} = s i g m a p r i m e ;
s i g m a i n p u t p r i m e = @( y , o f f s e t , g a i n ) 1/ g a i n ;
n o d e f u n c t i o n p r i m e {2} = s i g m a i n p u t p r i m e ;
s i g m a o u t p u t p r i m e = @( y , o f f s e t , g a i n ) ( (SH−SL ) / ( 2 ∗ g a i n ) ) ∗ s e c h ( ( y −
o f f s e t ) / gain ) . ˆ 2 ;
n o d e f u n c t i o n p r i m e {3} = s i g m a o u t p u t p r i m e ;
19 %
% c o n s t r u c t a 1−5−1 MFFN a s a c f f n
v = [1;2;3;4;5;6;7];
e = {[1;2] ,[1;3] ,[1;4] ,[1;5] ,[1;6] ,[2;7] ,[3;7] ,[4;7] ,[5;7] ,[6;7]};
E = edges ( e ) ;
24 V = vertices (v) ;
G = g r a p h s (V, E) ;
KG = i n c i d e n c e (G) ;
[ NodeSize , E d g e S i z e ] = s i z e (KG) ;
Y = z e r o s ( 1 , NodeSize ) ;
29 OL = − 1 . 4 ;
OH = 1 . 4 ;
GL = 0 . 8 ;
GH = 1 . 2 ;
O f f s e t = OL+(OH−OL) ∗ rand ( 1 , N o d e S i z e ) ;
34 Gain = GL+(GH−GL) ∗ rand ( 1 , N o d e S i z e ) ;
W = −1+2∗rand ( 1 , E d g e S i z e ) ;
[ B , F , BE ] = B F s e t s 0 (G,KG) ;
IN = { 1 } ;
OUT = { 7 } ;
39 I = 0.3;
[ y ,Y] = e v a l u a t i o n 2 ( n o d e f u n c t i o n , G, I , IN ,OUT,W, O f f s e t , Gain , B , BE) ;
Y =

0.060865 0.247482 0.223323 0.522128 0.863697 0.962500


−0.627949

Now to do gradient updates, we need to calculate the error of the chain ffn on the
input and training data. This is done as usual with the energy function.

Listing 18.54: The energy calculation: energy.m


f u n c t i o n [ E , yI , YI , OI , EI ] = e n e r g y ( n o d e f u n c t i o n , g , I , IN ,OUT, D, Y,W, O, G, B
, BE)
2 %
% g i s the graph
% Y i s node v e c t o r
% W is the link vector
% O is the o f f s e t vector
7 % G i s the gain vector
% B i s t h e g l o b a l b a c k w a r d node s e t i n f o r m a t i o n
% BE i s t h e g l o b a l b a c k w a r d e d g e s e t i n f o r m a t i o n
%
% I c o l l e c t i o n of input data
12 % I i s a matrix S rows , Number o f I n p u t s = c o l s
% IN i n p u t n o d e s
% D c o l l e c t i o n of t a r g e t data
% D i s a matrix S rows , Number o f o u t p u t s = c o l s
% OUT o u t p u t n o d e s
17 %
% YI i s Y v e c t o r f o r e a c h i n p u t
% OI i s O u t p u t v e c t o r f o r e a c h i n p u t
% EI i s e r r o r v e c t o r f o r e a c h i n p u t
%
22 S = length ( I ) ;
s i z e Y = l e n g t h (Y) ;
s i z e D = l e n g t h (D) ;
sizeOUT = l e n g t h (OUT) ;
%
27 YI = z e r o s ( S , s i z e Y ) ;
yI = z e r o s (S , sizeY ) ;
OI = z e r o s ( S , sizeOUT ) ;
EI = z e r o s ( S , sizeOUT ) ;

32 sum = 0 ;
f o r i =1:S
[ y I ( i , : ) , YI ( i , : ) ] = e v a l u a t i o n 2 ( n o d e f u n c t i o n , g , I ( i , : ) , IN ,OUT,W, O, G,
B , BE) ;
error = 0;
f o r j = 1 : sizeOUT
37 OI ( i , j ) = YI ( i ,OUT{ j } ) ;
EI ( i , j ) = YI ( i ,OUT{ j } ) − D( i , j ) ;
e r r o r = e r r o r + ( EI ( i , j ) ) ˆ 2 ;
end
sum = sum + e r r o r ;
42 end
E = . 5 ∗ sum ;
end

We can test our code with the following session.



Listing 18.55: Testing the Energy Code


1 X = l i n s p a c e (0 ,2∗ pi , 4 1 ) ;
Input = X’ ;
U = c o s (X) ;
Target = U’ ;
%s e t u p 1−5−1 CFFN a s b e f o r e
6 %s e t u p n o d e f u n c t i o n a s b e f o r e
...
[ E , yI , YI , OI , EI ] = e n e r g y ( n o d e f u n c t i o n , G, Input , IN ,OUT, Target , Y,W,
O f f s e t , Gain , B , BE) ;
% s e e wh a t t h e e n e r g y i s
E = 35.0738

Now we are ready to train. However, to do gradient updates, we need forward edge
information. So we need to update the BFsets0 code to return that information.
Here is the changed code which is now BFsets.
Listing 18.56: Returning forward edge information: Changed BFsets code
f u n c t i o n [ BackGlobal , ForwardGlobal , BackEdgeGlobal , ForwardEdgeGlobal ]
= B F s e t s ( g , Kg)
%
% g i s the graph
% Kg i s t h e i n c i d e n c e m a t r i x o f t h e g r a p h
5 % g . e i s the edge o b j e c t of the graph
%
% get edges
Eg = g . e ;
% get edge array
10 E = Eg . e ;

[ Kgrows , K g c o l s ] = s i z e (Kg) ;

BackGlobal = {};
15 ForwardGlobal = {};
BackEdgeGlobal = { } ;
ForwardEdgeGlobal = { } ;
f o r i = 1 : Kgrows
BackGlobal{ i } = [ ] ;
20 ForwardGlobal { i } = [ ] ;
BackEdgeGlobal { i } = [ ] ;
ForwardEdgeGlobal { i } = [ ] ;
f o r j = 1 : Kgcols
d = E(: , j ) ;
25 u = d(1) ;
v = d(2) ;
i f Kg( i , j ) == 1
ForwardGlobal { i } = [ ForwardGlobal { i } , v ] ;
ForwardEdgeGlobal { i } = [ ForwardEdgeGlobal { i } , j ] ;
30 e l s e i f Kg( i , j ) == −1
BackGlobal{ i } = [ BackGlobal{ i } , u ] ;
BackEdgeGlobal { i } = [ BackEdgeGlobal { i } , j ] ;
end
end
35 end
end

Now we are ready to look at the gradient update code. We think it is interesting to
be able to see how large the gradient norm is and to find at what component this
maximum occurs. So we wrote a utility function to do this. Given a vector, it takes
the absolute value of all the entries and finds the location of the maximum one. With
that, we can return this location in the variable LocMaxGrad. Here is the code for
the utility.

Listing 18.57: Find the location of the absolute maximum of a vector


function imax = getargmax(V, tol)
%
% V is a vector
%
  LengthV = length(V);
  % remove minus signs
  W = abs(V);
  M = max(W);
  imax = 1;
  for i = 1:LengthV
    if W(i) > M - tol
      imax = i;
      break;
    end
  end
end
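For example (values illustrative):

imax = getargmax([0.2, -0.9, 0.4], 0.001)
% imax is 2, the index of the entry with the largest absolute value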

Now here is the update code. Note we start the ξ loop at the largest node in the OUT
set as the nodes after that are irrelevant to the gradient calculations.

Listing 18.58: Gradient Update Code


f u n c t i o n [ WUpdate , OUpdate , GUpdate , NormGrad , LocMaxGrad ] =
GradientUpdate ( n o d e f u n c t i o n , n o d e f u n c t i o n p r i m e , g , Y,W, O, G, B , BE, F , FE ,
I , IN , D,OUT, r )
%
% nodefunction i s the c e l l of nodefunctions
% nodefunctionprime i s the c e l l of d e r i v a t i v e s
5 % g i s the graph
% Y i s t h e node v e c t o r
% W is the link vector
% O is the o f f s e t vector
% G i s the gain vector
10 % B i s t h e backward s e t i n f o r m a t i o n
% BE i s t h e b a c k w a r d e d g e i n f o r m a t i o n
% F i s the forward s e t information
% FE i s t h e g l o b a l f o r w a r d e d g e i n f o r m a t i o n
%
15 % I i s the l i s t of input vectors
% IN i s t h e l i s t o f i n p u t n o d e s
% D i s the l i s t of t a r g e t vectors
% OUT i s t h e l i s t o f o u t p u t n o d e s
%

20 % r i s t h e amount o f g r a d i e n t u p d a t e
%
% t u r n g a i n u p d a t e s on o r o f f : DOGAIN = 0
% t u r n s them o f f
DOGAIN = 0 ;
25
s i z e Y = l e n g t h (Y) ;
sizeW = l e n g t h (W) ;
s i z e I = length ( I ) ;
s i z e D = l e n g t h (D) ;
30 s i z e I N = l e n g t h ( IN ) ;
sizeOUT = l e n g t h (OUT) ;
xi = zeros ( s i z e I , sizeY ) ;
DEDW = zeros(1, sizeW);
DEDO = zeros(1, sizeY);
DEDG = zeros(1, sizeY);

sigmainputprime = nodefunctionprime {1};


sigmamainprime = n o d e f u n c t i o n p r i m e { 2 } ;
sigmaoutputprime = nodefunctionprime {3};
40 sigmaprime = {};
% s e t a l l node f u n c t i o n s t o sigmamain
f o r i = 1: sizeY
s i g m a p r i m e { i } = sigmamainprime ;
end
45 % r e s e t i n p u t node f u n c t i o n s t o s i g m a i n p u t
for j = 1: sizeIN
s i g m a p r i m e {IN{ j }} = s i g m a i n p u t p r i m e ;
end
% r e s e t o u t p u t node f u n c t i o n s t o s i g m a o u t p u t
50 f o r j = 1 : sizeOUT
s i g m a p r i m e {OUT{ j }} = s i g m a o u t p u t p r i m e ;
end

%f i n d maximum o u t p u t node
55 OMax = OUT{ 1 } ;
f o r i =2: sizeOUT
i f OUT{ i } > OMax
OMax = OUT{ i } ;
end
60 end

f o r alpha = 1: sizeI
for i = OMax : − 1 : 1
% get forward edge i n f o r m a t i o n f o r neuron i
65 FF = F{ i } ;
FEF = FE{ i } ;
lenFF = l e n g t h (FEF) ;
%f i n d o u t i f i i s a t a r g e t node
for j = 1 : sizeOUT
70 if OUT{ j } == i
k = j;
IsTargetNode = 1 ;
end
end
    if IsTargetNode == 1
      xi(alpha,i) = (Y(alpha,i) - D(alpha,k));
    else
      adder = 0.0;
      for j = 1:lenFF
        link = FEF(j);
        % FF(j) is a node which i goes forward to
        post = FF(j);
        adder = adder + xi(alpha,post)*W(link)*sigmaprime{post}(yI(alpha,post),O(post),G(post));
      end
      xi(alpha,i) = adder*YI(alpha,i);
    end % xi calculation
  end % loop on nodes
end % loop on data

90 f o r i = 1 : sizeY
% FE{ i } i s t h e g l o b a l node i n f o r m a t i o n
% where i t h e i t h w e i g h t
% FE{ i } = a s e t o f ( p r e , p o s t )
FF = F{ i } ;
95 FEF = FE{ i } ;
lenFF = l e n g t h ( FF ) ;
f o r j = 1 : lenFF
% get the weight index
l i n k = FEF( j ) ;
100 % W( l i n k ) = E { p r e −> p o s t } w h e r e
pre = i ;
p o s t = FF ( j ) ;
    adder = 0.0;
    for k = 1:sizeI
      adder = adder + xi(k,post)*sigmaprime{post}(yI(k,post),O(post),G(post))*YI(k,pre);
end
DEDW( l i n k ) = a d d e r ;
end
end
110
f o r i = 1 : sizeY
adder = 0 . 0 ;
for k = 1: sizeI
a d d e r = a d d e r + x i ( k , i ) ∗ s i g m a p r i m e { i } ( y I ( k , i ) ,O( i ) ,G( i ) ) ;
115 end
DEDO( i ) = − a d d e r ;
adder = 0 . 0 ;
for k = 1: sizeI
a d d e r = a d d e r + x i ( k , i ) ∗ s i g m a p r i m e { i } ( y I ( k , i ) ,O( i ) ,G( i ) ) ∗ ( y I ( k , i )
−O( i ) ) ;
120 end
DEDG( i ) = − a d d e r /G( i ) ;
end

% f i n d norm
125 i f DOGAIN == 1
Grad = [DEDW DEDO DEDG ] ;
else
Grad = [DEDW DEDO ] ;
end
130 NormGrad = s q r t ( sum ( Grad . ∗ Grad ) ) ;
LocMaxGrad = getargmax ( Grad , . 0 0 1 ) ;
gradtol = 0.05;

i f NormGrad >= g r a d t o l
135 UnitGradW = DEDW/NormGrad ;
UnitGradO = DEDO/NormGrad ;
UnitGradG = DEDG/NormGrad ;
WUpdate = W− UnitGradW ∗ r ;
OUpdate = O − UnitGradO ∗ r ;

140 %GUpdate = G − UnitGradG ∗ r ;


GUpdate = G;
else
WUpdate = W − DEDW ∗ r ;
OUpdate = O − DEDO ∗ r ;
145 %GUpdate = G − DEDG ∗ r ;
GUpdate = G;
end

end

We can then use this to write a training loop.

Listing 18.59: Chain FFN Training Code


1 f u n c t i o n [W, G, O, Energy ] = c h a i n f f n t r a i n ( n o d e f u n c t i o n ,
n o d e f u n c t i o n p r i m e , g , Input , IN , Target ,OUT, Y,W, O, G, B , BE, F , FE , lambda ,
NumIters )
%
% We t r a i n a c h a i n f f n
%
% I n p u t i s i n p u t m a t r i x o f S rows o f i n p u t d a t a
6 % S x number o f i n p u t n o d e s
% t a r g e t i s d e s i r e d t a r g e t v a l u e m a t r i x o f S rows o f t a r g e t d a t a
% S x number o f o u t p u t n o d e s
%
% g i s the chain f f n graph
11 % G i s the gain parameters
% O i s the o f f s e t parameters
% Y i s t h e node o u t p u t d a t a
%
Energy = [ ] ;
16 [ E , yI , YI , OI , EI ] = e n e r g y ( n o d e f u n c t i o n , g , Input , IN ,OUT, Target , Y,W, O, G, B
, BE) ;
Energy = [ Energy E ] ;
f o r t = 1 : NumIters
[W, O,G] = GradientUpdate ( n o d e f u n c t i o n , n o d e f u n c t i o n p r i m e , g , Y,W, O, G, B
, BE, F , FE , Input , IN , Target ,OUT, lambda ) ;
[ E , yI , YI , OI , EI ] = e n e r g y ( n o d e f u n c t i o n , g , Input , IN ,OUT, Target , Y,W, O, G
, B , BE) ;
21 Energy = [ Energy E ] ;
end
p l o t ( Energy ) ;
end

Here is a sample session in its entirety.

Listing 18.60: Sample Training Session


1 nodefunction = {};
nodefunctionprime = {};
sigma = @( y , o f f s e t , g a i n ) 0 . 5 ∗ ( 1 + tanh ( ( y − o f f s e t ) / g a i n ) ) ;
s i g m a i n p u t = @( y , o f f s e t , g a i n ) ( y − o f f s e t ) / g a i n ;

6 SL = − 1 . 2 ;
SH = 1 . 2 ;
% o u p u t n o d e s r a n g e from SL t o SH
s i g m a o u t p u t = @( y , o f f s e t , g a i n ) 0 . 5 ∗ ( (SH+SL ) + (SH−SL ) ∗ tanh ( ( y −
o f f s e t ) / gain ) ) ;
n o d e f u n c t i o n {1} = s i g m a i n p u t ;

11 n o d e f u n c t i o n {2} = s i g m a ;
n o d e f u n c t i o n {3} = s i g m a o u t p u t ;

s i g m a p r i m e = @( y , o f f s e t , g a i n ) ( 1 / ( 2 ∗ g a i n ) ) ∗ s e c h ( ( y − o f f s e t ) / g a i n )
.ˆ2;
s i g m a i n p u t p r i m e = @( y , o f f s e t , g a i n ) 1/ g a i n ;
16 s i g m a o u t p u t p r i m e = @( y , o f f s e t , g a i n ) ( (SH−SL ) / ( 2 ∗ g a i n ) ) ∗ s e c h ( ( y −
o f f s e t ) / gain ) . ˆ 2 ;
n o d e f u n c t i o n p r i m e {1} = s i g m a i n p u t p r i m e ;
n o d e f u n c t i o n p r i m e {2} = s i g m a p r i m e ;
n o d e f u n c t i o n p r i m e {3} = s i g m a o u t p u t p r i m e ;
% setup t r a i n i n g data
21 X = l i n s p a c e ( 0 , 3 , 3 1 ) ;
Input = X’ ;
U = c o s (X) ;
Target = U’ ;
% c o n s t r u c t a 1−5−1 MFFN a s a g r a p h
26 v = [ 1 ; 2 ; 3 ; 4 ; 5 ; 6 ; 7 ] ;
e = {[1;2] ,[1;3] ,[1;4] ,[1;5] ,[1;6] ,[2;7] ,[3;7] ,[4;7] ,[5;7] ,[6;7]};
E = edges ( e ) ;
V = vertices (v) ;
G = g r a p h s (V, E) ;
31 KG = i n c i d e n c e (G) ;
[ NodeSize , E d g e S i z e ] = s i z e (KG) ;
Y = z e r o s ( 1 , NodeSize ) ;
OL = − 1 . 4 ;
OH = 1 . 4 ;
36 GL = 0 . 8 ;
GH = 1 . 2 ;
O f f s e t = OL+(OH−OL) ∗ rand ( 1 , N o d e S i z e ) ;
Gain = GL+(GH−GL) ∗ rand ( 1 , N o d e S i z e ) ;
W = −1+2∗rand ( 1 , E d g e S i z e ) ;
41 [ B , F , BE, FE ] = B F s e t s (G,KG) ;
IN = { 1 } ;
OUT = { 7 } ;
[ E , yI , YI , OI , EI ] = e n e r g y ( n o d e f u n c t i o n , G, Input , IN ,OUT, Target , Y,W,
O f f s e t , Gain , B , BE) ;
E =
46 18.1698
% Start Training
lambda = . 0 0 0 5 ;
NumIters = 1 0 ;
[W, Gain , O f f s e t , Energy ] = c h a i n f f n t r a i n ( n o d e f u n c t i o n , n o d e f u n c t i o n p r i m e
, G, Input , IN , Target ,OUT, Y,W, O f f s e t , Gain , B , BE, F , FE , lambda , NumIters ) ;
51 [W, Gain , O f f s e t , Energy ] = c h a i n f f n t r a i n ( n o d e f u n c t i o n , n o d e f u n c t i o n p r i m e
, G, Input , IN , Target ,OUT, Y,W, O f f s e t , Gain , B , BE, F , FE , . 0 1 , 2 0 0 ) ;
Energy ( 2 0 0 )
ans =
0.1263

This first chainffntrain code contains a built-in plot of the energy shown in
Fig. 18.5.
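Once training is done we can also look at the fit itself rather than just the energy. This is a sketch reusing the workspace variables from the session above: we call the energy code once more with the trained parameters and plot the network outputs OI against the training target:

[E, yI, YI, OI, EI] = energy(nodefunction, G, Input, IN, OUT, Target, Y, W, Offset, Gain, B, BE);
plot(X, U, X, OI);   % training target versus trained network output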

Fig. 18.5 The 1-5-1 cos energy training results

18.4 Polishing the Training Code

Now that we have working code, let’s revisit it and see if we can make it a bit more
compact. We clearly need some sort of initialization method so we don’t have to
type so much. Let’s handle the initialization of all the node functions in one place.
Consider the code below which sets up all the sigma functions and then returns
them for use. We are experimenting here with allowing every single node function
to be unique. We return a cell for both sigma and sigmaprime.

Listing 18.61: Node function Initialization


f u n c t i o n [ sigma , s i g m a p r i m e ] = S i g m o i d I n i t ( NodeSize , IN ,OUT)
%
% I n i t i a l i z e node t r a n s f e r f u n c t i o n s .
%
5 nodefunction = {};
nodefunctionprime = {};
t r a n s f e r f u n c t i o n = @( y , o f f s e t , g a i n ) 0 . 5 ∗ ( 1 + tanh ( ( y − o f f s e t ) / g a i n )
);
n o d e f u n c t i o n {1} = t r a n s f e r f u n c t i o n ;
t r a n s f e r f u n c t i o n i n p u t = @( y , o f f s e t , g a i n ) ( y − o f f s e t ) / g a i n ;
10 n o d e f u n c t i o n {2} = t r a n s f e r f u n c t i o n i n p u t ;
SL = − 1 . 2 ;
SH = 1 . 2 ;
% o u p u t n o d e s r a n g e from SL t o SH
t r a n s f e r f u n c t i o n o u t p u t = @( y , o f f s e t , g a i n ) 0 . 5 ∗ ( (SH+SL ) + (SH−SL ) ∗
tanh ( ( y − o f f s e t ) / g a i n ) ) ;
15 n o d e f u n c t i o n {3} = t r a n s f e r f u n c t i o n o u t p u t ;

t r a n s f e r f u n c t i o n p r i m e = @( y , o f f s e t , g a i n ) ( 1 / ( 2 ∗ g a i n ) ) ∗ s e c h ( ( y −
o f f s e t ) / gain ) . ˆ 2 ;
n o d e f u n c t i o n p r i m e {1} = t r a n s f e r f u n c t i o n p r i m e ;
t r a n s f e r f u n c t i o n i n p u t p r i m e = @( y , o f f s e t , g a i n ) 1/ g a i n ;

20 n o d e f u n c t i o n p r i m e {2} = t r a n s f e r f u n c t i o n i n p u t p r i m e ;
t r a n s f e r f u n c t i o n o u t p u t p r i m e = @( y , o f f s e t , g a i n ) ( (SH−SL ) / ( 2 ∗ g a i n ) ) ∗
sech (( y − o f f s e t ) / gain ) . ˆ 2 ;
n o d e f u n c t i o n p r i m e {3} = t r a n s f e r f u n c t i o n o u t p u t p r i m e ;
%
sigmamain = n o d e f u n c t i o n { 1 } ;
25 sigmainput = nodefunction {2};
sigmaoutput = nodefunction {3};
sigm a = { } ;

% get sizes
30 s i z e I N = l e n g t h ( IN ) ;
sizeOUT = l e n g t h (OUT) ;

% s e t a l l node f u n c t i o n s t o sigmamain
f o r i = 1 : NodeSize
35 sigm a { i } = sigmamain ;
end
% r e s e t i n p u t node f u n c t i o n s t o s i g m a i n p u t
for j = 1: sizeIN
sigm a {IN{ j }} = s i g m a i n p u t ;
40 end
% r e s e t o u t p u t node f u n c t i o n s t o s i g m a o u t p u t
f o r j = 1 : sizeOUT
sigm a {OUT{ j }} = s i g m a o u t p u t ;
end
45 %

sigmamainprime = n o d e f u n c t i o n p r i m e { 1 } ;
sigmainputprime = nodefunctionprime {2};
sigmaoutputprime = nodefunctionprime {3};
50 sigmaprime = {};
% s e t a l l node f u n c t i o n s t o sigmamain
f o r i = 1 : NodeSize
s i g m a p r i m e { i } = sigmamainprime ;
end
55 % r e s e t i n p u t node f u n c t i o n s t o s i g m a i n p u t
for j = 1: sizeIN
s i g m a p r i m e {IN{ j }} = s i g m a i n p u t p r i m e ;
end
% r e s e t o u t p u t node f u n c t i o n s t o s i g m a o u t p u t
60 f o r j = 1 : sizeOUT
s i g m a p r i m e {OUT{ j }} = s i g m a o u t p u t p r i m e ;
end

end

We might also want to send in SL and SH as arguments. This leads to the variant
SigmoidInit2.

Listing 18.62: Adding upper and lower bounds to the node function initialization
f u n c t i o n [ sigma , s i g m a p r i m e ] = S i g m o i d I n i t 2 ( NodeSize , SL , SH , IN ,OUT)
%
% I n i t i a l i z e node t r a n s f e r f u n c t i o n s .
%
5 ....
a l l i s t h e same e x c e p t SL and SH a r e arguments s o
no need t o i n i t i a l i z e them i n t h e c o d e
....
end

We then have to alter the code for evaluation, energy calculation, updates and training
to reflect the new way we can access the node functions and their derivatives. First,
the evaluation. We took out the node function initialization and changed the argument
list.

Listing 18.63: The new evaluation function: evaluation3.m


1 function [ I n V a l s , OutVals ] = e v a l u a t i o n 3 ( sigma , g , I , IN ,OUT, Y,W, O, G, B , BE
)
%
% sigma i s t h e n o d e f u n c t i o n as a c e l l
% g i s the graph
% I is the input vector
6 % IN i s t h e l i s t o f i n p u t n o d e s a s a c e l l
% OUT i s t h e l i s t o f o u t p u t n o d e s a s a c e l l
% Y i s node v e c t o r
% W is the link vector
% O is the o f f s e t vector
11 % G i s the gain vector
% B i s t h e g l o b a l b a c k w a r d node s e t i n f o r m a t i o n
% BE i s t h e g l o b a l b a c k w a r d e d g e s e t i n f o r m a t i o n
%
% O u t V a l s i s t h e new node o u t p u t v e c t o r
16 % I n V a l s i s t h e node i n p u t v e c t o r
%

% get sizes
s i z e I N = l e n g t h ( IN ) ;
21 sizeOUT = l e n g t h (OUT) ;
Nodes = g . v ;
[m, s i z e Y ] = s i z e ( Nodes . v ) ;
InVals = zeros (1 , sizeY ) ;

26 f o r i = 1: sizeY
% g e t b a c k w a r d node i n f o r m a t i o n f o r neuron i
BF = B{ i } ;
% g e t backward edge i n f o r m a t i o n f o r neuron i
BEF = BE{ i } ;
31 lenB = l e n g t h (BF) ;
lenBEF = l e n g t h (BEF) ;
sum = 0 . 0 ;
% add i n p u t i f t h i s i s an i n p u t node
for j = 1: sizeIN
36 i f i == IN{ j }
sum = sum + I ( j ) ;
end
end
f o r j = 1 : lenBEF
41 l i n k = BEF( j ) ;
p r e = BF( j ) ;
sum = sum + W( l i n k ) ∗Y( p r e ) ;
end
I n V a l s ( i ) = sum ;
46 Y( i ) = si gma { i } ( sum , O( i ) ,G( i ) ) ;
end

OutVals = Y;
end

Then, we use the new evaluation code to calculate the energy.



Listing 18.64: The updated energy code: energy2.m


f u n c t i o n [ E , yI , YI , OI , EI ] = e n e r g y 2 ( sigma , g , I , IN ,OUT, D,W, O, G, B , BE)
%
% g i s the graph
% Y i s node v e c t o r
5 % W is the link vector
% O is the o f f s e t vector
% G i s the gain vector
% B i s t h e g l o b a l b a c k w a r d node s e t i n f o r m a t i o n
% BE i s t h e g l o b a l b a c k w a r d e d g e s e t i n f o r m a t i o n
10 %
% I c o l l e c t i o n of input data
% I i s a matrix S rows , Number o f I n p u t s = c o l s
% IN i n p u t n o d e s
% D c o l l e c t i o n of t a r g e t data
15 % D i s a matrix S rows , Number o f o u t p u t s = c o l s
% OUT o u t p u t n o d e s
%
% YI i s Y v e c t o r f o r e a c h i n p u t
% OI i s O u t p u t v e c t o r f o r e a c h i n p u t
20 % EI i s e r r o r v e c t o r f o r e a c h i n p u t
%
S = length ( I ) ;
Nodes = g . v ;
[m, s i z e Y ] = s i z e ( Nodes ) ;
25 s i z e D = l e n g t h (D) ;
sizeOUT = l e n g t h (OUT) ;
%
YI = z e r o s ( S , s i z e Y ) ;
yI = z e r o s (S , sizeY ) ;
30 OI = z e r o s ( S , sizeOUT ) ;
EI = z e r o s ( S , sizeOUT ) ;

sum = 0 ;
f o r i =1:S
35 [ y I ( i , : ) , YI ( i , : ) ] = e v a l u a t i o n 3 ( sigma , g , I ( i , : ) , IN ,OUT,W, O, G, B , BE) ;
error = 0;
f o r j = 1 : sizeOUT
OI ( i , j ) = YI ( i ,OUT{ j } ) ;
EI ( i , j ) = YI ( i ,OUT{ j } ) − D( i , j ) ;
40 e r r o r = e r r o r + ( EI ( i , j ) ) ˆ 2 ;
end
sum = sum + e r r o r ;
end
E = . 5 ∗ sum ;
45
end

Next, we change the update code. In addition to removing the node function setup
code, we also streamlined the gradient norm calculation.

Listing 18.65: The new update code: GradientUpdate2.m


f u n c t i o n [ WUpdate , OUpdate , GUpdate , NormGrad , LocMaxGrad ] =
Gra d ie n tU p date2 ( sigma , si g maprime , g , y , Y,W, O, G, B , BE, F , FE , I , IN , D,OUT,
r)
%
% g i s the graph
4 % y i s t h e matrix of i n p u t v e c t o r s f o r each exemplar
% Y i s t h e matrix of output v e c t o r s f o r each exemplar
% W is the link vector
% O is the o f f s e t vector
% G i s the gain vector
9 % FE i s t h e g l o b a l f o r w a r d e d g e s e t i n f o r m a t i o n
% F i s the g l o b a l forward l i n k s e t
%
% I i s the l i s t of input vectors
% IN i s t h e l i s t o f i n p u t n o d e s
14 % D i s the l i s t of t a r g e t vectors
% OUT i s t h e l i s t o f o u t p u t n o d e s
%
% r i s t h e amount o f g r a d i e n t u p d a t e
%
19 % t u r n g a i n u p d a t e s on o r o f f : DOGAIN = 0
% t u r n s them o f f
DOGAIN = 0 ;

Nodes = g . v ;
24 [m, s i z e Y ] = s i z e ( Nodes ) ;
sizeW = l e n g t h (W) ;
S = length ( I ) ;
s i z e D = l e n g t h (D) ;
SN = l e n g t h ( IN ) ;
29 sizeOUT = l e n g t h (OUT) ;
xi = zeros (S , sizeY ) ;
DEDW = z e r o s ( 1 , sizeW ) ;
DEDO = z e r o s ( 1 , s i z e Y ) ;
DEDG = z e r o s ( 1 , s i z e Y ) ;
34
%f i n d maximum o u t p u t node
OMax = OUT{ 1 } ;
f o r i =2: sizeOUT
i f OUT{ i } > OMax
39 OMax = OUT{ i } ;
end
end

f o r alpha = 1:S
44 for i = OMax : − 1 : 1
% get forward edge i n f o r m a t i o n f o r neuron i
FF = F{ i } ;
FEF = FE{ i } ;
lenFF = l e n g t h (FEF) ;
49 %f i n d o u t i f i i s a t a r g e t node
for j = 1 : sizeOUT
if OUT{ j } == i
k = j;
IsTargetNode = 1 ;
54 end
end

ifI s T a r g e t N o d e == 1
x i ( a l p h a , i ) = (Y( a l p h a , i ) − D( a l p h a , k ) ) ;
else
59 adder = 0 ;
f o r j = 1 : lenFF
l i n k = FEF( j ) ;
% F ( j ) i s a node w h i c h i g o e s f o r w a r d t o
p o s t = FF ( j ) ;
64 a d d e r = a d d e r + x i ( a l p h a , p o s t ) ∗W( l i n k ) ∗ s i g m a p r i m e { p o s t } ( y (
a l p h a , p o s t ) ,O( p o s t ) ,G( p o s t ) ) ;
end
x i ( alpha , i ) = adder ;
end % t e s t on n o d e s : o u t p u t o r n o t
end % l o o p on n o d e s
69 end % l o o p on d a t a
%x i

f o r alpha = 1 : S
f o r i = s i z e Y : −1 : 1
74 % FE{ i } i s t h e g l o b a l node i n f o r m a t i o n
% where i t h e i t h w e i g h t
% FE{ i } = a s e t o f ( p r e , p o s t )
FF = F{ i } ;
FEF = FE{ i } ;
79 lenFF = l e n g t h ( FF ) ;
f o r j = 1 : lenFF
% get the weight index
l i n k = FEF( j ) ;
% W( l i n k ) = E { p r e −> p o s t } w h e r e
84 pre = i ;
p o s t = FF ( j ) ;
DEDW( l i n k ) = DEDW( l i n k ) + x i ( a l p h a , p o s t ) ∗ s i g m a p r i m e { p o s t } ( y ( a l p h a
, p o s t ) ,O( p o s t ) ,G( p o s t ) ) ∗Y( a l p h a , p r e ) ;
end% f o r w a r d l i n k l o o p
DEDO( i ) = DEDO( i ) − x i ( a l p h a , i ) ∗ s i g m a p r i m e { i } ( y ( a l p h a , i ) ,O( i ) ,G( i ) )
;
89 %DEDG( i ) = DEDG( i ) − x i ( a l p h a , i ) ∗ s i g m a p r i m e { i }( y ( a l p h a , i ) ,O( i ) ,G( i )
) ∗( y ( a l p h a , i )−O( i ) ) /G( i ) ;
end% node l o o p
end% e x e m p l a r l o o p

i f DOGAIN == 1
94 Grad = [DEDW DEDO DEDG ] ;
else
Grad = [DEDW DEDO ] ;
end
NormGrad = s q r t ( sum ( Grad . ∗ Grad ) ) ;
99 %LocMaxGrad = g e t a r g m a x ( Grad , . 0 0 1 ) ;
LocMaxGrad = 1 ;
gradtol = 0.05;
%
i f NormGrad < g r a d t o l
104 scale = r ;
else
s c a l e = r /NormGrad ;
end
%C s c a l e = s c a l e
109 % do u p d a t e s
WUpdate = W− DEDW ∗ s c a l e ;
OUpdate = O − DEDO ∗ s c a l e ;
%GUpdate = G − DEDG ∗ s c a l e ;
GUpdate = G;
114
end

Finally, we need to alter the training loop code also.



Listing 18.66: The altered training loop code: chainffntrain2.m


f u n c t i o n [W, G, O, Energy , Norm , Loc ] = c h a i n f f n t r a i n 2 ( sigma , sigmaprime , g ,
Input , IN , Target ,OUT,W, O, G, B , BE, F , FE , lambda , NumIters )
%
% We t r a i n a c h a i n f f n
%
5 % I n p u t i s i n p u t m a t r i x o f S rows o f i n p u t d a t a
% S x number o f i n p u t n o d e s
% t a r g e t i s d e s i r e d t a r g e t v a l u e m a t r i x o f S rows o f t a r g e t d a t a
% S x number o f o u t p u t n o d e s
%
10 % g i s the chain f f n graph
% G i s the gain parameters
% O i s the o f f s e t parameters
% Y i s t h e node o u t p u t d a t a
%
15 Energy = [ ] ;
[ E , y , Y, OI , EI ] = e n e r g y 2 ( sigma , g , Input , IN ,OUT, Target ,W, O, G, B , BE) ;
%E
Energy = [ Energy E ] ;
f o r t = 1 : NumIters
20 [W, O, G, Norm , Loc ] = G radientUpdate2 ( sigma , sigmaprime , g , y , Y,W, O, G, B ,
BE, F , FE , Input , IN , Target ,OUT, lambda ) ;
[ E , y , Y, OI , EI ] = e n e r g y 2 ( sigma , g , Input , IN ,OUT, Target ,W, O, G, B , BE) ;
Energy = [ Energy E ] ;
end
p l o t ( Energy ) ;
25 %E
end

Let’s look at a sample training session.

Listing 18.67: Setup the training session


% setup t r a i n i n g data
X = linspace (0 ,3.14 ,45) ;
Input = X’ ;
4 U = c o s (X) ;
Target = U’ ;
% s e t u p 1−5−1 MFFN a s a c h a i n FFN
v = [1;2;3;4;5;6;7];
e = {[1;2] ,[1;3] ,[1;4] ,[1;5] ,[1;6] ,[2;7] ,[3;7] ,[4;7] ,[5;7] ,[6;7]};
9 E = edges ( e ) ;
V = vertices (v) ;
G = g r a p h s (V, E) ;
KG = i n c i d e n c e (G) ;
[ NodeSize , E d g e S i z e ] = s i z e (KG) ;
14 % i n i t i a l i z e tu nab le parameters
Y = z e r o s ( 1 , NodeSize ) ;
OL = − 1 . 4 ;
OH = 1 . 4 ;
GL = 0 . 8 ;
19 GH = 1 . 2 ;
WL = − 2 . 1 ;
WH = 2 . 1 ;
O f f s e t = OL+(OH−OL) ∗ rand ( 1 , N o d e S i z e ) ;
Gain = GL+(GH−GL) ∗ rand ( 1 , N o d e S i z e ) ;
24 W = WL+(WH −WL) ∗ rand ( 1 , E d g e S i z e ) ;
% get l i n k information
[ B , F , BE, FE ] = B F s e t s (G,KG) ;
% s e t i n p u t and o u t p u t i n d e x s e t s
IN = { 1 } ;
29 OUT = { 7 } ;
% i n i t i a l i a z e node f u n c t i o n s and d e r i v a t i v e s
[ sigma , s i g m a p r i m e ] = S i g m o i d I n i t 2 ( NodeSize , − 1 . 2 , 1 . 2 , IN ,OUT) ;

Listing 18.68: Do the training


% Get Initial Energy
[E, yI, YI, OI, EI] = energy2(sigma, G, Input, IN, OUT, Target, W, Offset, Gain, B, BE);
E = 18.2366
% Start Training
[W, Gain, Offset, Energy, Norm, Loc] = chainffntrain2(sigma, sigmaprime, G, Input, IN, Target, OUT, W, Offset, Gain, B, BE, F, FE, 0.0005, 100);
E = 16.0724
...
many steps
ans = 0.1

We can see the results of our approximation in Fig. 18.6 by generating a test file.

Listing 18.69: A Test File


X = linspace(0, 3.14, 101);
Input = X'; U = cos(X);
Target = U'; Z = [];
for i = 1:101
  [y, Y] = evaluation3(sigma, G, Input(i), IN, OUT, W, Offset, Gain, B, BE);
  Z = [Z Y(7)];
end
plot(X, U, X, Z);

Fig. 18.6 The approximation of cos(t) on [0,3.14] using a 1-5-1 FFN graph: error is 0.1

18.5 Comparing the CFFN and MFFN Code

Let's take a moment to check whether our two ways of building approximations using
network architectures give the same results. To do this, we will set up a particular
1-5-1 network as both a CFFN and an MFFN and calculate the starting energy and
one gradient step to see if they match. This is a bit harder than it sounds as the data
structures are a bit different. First, here is the setup. This code is careful to set up
both versions of the 1-5-1 problem the same. The parameters are chosen randomly
as before and we map the MFFN choices to their CFFN counterparts so that both
techniques start the same. We use simple node computation initialization functions
too. We added one for the MFFN which we show here.

Listing 18.70: Initializing the MFFN node functions


f u n c t i o n [ n o d e f u n c t i o n , n o d e f u n c t i o n p r i m e ] = SigmaMFFNInit ( SL , SH)
2 %
% I n i t i a l i z e s i g m o i d s n e e d e d f o r MFFN
%
nodefunction = {};
nodefunctionprime = {};
7 sigma = @( y , o f f s e t , g a i n ) 0 . 5 ∗ ( 1 + tanh ( ( y − o f f s e t ) / g a i n ) ) ;
n o d e f u n c t i o n {1} = si gma ;
s i g m a i n p u t = @( y , o f f s e t , g a i n ) ( y − o f f s e t ) / g a i n ;
n o d e f u n c t i o n {2} = s i g m a i n p u t ;
% o u p u t n o d e s r a n g e from SL t o SH
12 s i g m a o u t p u t = @( y , o f f s e t , g a i n ) 0 . 5 ∗ ( (SH+SL ) + (SH−SL ) ∗ tanh ( ( y −
o f f s e t ) / gain ) ) ;
n o d e f u n c t i o n {3} = s i g m a o u t p u t ;

s i g m a p r i m e = @( y , o f f s e t , g a i n ) ( 1 / 2 ) ∗ s e c h ( ( y − o f f s e t ) / g a i n ) . ˆ 2 ;
n o d e f u n c t i o n p r i m e {1} = s i g m a p r i m e ;
17 s i g m a i n p u t p r i m e = @( y , o f f s e t , g a i n ) 1 ;
n o d e f u n c t i o n p r i m e {2} = s i g m a i n p u t p r i m e ;
s i g m a o u t p u t p r i m e = @( y , o f f s e t , g a i n ) ( ( SH−SL ) / 2 ) ∗ s e c h ( ( y − o f f s e t ) /
gain ) . ˆ 2 ;
n o d e f u n c t i o n p r i m e {3} = s i g m a o u t p u t p r i m e ;

22 end

The function SigmoidInit2 does a similar job for the CFFN node function initializations.
Note the code to set the parameters in the CFFN to match the MFFN ones is a bit
cumbersome as it has to be done manually since the data structures don't match up nicely.

Listing 18.71: Setting up the CFFN versus MFFN test


function testcffneqmffn ()
%
X = linspace (0 ,3 ,21) ;
Input = X’ ;
5 U = s i n (X) . ˆ 2 ;
Target = U’ ;
%s e t u p t h e 1−5−1 MFFN
LayerSizes = [ 1 ; 5 ; 1 ] ;
OL = − 1 . 4 ;
10 OH = 1 . 4 ;
GL = 0 . 8 ;
GH = 1 . 2 ;
TL = − 2 . 1 ;
TH = 2 . 1 ;
15 [ G, O, T ] = m f f n i n i t (GL,GH, OL ,OH, TL , TH, L a y e r S i z e s ) ;
% i n i t i a l i z e t h e n o d e f u n c t i o n s f o r t h e MFFN w h i c h
% a r e o f f b y a g a i n f a c t o r from t h e o n e s u s e d i n t h e CFFN
[ n o d e f u n c t i o n , n o d e f u n c t i o n p r i m e ] = SigmaMFFNInit ( − 0 . 5 , 1 . 5 ) ;
%now i n i t i a l i z e t h e n o d e f u n c t i o n s f o r t h e c f f n
20 IN={1};
OUT={7};
[ sigma , s i g m a p r i m e ] = S i g m o i d I n i t 2 ( 7 , − 0 . 5 , 1 . 5 , IN ,OUT) ;
%s e t u p 1−5−1 CFFN
v = [1;2;3;4;5;6;7];
25 %t h i s s e t s up Tˆ1 1 = E1 , Tˆ1 2 = E2 , Tˆ1 3 = E3
% T ˆ1 4 = E4 , Tˆ1 5 = E5 ,
% T ˆ2 1 = E6 , Tˆ2 2 = E7 , Tˆ2 3 = E8
% T ˆ2 4 = E9 , Tˆ2 5 = E10
e = {[1;2] ,[1;3] ,[1;4] ,[1;5] ,[1;6] ,[2;7] ,[3;7] ,[4;7] ,[5;7] ,[6;7]};
30 E = edges ( e ) ;
V = vertices (v) ;
GG = g r a p h s (V, E) ;
KGG = i n c i d e n c e (GG) ;
[ NodeSize , E d g e S i z e ] = s i z e (KGG) ;
35 % setup parameters
GC = z e r o s ( 1 , N o d e S i z e ) ;
OC = z e r o s ( 1 , N o d e S i z e ) ;
WC = z e r o s ( 1 , E d g e S i z e ) ;
f o r i =1:5
40 WC( i ) = T{1}( i ) ;
WC( i +5) = T{2}( i ) ;
end
GC( 1 ) = G{ 1 } ( 1 ) ;
OC( 1 ) = O{ 1 } ( 1 ) ;
45 for i = 1:5
GC( i +1) = G{ 2 } ( 1 , i ) ;
OC( i +1) = O{ 2 } ( 1 , i ) ;
end
GC( 7 ) = G{ 3 } ( 1 ) ;
50 OC( 7 ) = O{ 3 } ( 1 ) ;
%g e t l i n k i n f o r m a t i o n
[ B , F , BE, FE ] = B F s e t s (GG,KGG) ;
%g e t mffn e n e r g y
[ y , Y, RE,EMFFN] = m f f n e v a l 2 ( Input , Target , n o d e f u n c t i o n , G, O, T , L a y e r S i z e s
);
55 EMFFN
%g e t c f f n e n e r g y
[ ECFFN, yCFFNI , YCFFNI , OCFFNI , ECFFNI ] = e n e r g y 2 ( sigma ,GG, Input , IN ,OUT,
Target ,WC, OC,GC, B, BE) ;
ECFFN

60 end

We finish this initialization code by doing the energy evaluations for both versions.
The results are shown below. Note the two values are identical, so the two energy
evaluations appear to return the same number.

Listing 18.72: The initial energy for the CFFN and MFFN code
EMFFN =
2.9173
ECFFN =
2.9173

Next, we test one gradient update. To do this, we created new update and training
functions for both the CFFN and the MFFN which simply add diagnostic material.
These are the functions chainffntraindiagnostic, GradientUpdatediagnostic,
mffntraindiagnostic and mffnupdatediagnostic. If you look at them, you can see
the changes we made are pretty straightforward; we leave that exploration to you!
The testing and the results are shown next.

Listing 18.73: One gradient step for both CFFN and MFFN
% train mffn 1 step
[G, O, T, EMFFN] = mffntraindiagnostic(Input, Target, nodefunction, nodefunctionprime, G, O, T, LayerSizes, y, Y, 0.0005, 1);
% Train cffn 1 step
[WC, GC, OC, ECFFN, Norm, Loc] = chainffntraindiagnostic(sigma, sigmaprime, GG, Input, IN, Target, OUT, WC, OC, GC, B, BE, F, FE, 0.0005, 1);
GradMFFN =
  -1.3319   0.6797  -0.2617   0.1119  -0.4563   2.7344
   4.2227   0.5844   0.0905   1.5443   0.6065
   2.0345  -1.2520   0.2508  -0.3006   0.6916  -4.8874
NormGradMFFN =
   7.8205
GradCFFN =
  -1.3319   0.6795  -0.2618   0.1118  -0.4566   2.7349
   4.2220   0.5844   0.0905   1.5442   0.6078
   2.0354  -1.2515   0.2509  -0.3005   0.6919  -4.8868
NormGradCFFN =
   7.8202

18.6 Handling Feedback

The evaluation algorithm as shown in Table 17.1 must be interpreted carefully in


the case of feedback connections. If the backward set B(i) contains feedback, then
how should one do the evaluation? An example will make this clear. Take a stan-
dard Folded Feedback Pathway cortical circuit model from Raizada and Grossberg
(2003) as shown in Fig. 18.7a. This is the usual biological circuit model and the nodes
are labeled backwards for our purposes—as we have mentioned before. We relabel
them in with the nodes starting at N1 in the redone figure on the left. There are 11
nodes here and 11 edges with E11→10 , E10→9 and E10→9 being explicit feedback.
Let’s assume external input comes into N8 from the thalamus and into N11 from
the cortical column above it. The input for the nodal calculation for N7 in the graph
in Fig. 18.7b is then

Fig. 18.7 A Folded Feedback Pathway cortical circuit in two forms: the relabeled graph on the
right has external Input into Node 1 from the thalamus and into Node 9 from the cortical column
above. a The Folded Feedback Pathway DG. b The Relabeled Folded Feedback Pathway DG

y7new = E8→7 Y8old + E9→7 Y9old

which in terms of a global clock would be implemented as

y7 (t + 1) = E8→7 Y8 (t) + E9→7 Y9 (t)

where we can initialize the nodal outputs in a variety of ways: for example, by calculating
all the outputs without feedback initially and hence setting Y9 (0) = Y10 (0) = 0.
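To make the role of the clock concrete, here is a small sketch of this kind of clocked evaluation. The node count, the weight matrix Elink, the external input x and the number of ticks are placeholder choices for illustration only, and we skip applying the node processing functions; the real evaluation code is the evaluation3 function we developed earlier.

% Sketch of a clocked evaluation when the graph has feedback.
% Yold holds the node outputs at time t, Ynew the outputs at time t+1.
% Elink(j,i) is an assumed weight on edge j -> i; feedback terms are
% handled simply by initializing all outputs to zero at t = 0.
N = 11;                                % nodes N1 through N11
Elink = zeros(N,N);                    % fill in whatever edge weights you want
Elink(8,7) = 0.6;  Elink(9,7) = -0.4;  % e.g. the two edges into node 7
x = zeros(1,N);  x(8) = 1.0;           % assumed external input into node 8
Yold = zeros(1,N);                     % Y(0) = 0 so feedback starts silent
for t = 1:5                            % five ticks of the global clock
  Ynew = zeros(1,N);
  for i = 1:N
    Ynew(i) = x(i) + Yold*Elink(:,i);  % y_i(t+1) = x_i + sum_j E_{j->i} Y_j(t)
  end
  Yold = Ynew;                         % advance the clock
end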
When we relabel this graph as shown in Fig. 18.7b, it is now clear this is a feedforward
chain of computational nodes and the only feedback is the relabeled edge
E11→2 . The external input comes into N1 from the thalamus and into N9 from the
cortical column above it. The input for the nodal calculation for N2 is in terms of
time ticks on our clock

y2new = E1→2 Y1old + E11→2 Y11old

or

y2 (t + 1) = E1→2 Y1 (t) + E11→2 Y11 (t)

However we choose to implement the graph, there will be feedback terms and so
we must interpret the evaluation and update equations appropriately. Then, once a
graph structure is chosen, we apply the usual update equations in the form given by
Table 18.1a and the Hebbian updates using Table 18.1b. In this version of a Hebbian
update algorithm, we differ from the one we discussed earlier in this chapter. In the

Table 18.1 Feedback evaluation and Hebbian update algorithms


(a) Feedback Evaluation (b) Feedback Hebbian Update
for(i = 0; i < N ; i + +) { for(i = 0; i < N ; i + +) {
if (i ∈ U) for (j ∈ B(i)) {
y i (t + 1) = xi (t) + j∈B(i) Ej→i (t)fj (t) yp (t) = fi (t) Ej→i (t)
else if (yp (t) )
y i (t + 1) = j∈B(i) Ej→i (t)fj (t) Ej→i (t) = ζ Ej→i (t)
fi (t + 1) = σ i (y i (t + 1), p) }
} }

Fig. 18.8 The Folded Feedback Pathway as a lagged architecture

first version, we checked to see if the value of the post neuron is high and the value of
the edge Epre→post is also high; if so, the value of this edge is increased by a multiplier ζ. To
check this, we just looked at the value of Ypost × Epre→post and if it was high enough
we increased the weight Epre→post. Here, the idea is similar but this time we look at
the value of Ypre × Epre→post and if this value is high enough, we increase the edge.
There are other versions too. The idea is to somehow increase the edge weight when
the pre signal and the post signal are both high.
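A rough sketch of this pre-based Hebbian test is below. The threshold, the multiplier and the variable names are illustrative assumptions; in particular Elink, Y and the backward sets B are assumed to already exist, and this fragment is not the Hebbian code used by the library.

% Sketch of the pre-synaptic Hebbian update discussed above: whenever
% Y_pre * E_{pre->post} is large enough, bump the edge by a factor zeta.
zeta   = 1.05;                       % assumed edge multiplier
HebTol = 0.8;                        % assumed threshold on the pre*edge product
for i = 1:N                          % loop over post nodes
  pres = B{i};                       % indices of the nodes feeding node i
  for k = 1:length(pres)
    j = pres(k);
    if Y(j)*Elink(j,i) > HebTol
      Elink(j,i) = zeta*Elink(j,i);  % strengthen this edge
    end
  end
end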
Another approach is to handle feedback terms using a lag as shown in Fig. 18.8.
We make two copies of the graph which are here drawn using nodes N1 through N11
on the left and these correspond to what is happening at time t. On the right is the copy
of the graph with the nodes labeled N12 to N22 which corresponds to time t + 1. The
feedback connection is then the link between N11 and N13 with edge weight E11→2 ;
hence the feedback has now become a feedforward link in this new configuration.
The nodes N12 through N22 retain the usual edge weights from time t as you can see
in the figure. The number of tunable parameters is still the same as in the original

Fig. 18.9 Adding column to column feedback to the original Folded Feedback Pathway as a lagged
architecture

network with feedback but using this doubling procedure, we have converted the
network into a feedforward one. Once the graph is initialized, we apply evaluation
and update algorithms modified a bit to take into account this new doubled structure.
Note here there are inputs into Nodes N1 and N12 of value I1 and inputs into Nodes
N9 and N20 of value I2 . The outputs here are N17 , N18 and N19 rather than N6 , N7 and
N8 . For a given lagged problem, we need to decide how to handle this doubling of
the inputs and outputs; we simply show the possibilities here. Now Fig. 18.8 shows
a portion of a cortical circuit where the OCOS subcircuit appears disconnected from
the FFP portion. Let’s make it a bit more realistic by adding a node N9 in the column
above this column circuit which is connected to N6 , N7 and N8 . The output from N9
then connects to the FFP originally given by nodes N9 , N10 and N11 . We will shift
the FFP nodes up by one making them N10 , N11 and N12 . We draw this in Fig. 18.9.
In this new architecture, the calculations through the copy that corresponds to time t
are generating the N12 output and the second copy of the network is necessary to
take into account the feedback from N12 to N2 .
Let's look at some MatLab implementations of these ideas. First, we need to
redo how we parse the incidence matrix of a graph so that we find self and non-self
feedback terms. Consider the new incidence code below. There are now two new
possibilities we focus on. For a given node ii, we set up its entries across the edge columns.
We loop through the edges using the index jj. Recall each edge consists of an in and
out pair. The term e2(1,jj) gives the in value of the edge. If the in value matches
the given node ii, we check to see if the in value ii is less than the out value. If this
is true, this means this edge goes out from the node ii and so we set the incidence

matrix value IncidMat(ii,jj) = 1. However, if the reverse inequality holds,


we know this means the in value is bigger than or equal to the out value. This means
we have feedback, and so we set IncidMat(ii,jj) = 2. The tests for incoming
edges are similar: we are in the case where the node ii is now the out value. If the in
value is smaller than ii, we set IncidMat(ii,jj) = -1 as this is an incoming
edge. If the inequality goes the other way, this is an incoming feedback edge.

Listing 18.74: The incidence matrix calculation for a CFFN with feedback
f u n c t i o n I n c i d M a t = i n c i d e n c e F B (G)
%
% g i s a graph having
% vertices g . v
5 % edges g . e
%
% Get t h e e d g e o b j e c t from g
E = G. e ;
% g e t t h e e d g e s from t h e e d g e o b j e c t
10 e2 = E . e ;
% g e t t h e v e r t i c e s o b j e c t from g
V = G. v ;
% g e t t h e v e r t i c e s from t h e v e r t i c e s o b j e c t
v2 = V. v ;
15 % f i n d o u t how many v e r t i c e s and e d g e s t h e r e are
[m, s i z e V ] = s i z e ( v2 ) ;
[m, s i z e E ] = s i z e ( e2 ) ;
IncidMat = z e r o s ( sizeV , s i z e E ) ;
%
20 % setup incidence matrix
%
f o r i i = 1: sizeV
for j j = 1: sizeE
i f e2 ( 1 , j j ) == i i
25 i f e2 ( 1 , j j ) < e2 ( 2 , j j )
IncidMat ( i i , j j ) = 1 ;
else
IncidMat ( i i , j j ) = 2 ;
end
30 e l s e i f e2 ( 2 , j j ) == i i
i f e2 ( 1 , j j ) < e2 ( 2 , j j )
I n c i d M a t ( i i , j j ) = −1;
else
I n c i d M a t ( i i , j j ) = −2;
35 end
end
end
end
end
40 end

Once we have the new incidence matrix, we need to convert this into a dot file for
printing. If we have self feedback, the sum of the entries in a given column will be
positive and so we set the to and from nodes of that edge to be the same. If we
have a +1 and a −1 in the column for a given edge, this corresponds to our usual
forward edge. If we have a +2 and a −2 in the same column, this is a feedback edge.

Listing 18.75: Converting the incidence matrix for a CFFN with feedback into a dot
file
f u n c t i o n incToDotFB ( i n c , width , h e i g h t , a s p e c t , f i l e n a m e )
[ N, E ] = s i z e ( i n c ) ;
[ f i d , msg ] = f o p e n ( f i l e n a m e , ’w ’ ) ;
f p r i n t f ( f i d , ’ digraph G {\ n ’ ) ;
5 f p r i n t f ( f i d , ’ size =\"% f ,% f \";\ n ’ , width , h e i g h t ) ;
f p r i n t f ( f i d , ’ rankdir = TB ;\ n ’ ) ;
f p r i n t f ( f i d , ’ ratio =% f ;\ n ’ , a s p e c t ) ;
f o r i i = 1 :N
f p r i n t f ( f i d , ’% d [ shape =\" rectangle \" fontsize =20 fontname =\"
Times new Roman \" label =\"% d \"];\ n ’ , i i , i i ) ;
10 end

f p r i n t f ( f i d , ’\ n ’ ) ;

for j j = 1:E
15 a = 0;
b = 0;
f o r i i = 1 :N
switch inc ( i i , j j )
case 1
20 a = ii ;
c a s e −1
b = ii ;
case 2
a = ii ;
25 i f a b s ( sum ( i n c ( : , j j ) ) ) > 0
b = ii ;
end
c a s e −2
b = ii ;
30 end
end
i f ( a ˜= 0 ) && ( b ˜= 0 )
f p r i n t f ( f i d , ’%d - >% d ;\ n ’ , a , b ) ;
end
35 end
f p r i n t f ( f i d , ’\ n } ’ ) ;
fclose ( fid ) ;
end

Let’s see how this works on a modified OCOS that has some feedback and self
feedback.

Listing 18.76: Testing the CFFN feedback incidence matrix


V = [1;2;3;4;5;6;7;8];
E = {[1;2],[2;3],[2;4],[2;5],[3;6],[4;7],[5;8],[2;7],[1;7],...
    [8;4],[6;6]};
v = vertices(V);
e = edges(E);
G = graphs(v,e);
KG = incidenceFB(G);
KOCOS =
   1   0   0   0   0   0   0   0   1   0   0
  -1   1   1   1   0   0   0   1   0   0   0
   0  -1   0   0   1   0   0   0   0   0   0
   0   0  -1   0   0   1   0   0   0  -2   0
   0   0   0  -1   0   0   1   0   0   0   0
   0   0   0   0  -1   0   0   0   0   0   2
   0   0   0   0   0  -1   0  -1  -1   0   0
   0   0   0   0   0   0  -1   0   0   2   0
incToDotFB(KG, 6, 6, 1, 'OCOSwithFB.dot');

Fig. 18.10 A modified OCOS with self feedback and feedback

This generates the graphic we see in Fig. 18.10. As an exercise, let's set up the
OCOS/FFP with column to column interaction we have shown as a lagged FFN
in code. We start by constructing the original graph of this circuit with feedback.

Listing 18.77: Constructing the OCOS/FFP with column to column feedback


V = [1;2;3;4;5;6;7;8;9;10;11;12];
E = {[1;2],[2;3],[2;4],[2;5],[3;6],[4;7],[5;8],[2;7],[1;7],...
    [6;9],[7;9],[8;9],[9;10],[10;11],[11;12],[12;2]};
v = vertices(V);
e = edges(E);
G = graphs(v,e);
KG = incidenceFB(G);
incToDotFB(KG, 10, 10, 1, 'OCOSFFPandFB.dot');

We generate the plot as usual which is shown in Fig. 18.11. Note the incidence
matrix tells us where the feedback edges are: KG(12, 16) = 2 and KG(2, 16) = −2.
We need to extract the feedback edges and build the new graph which is two copies of
the old one. There is only one feedback edge here, so we find FBedge(1) = 16.

Listing 18.78: Finding the feedback edges


% get the edges of the graph
eG = G.e;
[rows, cols] = size(KG);
% find the feedback edges
FBedge = [];
for i = 1:rows
  for j = 1:cols
    if KG(i,j) == 2
      % edge j is feedback
      FBedge = [FBedge, j];
    end
  end
end
FBedge(1)
ans =
    16

Fig. 18.11 The plot of the OCOS/FFP with column feedback graph

We now know the feedback edge is 16 and so we can remove it. We had to
write remove edge functions to do this. We show them next. There is the func-
tion subtract.m in the @edges directory that removes an edge and then we
use this function in the @graphs directory to remove an edge from the graph with
the subtractedge function. We show these functions next.

Listing 18.79: The edges object subtract function


function W = subtract(E, p)
%
% edge p is removed from edge list
%
n = length(E.e);
out = {};
for i = 1:n
  if (i ~= p)
    temp = {[E.e(i).in; E.e(i).out]};
    out = [out, temp];
  end
end
W = edges(out);
end

Listing 18.80: The graphs object subtract function


function W = subtractedge(g, p)
%
% remove an edge from an existing graph
%
E = subtract(g.e, p);
W = graphs(g.v, E);
end

We remove edge 16 and then double the resulting graph. We have not automated this
yet; since we know the new edge goes from 12 to 14, we add this new edge to the
doubled copy.

Listing 18.81: Subtract the feedback edge and construct the double copy
G = subtractedge(G, FBedge(1));
G.e
ans =
    1   2   2   2   3   4   5   2   1   6   7   8   9  10  11
    2   3   4   5   6   7   8   7   7   9   9   9  10  11  12
W = addgraph(G, G, {[12;14]});
W.v
    1   2   3   4   5   6   7   8   9  10  11  12
   13  14  15  16  17  18  19  20  21  22  23  24
W.e
    1   2   2   2   3   4   5   2   1   6   7   8   9
    2   3   4   5   6   7   8   7   7   9   9   9  10
   10  11  13  14  14  14  15  16  17  14  13
   11  12  14  15  16  17  18  19  20  19  19
   18  19  20  21  22  23  12
   21  21  21  22  23  24  14
KW = incidenceFB(W);
incToDot(KW, 10, 10, 1, 'DoubledOCOSFFP.dot');

The new dot file generates Fig. 18.12. We see it is fairly easy to construct the doubled
graph although it is clear a lot of work is required to automate this process. Also,

Fig. 18.12 The plot of the doubled OCOS/FFP with column feedback turned into a feedforward edge

we will apply the standard evaluation, update and training algorithms for the CFFN
to this network, but they need to be modified. First, only edges 1–15 and the old
feedback edge which is now edge 31 correspond to tunable weight values. Also the
offsets and gains of only nodes 1 through 12 are modifiable. Since we usually don’t
tune the gains, this network has 16 tunable edges and 12 tunable offsets for a total of
28 parameters. We will explore these architectures in the next volume which will be
devoted to building realistic models of cognitive dysfunction. Just for fun, let’s list
what we would need to do:
• Suppose the number of original nodes in the graph G is N and the edges for G
consist of P feedforward edges and Q feedback edges.
• We double G to the new graph W which has 2N nodes and 2P edges. We then add
the Q feedback edges as new feedforward edges from the original graph G to its
copy as you have seen us do in the example.
• We have to make a decision about the input and output nodes. If IN and OUT
are those sets, a reasonable thing to do is add the input nodes in the origi-
nal part of the doubled graph to the copy; i.e. NewIN = {IN, IN+N} and
to drop the output nodes in the original part and use those nodes in the copy;
i.e. NewOUT = OUT + N which adds N to each index in IN and OUT as
appropriate.
• The nodes N + 1 to 2N use the same node function as the original nodes so we
have to set them up that way.
• The only tunable parameters are the values for the edges 1 to P from the original
or the values from P + 1 to 2P in the copy and the new feedforward edges we
created 2P + 1 to 2P + Q and the offsets and gains for the original nodes 1 to N
or the offsets and gains in the copy. It makes more sense to do the updates in the
copy as we know there is a big problem with attenuation in the size of the partial
derivatives the farther we are removed from the raw target error. Hence, we will
update the link weights for P + 1 to 2P and the offsets and gains in the copy as
well. These updated values will then be used to set the parameter values in the
original graph part of the lagged graph.
• We adjust the update code so that we only calculate the partials ∂E/∂Ej for the
appropriate link weights j and ∂E/∂O(k) and ∂E/∂G(k) for the right offsets and gains k.
• We can then train as usual. (A small sketch of this index bookkeeping follows the list.)
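The sketch below just sets up the doubled input and output index sets and masks marking which parameters are tunable. The variable names other than IN, OUT, N, P and Q are our own, and the actual implementation we use is the GradientUpdateLag code in the next section, which zeroes the corresponding partials instead of using explicit masks.

% Sketch of the lagged-graph bookkeeping described in the list above.
% N = nodes in the original graph, P = its feedforward edges, Q = its
% feedback edges; IN and OUT are the usual cell arrays of node indices.
NewIN = {};
for i = 1:length(IN)
  NewIN{end+1} = IN{i};            % input nodes in the original block
  NewIN{end+1} = IN{i} + N;        % and again in the lagged copy
end
NewOUT = {};
for i = 1:length(OUT)
  NewOUT{end+1} = OUT{i} + N;      % outputs are read from the copy only
end
% Tunable parameters: edge weights P+1 to 2P, the Q new feedforward
% edges 2P+1 to 2P+Q, and the offsets and gains N+1 to 2N.
TuneW = [zeros(1,P), ones(1,P), ones(1,Q)];
TuneO = [zeros(1,N), ones(1,N)];
TuneG = [zeros(1,N), ones(1,N)];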

18.7 Lagged Training

We have added a new function removeedges to both @edges and @graphs.
These are nice functions that make it easier for us to set up the lagged graphs.

Listing 18.82: Removing a list of edges from an edges object


function W = removeedges(E, List)
%
% the edges in List are removed from the edges object
%
N = length(List);
% sort List
SL = sort(List);
% remove last edge
W = subtract(E, SL(N));
% remove remaining edges
for p = N-1:-1:1
  W = subtract(W, SL(p));
end
end

Then we use this function to remove edges from a graphs object.

Listing 18.83: Removing a list of edges from a graph


function W = removeedges(g, List)
%
% remove list of edges from an existing graph
%
E = removeedges(g.e, List);
W = graphs(g.v, E);
end

Once we have removed edges, we want to find and keep the list of edges removed
and the edge information itself. We do this with the functions ExtractFBEdges
and ExtractNewFFEdgeInfo. The function ExtractFBEdges stores the link
numbers associated with the feedback edges we find in the original graph.

Listing 18.84: Extracting the feedback edge indices from the incidence matrix
function FBedge = ExtractFBEdges(KG)
%
% KG is the incoming incidence matrix
%
[rows, cols] = size(KG);
FBedge = [];
for i = 1:rows
  for j = 1:cols
    if KG(i,j) == 2
      % edge j is feedback
      FBedge = [FBedge, j];
    end
  end
end

end

Each of the feedback edges has an integer pair of the form [a, b] where a > b because
it is a feedback edge. The function below sets up the new feedforward edges we need
in the lagged graph to have the form [a, b + N] where N is the number of nodes in
the original graph.

Listing 18.85: Extracting the new feedforward edges constructed from the old feed-
back ones
function NewFFedges = ExtractNewFFEdgeInfo(G, List)
%
% G is the original graph with feedback
% List is the list of FB edges
%
NewFFedges = {};
M = length(List);
N = length(G.v);
ge = G.e;
for i = 1:M
  NewFFedges{i} = [ge(1,List(i)); ge(2,List(i))+N];
end

end

We have a new gradient update algorithm GradientUpdateLag and a new train-


ing algorithm chainffntrainlag.m. We are going to implement gradient scal-
ing here which takes the form
∇scaled E = diag(λ1, λ2, . . . , λN) ∇E

where we set the multiplier λi = 1 if the absolute value of the ith component of the
gradient is too small (say less than 0.01) and otherwise λi = 1/λmax where λmax is
the largest absolute value of the components of the gradient. For example, if the
gradient were [2.0, 0.005, −0.5], then λmax = 2.0 and the scaled gradient would be
[1.0, 0.005, −0.25]. This works reasonably well as long as the gradient has components
which are reasonably uniform in scale. If the gradient has only a few components
that are big, this will still result in a scaled gradient that only updates a few
components. We can make some general comments
about the code which are in the source, but here we have taken them out and put them
in the main body. First, this is a lagged graph so there are 2 copies of the original
graph giving us a lagged graph with nodes 1 to N from the original and nodes N + 1
to 2N from the copy. The node functions of the first N nodes are repeated for the
second set of nodes: so input nodes are in IN and IN+N. The output nodes are in
OUT and OUT+N and we use those output nodes to set the node functions for the
lagged graph. For training purposes, we do use the double input, IN, IN+N but the
outputs are just the nodes in the copy, OUT + N.
If the number of edges in the original graph was M we know M = P + Q, where P
were feedforward edges and Q were feedback edges which are redone as feedforward
edges in the lagged graph. The parameters W from P + 1 to 2P, and W from 2P + 1
to 2P + Q are updateable as well as the gains and offsets from N + 1 to 2N. So the
original code for the gradient updates must be altered to accommodate the fact that
half of the parameters of the lagged network are not to be updated. The ξ calculations
are the same as we want to include all the contributions from all nodes and links.
Note as usual we start the ξ loops at OMax, the largest node in the OUT set as the
nodes after that do not contribute to the energy. Next, we have the structure of W is

W = [link weights for the old graph without feedback,


link weights for the copy of the graph without feedback,
link weights for all feedback edges]

which is the same as

W = [W1 , . . . , WP , W1 , . . . , WP , W2P+1 , . . . , W2P+Q ]

In this code NGE is P and NFB is Q. We generate all partials and then set some of
them to zero like this. We use DEDW to be the vector of partials with respect to link
weights, DEDO to be partials for the offsets and DEDG, the partials for the gains.

DEDW = [set to 0 partials for 1 to NGE,


keep partials for NGE +1 to 2∗ NGE,
keep partials for 2∗ NGE +1 to 2∗ NGE + NFB]
DEDO = [set to 0 partials for 1 to NGN,
keep partials for NGN +1 to 2∗ NGN]
DEDG = [set to 0 partials for 1 to NGN,
keep partials for NGN +1 to 2∗ NGN]

Of course, it is more complicated if we use gradient scaling, but the idea is essentially
the same. The last part of the code implements gradient descent using either the raw
gradient information or the scaled gradients.

Listing 18.86: New Lagged Gradient Update Code


f u n c t i o n [ WUpdate , OUpdate , GUpdate , Grad , ScaledGrad , NormGrad ] = . . .
GradientUpdateLag ( sigma , sigmaprime , g , y , Y,W, O, G, B , BE, F , FE , I , IN , D,OUT, r
,NGN,NGE,NFB)
%
% g i s the l a g g e d graph
5 % y i s t h e matrix of i n p u t v e c t o r s f o r each exemplar
% Y i s t h e matrix of output v e c t o r s f o r each exemplar
% W is the link vector
% O is the o f f s e t vector
% G i s the gain vector
10 % FE i s t h e g l o b a l f o r w a r d e d g e s e t i n f o r m a t i o n
% F i s the g l o b a l forward l i n k s e t
%
% I i s the l i s t of input vectors
% IN i s t h e l i s t o f i n p u t n o d e s
15 % D i s the l i s t of t a r g e t vectors
% OUT i s t h e l i s t o f o u t p u t n o d e s
%
% r i s t h e amount o f g r a d i e n t u p d a t e
% NGN i s t h e s i z e o f t h e o r i g i n a l g r a p h n o d e s
20 % NGE i s t h e s i z e o f t h e o r i g i n a l g r a p h FF e d g e s
% NFB i s t h e number o f f e e d b a c k e d g e s
%
% t u r n g a i n u p d a t e s on o r o f f : DOGAIN = 0
% t u r n s them o f f

25 DOGAIN = 0 ;

Nodes = g . v ;
[m, s i z e Y ] = s i z e ( Nodes ) ;
sizeW = l e n g t h (W) ;
30 S = length ( I ) ;
s i z e D = l e n g t h (D) ;
SN = l e n g t h ( IN ) ;
sizeOUT = l e n g t h (OUT) ;
xi = zeros (S , sizeY ) ;
35 DEDW = z e r o s ( 1 , sizeW ) ;
DEDO = z e r o s ( 1 , s i z e Y ) ;
DEDG = z e r o s ( 1 , s i z e Y ) ;

%f i n d maximum o u t p u t node
40 OMax = OUT{ 1 } ;
f o r i =2: sizeOUT
i f OUT{ i } > OMax
OMax = OUT{ i } ;
end
45 end
%We s t a r t t h e x i l o o p s a t OMax, t h e l a r g e s t node i n t h e OUT s e t
%a s t h e n o d e s a f t e r t h a t do n o t c o n t r i b u t e t o t h e e n e r g y .
f o r alpha = 1 : S
f o r i = OMax : − 1 : 1
50 % g e t forward edge i n f o r m a t i o n f o r neuron i
FF = F{ i } ;
FEF = FE{ i } ;
lenFF = l e n g t h (FEF) ;
%f i n d o u t i f i i s a t a r g e t node
55 f o r j = 1 : sizeOUT
i f OUT{ j } == i
k = j;
IsTargetNode = 1 ;
end
60 end
i f I s T a r g e t N o d e == 1
x i ( a l p h a , i ) = (Y( a l p h a , i ) − D( a l p h a , k ) ) ;
else
adder = 0 ;
65 f o r j = 1 : lenFF
l i n k = FEF( j ) ;
% F ( j ) i s a node w h i c h i g o e s f o r w a r d t o
p o s t = FF ( j ) ;
a d d e r = a d d e r + x i ( a l p h a , p o s t ) ∗W( l i n k ) ∗ s i g m a p r i m e { p o s t } ( y (
a l p h a , p o s t ) ,O( p o s t ) ,G( p o s t ) ) ;
70 end
x i ( alpha , i ) = adder ;
end % t e s t on n o d e s : o u t p u t o r n o t
end % l o o p on n o d e s
end % l o o p on d a t a
75 %x i

f o r alpha = 1 : S
f o r i = OMax : −1 : 1
% FE{ i } i s t h e g l o b a l node i n f o r m a t i o n
80 % where i t h e i t h w e i g h t
% FE{ i } = a s e t o f ( p r e , p o s t )
FF = F{ i } ;
FEF = FE{ i } ;
lenFF = l e n g t h ( FF ) ;
85 f o r j = 1 : lenFF
% get the weight index
l i n k = FEF( j ) ;
% W( l i n k ) = E { p r e −> p o s t } w h e r e
pre = i ;
90 p o s t = FF ( j ) ;
DEDW( l i n k ) = DEDW( l i n k ) + x i ( a l p h a , p o s t ) ∗ s i g m a p r i m e { p o s t } ( y ( a l p h a
, p o s t ) ,O( p o s t ) ,G( p o s t ) ) ∗Y( a l p h a , p r e ) ;
end% f o r w a r d l i n k l o o p

DEDO( i ) = DEDO( i ) − x i ( a l p h a , i ) ∗ s i g m a p r i m e { i } ( y ( a l p h a , i ) ,O( i ) ,G( i ) )


;
DEDG( i ) = DEDG( i ) − x i ( a l p h a , i ) ∗ s i g m a p r i m e { i } ( y ( a l p h a , i ) ,O( i ) ,G( i ) )
∗ ( y ( a l p h a , i )−O( i ) ) /G( i ) ;
95 end% node l o o p
end% e x e m p l a r l o o p

%we h a v e f o u n d a l l the gradients


%now s e t t h e r i g h t pieces to 0
100 DEDW( 1 , 1 :NGE) = 0 ;
DEDO( 1 , 1 :NGN) = 0 ;
DEDG( 1 , 1 :NGN) = 0 ;

%u s u a l g r a d i e n t
105 i f DOGAIN == 1
Grad = [DEDW DEDO DEDG ] ;
else
Grad = [DEDW DEDO ] ;
end
110 NormGrad = s q r t ( sum ( Grad . ∗ Grad ) ) ;
%
%we i m p l e m e n t g r a d i e n t s c a l i n g
Scale Tol = 0.01;
WScale = o n e s ( 1 , l e n g t h (W) ) ;
115 O S c a l e = o n e s ( 1 , l e n g t h (O) ) ;
GS c a le = o n e s ( 1 , l e n g t h (G) ) ;
ADEDW = max ( a b s (DEDW) ) ;
ADEDO = max ( a b s (DEDO) ) ;
ADEDG = max ( a b s (DEDG) ) ;
120 f o r i = 1 : l e n g t h (W)
i f a b s (DEDW( i ) ) > S c a l e T o l
WScale ( i ) = 1/ADEDW;
end
end
125 f o r i = 1 : l e n g t h (O)
i f a b s (DEDO( i ) ) > S c a l e T o l
O S c a l e ( i ) = 1/ADEDO;
end
end
130 f o r i = 1 : l e n g t h (G)
i f a b s (DEDG( i ) ) > S c a l e T o l
GS ca le ( i ) = 1/ADEDG;
end
end
135 i f DOGAIN == 1
SW = d i a g ( WScale ) ∗DEDW’ ;
SO = d i a g ( O S c a l e ) ∗DEDO’ ;
SG = d i a g ( GScale ) ∗DEDG’ ;
S c a l e d G r a d = [SW’ SO ’ SG ’ ] ;
140 else
SW = d i a g ( WScale ) ∗DEDW’ ;
SO = d i a g ( O S c a l e ) ∗DEDO’ ;
S c a l e d G r a d = [SW’ SO ’ ] ;
end
145 gradtol = 0.05;
%
i f NormGrad < g r a d t o l
scale = r ;
%u p d a t e a l l t h e l i n k s .
150 WUpdate = W− DEDW ∗ s c a l e ;
%s e t c o p i e s t o u p d a t e s
WUpdate ( 1 , 1 :NGE) = WUpdate ( 1 ,NGE+1:2∗NGE) ;

%u p d a t e a l l o f f s e t s . This does not a l t e r


155 %o f f s e t s 1 t o NGN
OUpdate = O − DEDO ∗ s c a l e ;
OUpdate ( 1 , 1 :NGN) = OUpdate ( 1 ,NGN+1:2∗NGN) ;
i f DOGAIN == 1
%u p d a t e a l l g a i n s . This does not a l t e r

160 %g a i n s 1 t o NGN
GUpdate = G − DEDG ∗ s c a l e ;
GUpdate ( 1 , 1 :NGN) = GUpdate ( 1 ,NGN+1:2∗NGN) ;
else
GUpdate = G;
165 end
e l s e% normgrad i s l a r g e r t h a n 0 . 0 5
NormScaledGrad = s q r t ( sum ( S c a l e d G r a d . ∗ S c a l e d G r a d ) ) ;
%u p d a t e a l l t h e l i n k s .
WUpdate = W− r ∗DEDW∗ d i a g ( WScale ) ;
170 %s e t c o p i e s t o u p d a t e s
WUpdate ( 1 , 1 :NGE) = WUpdate ( 1 ,NGE+1:2∗NGE) ;

%u p d a t e a l l o f f s e t s . This does not a l t e r


%o f f s e t s 1 t o NGN
175 OUpdate = O − r ∗DEDO∗ d i a g ( O S c a l e ) ;
OUpdate ( 1 , 1 :NGN) = OUpdate ( 1 ,NGN+1:2∗NGN) ;

i f DOGAIN == 1
%u p d a t e a l l g a i n s . This does not a l t e r
180 %g a i n s 1 t o NGN
GUpdate = G − r ∗DEDG∗ d i a g ( E S c a l e ) ;
GUpdate ( 1 , 1 :NGN) = GUpdate ( 1 ,NGN+1:2∗NGN) ;
else
GUpdate = G;
185 end
end

end

Finally, we put it all together to write a simple training loop.

Listing 18.87: New lagged training code


f u n c t i o n [W, G, O, Energy , Grad , ScaledGrad , Norm ] = . . .
2 c h a i n f f n t r a i n l a g ( sigma , sigmaprime , g , Input , IN , Target , . . .
OUT,W, O, G, B , BE, F , FE , lambda , NumIters ,NGN,NGE,NFB)
%
% We t r a i n a c h a i n f f n
%
7 % I n p u t i s i n p u t m a t r i x o f S rows o f i n p u t d a t a
% S x number o f i n p u t n o d e s
% t a r g e t i s d e s i r e d t a r g e t v a l u e m a t r i x o f S rows o f t a r g e t d a t a
% S x number o f o u t p u t n o d e s
%
12 % g i s the chain f f n graph
% G i s the gain parameters
% O i s the o f f s e t parameters
% Y i s t h e node o u t p u t d a t a
%
17 Energy = [ ] ;
[ E , y , Y, OI , EI ] = e n e r g y 2 ( sigma , g , Input , IN ,OUT, Target ,W, O, G, B , BE) ;
E
Energy = [ Energy E ] ;
f o r t = 1 : NumIters
22 [W, O, G, Grad , ScaledGrad , Norm ] = GradientUpdateLag ( sigma , sigmaprime , g
, y , Y,W, O, G, B , BE, F , FE , Input , IN , Target ,OUT, lambda ,NGN,NGE,NFB) ;
[ E , y , Y, OI , EI ] = e n e r g y 2 ( sigma , g , Input , IN ,OUT, Target ,W, O, G, B , BE) ;
Energy = [ Energy E ] ;
end
Energy ( NumIters )
27 p l o t ( Energy ) ;
end

We need to test this code and see how it does. First, we set up the lagged graph.

Listing 18.88: Example: Setting up the original graph with feedback


V = [1;2;3;4;5;6;7;8;9;10;11;12];
E = {[1;2],[2;3],[2;4],[2;5],[3;6],[4;7],[5;8],[2;7],[1;7],...
    [6;9],[7;9],[8;9],[9;10],[10;11],[11;12],[12;2],...
    [11;3],[7;7]};
v = vertices(V);
e = edges(E);
G = graphs(v,e);
KG = incidenceFB(G);
eG = G.e;
incToDotFB(KG, 10, 10, 1, 'PracticeFB.dot');

Note, we generate the original dot file so we can see the original graph. Then we get
the information we need to build the lagged graph.

Listing 18.89: Example: Setting up the lagged graph


% Find the feedback edges and subtract them
FBedge = ExtractFBEdges(KG);
NewFFedges = ExtractNewFFEdgeInfo(G, FBedge);
% subtract the feedback edges
W = removeedges(G, FBedge);
% double the graph
LaggedW = addgraph(W, W, NewFFedges);
KLaggedW = incidence(LaggedW);
incToDot(KLaggedW, 10, 10, 1, 'NewDoubledFFGraph.dot');

We then do some further setup. Note we first set OUT = {6,7,8,9,18,19,20,21} as we
need to set all the node functions correctly. Later, we will reset to OUT = {18,19,20,21}.

Listing 18.90: Example: Further setup details and input and target sets
IN = { 1 , 1 3 } ;
OUT = { 6 , 7 , 8 , 9 , 1 8 , 1 9 , 2 0 , 2 1 } ;
NGN = l e n g t h (G. v ) ;
NGE = l e n g t h (G. e ) − l e n g t h ( FBedge ) ;
5 NFB = l e n g t h ( FBedge ) ;
N o d e S i z e = l e n g t h ( LaggedW . v ) ;
SL = − 0 . 5 ;
SH = 1 . 5 ;
%s e t u p t r a i n i n g
10 X = linspace (0 ,1 ,101) ;
U = [];
for i = 1:101
i f X( i ) >= 0 && X( i ) < 0 . 2 5
U = [U, [ 1 ; 0 ; 0 ; 1 ] ] ;
15 e l s e i f X( i ) >= 0 . 2 5 && X( i ) < 0 . 5 0
U = [U, [ 0 ; 1 ; 0 ; 0 ] ] ;
e l s e i f X( i ) >= 0 . 5 && X( i ) < 0 . 7 5
U = [U, [ 0 ; 0 ; 1 ; 1 ] ] ;
else
20 U = [U, [ 0 ; 0 ; 1 ; 0 ] ] ;
end
end
%r a n d o m l y p e r m u t e t h e o r d e r o f t h e i n p u t and t a r g e t set .
P = randperm ( 1 0 1 ) ;

25 XX = X(P) ;
UU = [ ] ;
for i = 1:101
j = P( i ) ;
UU = [UU,U ( : , j ) ] ;
30 end
I n p u t = [XX’ , XX ’ ] ;
T a r g e t = UU’ ;
%s e t a l l node f u n c t i o n s a s u s u a l ; t h e c o p i e s w i l l
%b e t h e same a s t h e o r i g i n a l
35 [ sigma , s i g m a p r i m e ] = S i g m o i d I n i t 2 ( NodeSize , SL , SH , IN ,OUT) ;

We designed this input and output data with a more biological situation in mind. The
inputs are scalars and so are perhaps a light intensity value that has been obtained
from additional layers of neural circuitry. The outputs are taken from the top of the
OCOS/FFP stack and so are reminiscent of outputs from the top layer of visual cortex
(albeit in a very simplistic way) and the binary patterns of the target, i.e. [1; 0; 0; 1],
might represent a coded command to activate a certain motor response or a set of
chromatophore activations resulting in a specific pattern on the skin of a cephalopod.
So this circuit, if we can train it to match the inputs to outputs, is set up to code for
four specific motor or chromatophore activations. The feedback edges 11 → 3 and
7 → 7 are not really biological and were added just to help provide a nice test for the
code. However, the edge 12 → 2 is part of the usual FFP circuit and so is biologically
plausible. The top of the OCOS was allowed to feedforward into a summation node 9
which then fed up to node 10 in the cortical column above the one which contains our
OCOS/FFP. That node 10 then provides feedback from the top cortical column into
the bottom of our OCOS/FFP stack. So this example is a small step in the direction
of modeling something of interest in cognitive modeling!

Listing 18.91: Example: parameters and backward and forward sets


%r e s e t OUT
OUT = { 1 8 , 1 9 , 2 0 , 2 1 } ;
%r e s e t node s i z e t o t h a t o f G
N o d e S i z e = l e n g t h (G. v ) ;
5 OL = − 1 . 4 ;
OH = 1 . 4 ;
GL = 0 . 8 ;
GH = 1 . 2 ;
WL = − 2 . 1 ;
10 WH = 2 . 1 ;
O f f s e t = OL+(OH−OL) ∗ rand ( 1 , N o d e S i z e ) ;
O = [ Offset , Offset ] ;
Gain = GL+(GH−GL) ∗ rand ( 1 , N o d e S i z e ) ;
G = [ Gain , Gain ] ;
15 Weights1 = WL+(WH −WL) ∗ rand ( 1 ,NGE) ;
Weights2 = WL+(WH−WL) ∗ rand ( 1 ,NFB) ;
W = [ Weights1 , Weights1 , Weights2 ] ;
[ BG, FG, BEG, FEG ] = B F s e t s ( LaggedW , KLaggedW ) ;

We can do a simple evaluation.



Listing 18.92: Example: A simple evaluation and energy calculation


[InVals, OutVals] = evaluation3(sigma, LaggedW, Input(1,:), IN, OUT, W, O, G, BG, BEG);
% find error
[E, yI, YI, OI, EI] = energy2(sigma, LaggedW, Input, IN, OUT, Target, W, O, G, BG, BEG);
E =
   87.3197

Finally, we can do some training.

Listing 18.93: Example: 10 training steps


[W, G, O, Energy, Grad, ScaledGrad, Norm] = ...
    chainffntrainlag(sigma, sigmaprime, LaggedW, Input, IN, Target, OUT, ...
    W, O, G, BG, BEG, FG, FEG, 0.001, 10, NGN, NGE, NFB);

We list the regular gradient and the scaled gradient. Here there are 15 original feed-
forward edges and the first 15 values in Grad are therefore 0. The next 15 correspond
to the link weights for the copy and the last 3 are the link weights for the rewritten
feedback edges. The values for 28, 29 and 30 are 0 because those edges are after the
last output node and so are irrelevant to the energy calculation. Then, there are the
3 partials for the feedback link weights. Finally, there are 12 offset partials that
are zero as they are in the original graph and 9 offset partials from the copy that are
nonzero, with the last 3 zero as they are irrelevant. The listing below that is the scaled
version of the gradient.

Listing 18.94: Example: 10 training steps gradient information


Grad
2 Grad =
Columns 1 t h r o u g h 15
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0
Columns 16 t h r o u g h 30
−3.2022 0.8050 0.8867 0.4867 − 1 7.41 2 6 2.2200
−5.1833 2.6164 −0.3360 −0.3775 3.6818 −3.4156
0 0 0
7 Columns 31 t h r o u g h 45
4.0001 1.1957 1.6520 0 0 0
0 0 0 0 0 0 0
0 0
Columns 46 t h r o u g h 57
−103.4473 4.2735 −6.7604 − 14.8478 − 12.6224 18.4797
−3.7118 8.8448 −7.7773 0 0 0

12 ScaledGrad =
Columns 1 t h r o u g h 15
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0
Columns 16 t h r o u g h 30
−0.1839 0.0462 0.0509 0.0280 −1.0000 0.1275
−0.2977 0.1503 −0.0193 −0.0217 0.2114 −0.1962
0 0 0

17 Columns 31 t h r o u g h 45
0.2297 0.0687 0.0949 0 0 0
0 0 0 0 0 0 0
0 0
Columns 46 t h r o u g h 57
−1.0000 0.0413 −0.0654 −0.1435 −0.1220 0.1786
−0.0359 0.0855 −0.0752 0 0 0

Finally, let’s talk about thresholding and recognition. A given target here has binary
values in it and the output from the model is not going to look like that. For example,
after some training, we might have a model output of [0.85; −0.23; 0.37; 0.69] with
the target supposed to be [1; 0; 0; 1]. To see if the model has recognized or classified
this input, we usually take the model output and apply a threshold function. We set low
and high threshold values, say cL = 0.4 and cH = 0.6, and perform the test Vi = 1 if
Yi > cH, Vi = 0 if Yi < cL and otherwise Vi = 0.5, where V represents the thresholded
output of the model. For the sample output above this gives V = [1; 0; 0; 1], which matches
the target. Then if the error between the thresholded value V and the target
is 0, we can say we have classified this input correctly. We do this in code with a pair of
functions. First, we write the code to do the thresholding.

Listing 18.95: The thresholding algorithm


function R = Recognition(X, LoTol, HiTol)
%
R = zeros(length(X), 1);
for i = 1:length(X)
  if X(i) > HiTol
    R(i) = 1;
  elseif X(i) < LoTol
    R(i) = 0;
  else
    R(i) = 0.5;
  end
end
end

Then, we want to run the thresholding algorithm over all our results. This is done in
GetRecognition.

Listing 18.96: Finding out how many inputs have been classified
f u n c t i o n [ E r r o r , sumE , R, Z , S u c c e s s , C l a s s i f i e d ] = G e t R e c o g n i t i o n (OUT, Y,
Target , LoTol , HiTol )
%
%
%
5 [ rows , c o l s ] = s i z e (Y) ;
R = z e r o s ( rows , l e n g t h (OUT) ) ;
Z = z e r o s ( rows , l e n g t h (OUT) ) ;
E r r o r = z e r o s ( rows , l e n g t h (OUT) ) ;
S u c c e s s = z e r o s ( rows , 1 ) ;
10 k = 1;
f o r j = 1 : l e n g t h (OUT)
for i = 1: cols
i f i == OUT{ j }
%we h a v e a column t h a t i s an o u t p u t
15 Z ( : , k ) = Y( : , i ) ;
k = k +1;
break ;
end
end

20 end
%Z now c o n t a i n s t h e c o l s t h a t match
% targets
f o r i = 1 : rows
R( i , : ) = R e c o g n i t i o n ( Z ( i , : ) , LoTol , HiTol ) ;
25 E r r o r ( i , : ) = (R( i , : ) − T a r g e t ( i , : ) ) . ˆ 2 ;
t e s t = sum ( E r r o r ( i , : ) ) ;
i f t e s t == 0
Success ( i ) = 1;
end
30 end
sumE = 0 . 5 ∗ ( norm ( E r r o r , ’ fro ’ ) ) ˆ 2 ;
C l a s s i f i e d = sum ( S u c c e s s ) / rows ∗ 1 0 0 ;
end

For example, after suitable training, we might have results like this

Listing 18.97: Sample classification results


Z(97:101 ,:)
2
ans =

−0.0095 0.1987 0.6900 0.3421


0.9875 −0.1334 −0.0370 0.9726
7 −0.0095 0.3317 0.6900 0.3205
1.0231 0.0014 −0.0630 0.9737
1.0222 −0.0296 −0.0623 0.9746

Target ( 9 7 : 1 0 1 , : )
12
ans =

0 0 1 1
1 0 0 1
17 0 0 1 1
1 0 0 1
1 0 0 1

Recall Z here is the four outputs we are monitoring. We see entry 97 maps to [0; 0; 1; 0]
which does not match Target 97 but the output 98 does map to the correct output.

18.8 Better Lagged Training!

It doesn’t take long before we all get annoyed with gradient descent. We have played a
bit with making it better by adding a type of gradient scaling, but we can do one more
trick. Let’s look at a technique called line search. We have a function of many variables
here called E. For convenience, let all the parameters be collected into the vector p. Any vector
D for which the derivative of E in the direction of D satisfies ∇E · D < 0 points in a direction
in which the value of E goes down and is called a descent vector; for example, D = −∇E(pold)
is a descent vector since then ∇E · D = −‖∇E(pold)‖² < 0. Our various strategies for
picking vectors based on ∇E(pold) all give us descent vectors. Look at a slice through the
multidimensional surface of E given by g(ξ) = E(pold + ξD) where D need not be a unit
vector. At a given step in our gradient descent algorithm, we know g(0) = E(pold) and
if λ is our current choice of step to try, we can calculate g(λ) = E(pold + λD). From the
chain rule of calculus of more than one variable (ha, you will have to remember what we
discussed in Peterson (2015)) we find g′(ξ) = ⟨∇E(pold + ξD), D⟩. Hence, since
we know the gradient at the current step, we see g′(0) = ⟨∇E(pold), D⟩. We thus
have enough information to fit a quadratic approximation of g(ξ) given by gapprox(ξ) =
A + Bξ + Cξ². We easily find

A = E(pold)
B = −⟨∇E(pold), D⟩
C = (E(pold + λD) − E(pold) − Bλ) / λ²

When C > 0 the fitted quadratic has its minimum at ξ∗ = −B/(2C), and this is the λstar
the new code uses as its step.
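A small, self-contained sketch of this fit is below. It uses a toy quadratic energy so it can be run on its own, and it follows the convention of the actual code, stepping along pold − ξD where D is the (possibly scaled) gradient, so that B = −⟨∇E(pold), D⟩ is the slope of the slice at 0. All the names here are placeholders; the real implementation is inside GradientUpdateLag.

% Sketch of the quadratic line search fit described above.
% Toy energy so the sketch runs on its own: E(p) = sum(p.^2).
EvalE  = @(p) sum(p.^2);
pold   = [1.0, -2.0, 0.5];                 % assumed current parameter vector
GradE  = 2*pold;                           % gradient of the toy energy at pold
D      = GradE;                            % direction used by the update p - xi*D
lambda = 0.1;                              % trial step size
A       = EvalE(pold);                     % g(0) = E(pold)
B       = -dot(GradE, D);                  % slope term, as in the text above
glambda = EvalE(pold - lambda*D);          % one trial step of size lambda
C       = (glambda - A - B*lambda)/lambda^2;
if C > 0
  lambdastar = -B/(2*C);                   % minimizer of A + B*xi + C*xi^2
else
  lambdastar = lambda;                     % no interior minimum; keep the trial step
end
pnew = pold - lambdastar*D;                % take the line search step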
The new version of the code has been reorganized a bit and is a drop in replacement for
the old code. It has a few toggles you can set in the code: if you set dolinesearch = 0
and leave dogradscaling = 1, you get the old version of the gradient update code
for the lagged networks.

Listing 18.98: Activating Line Search and Gradient Scaling


dogradscaling = 1;
dolinesearch = 1;

The structure of the code has been reorganized as we show in the skeleton here.

Listing 18.99: New GradientUpdateLag skeleton code


if d o l i n e s e a r c h == 1
% g e t t h e f i r s t e n e r g y v a l u e we w i l l n e e d t o b u i l d t h e q u a d r a t i c
3 % a p p r o x i m a t i o n t o t h e 1D s u r f a c e s l i c e u s i n g t h e d e s c e n t v e c t o r
[ E S t a r t , yI , YI , OI , EI ] = e n e r g y 2 ( sigma , g , I , IN ,OUT, D,W, O, G, B , BE) ;
end
% we n e e d t h e o r i g i n a l g r a d i e n t
i f DOGAIN == 1
8 BaseGrad = [DEDW DEDO DEDG ] ;
else
BaseGrad = [DEDW DEDO ] ;
end
NormBase = s q r t ( sum ( BaseGrad . ∗ BaseGrad ) ) ;
13
if d o g r a d s c a l i n g == 1
% implement g r a d i e n t s c a l i n g
% c a l c u l a t e the gradient vector using gradient
% s c a l i n g and c a l l i t S c a l e d G r a d

18 % s e t UseGrad
UseGrad = S c a l e d G r a d ;
else
% without gradient scaling ,
% we u s e t h e o r i g i n a l g r a d i e n t
23 UseGrad = Grad ;
end%d o g r a d s c a l i n g l o o p

%now g e t norm o f g r a d i e n t
gradtol = 0.05;
28 NormGrad = s q r t ( sum ( UseGrad . ∗ UseGrad ) ) ;

%now do d e s c e n t s t e p
i f d o g r a d s c a l i n g == 1
% do a d e s c e n t s t e p u s i n g l a m b d a = s c a l e
33 % using e it h er the o r i g i n a l descent vector
% or i t s n o r m a l i z a t i o n
i f NormGrad < g r a d t o l
%n o t n o r m a l i z e d
NormDescent = NormGrad ;
38 e l s e% normgrad i s l a r g e r t h a n 0 . 0 5
%n o r m a l i z e d
NormDescent = 1 ;
end
e l s e% n o t u s i n g g r a d s c a l i n g
43 %
S c a l e d G r a d = Grad ;
i f NormGrad < g r a d t o l
%n o t n o r m a l i z e d
NormDescent = NormGrad ;
48 e l s e%
% normalized
NormDescent = 1 ;
end
end
53
%now we can do a l i n e s e a r c h
i f d o l i n e s e a r c h == 1
dolinesearchstep = 1;
%f i n d t h e e n e r g y f o r t h e f u l l s t e p o f l a m b d a = s c a l e
58 [ E F u l l , yI , YI , OI , EI ] = e n e r g y 2 ( sigma , g , I , IN ,OUT, D, WUpdate , OUpdate ,
GUpdate , B , BE) ;
%s e t A, BCheck = B and C a s d i s c u s s e d i n t h e t h e o r y o f t h e
quadratic approximation
%o f t h e s l i c e
A = EStart ;
BCheck = −d o t ( BaseGrad , UseGrad ) ;
63 C = ( E F u l l − A − BCheck∗ s c a l e ) / ( s c a l e ˆ 2 ) ;
%g e t t h e o p t i m a l s t e p
l a m b d a s t a r = −BCheck / ( 2 ∗C) ;
% i f C i s n e g a t i v e t h i s c o r r e s p o n d s t o a maximum s o r e j e c t
i f (C<0 | | l a m b d a s t a r > s c a l e )
68 % we a r e g o i n g t o a maximum on t h e l i n e s e a r c h ; r e j e c t
l a m b d a s t a r = l a m b d a s t a r +2∗ s c a l e ;
%d o l i n e s e a r c h s t e p = 0 ;
end
%now we h a v e t h e o p t i m a l l a m b d a t o u s e
73 i f d o l i n e s e a r c h s t e p == 1
%do new s t e p u s i n g l a m d a s t a r
s c a l e = lambdastar ;
%do i t d i f f e r e n t l y i f d o i n g g r a d i e n t s c a l i n g o r n o t
...
78 end%d o l i n s e a r c h s t e p l o o p
end% d o l i n e s e a r c h l o o p

%c o p y u p d a t e s t o t h e o r i g i n a l b l o c k o f p a r a m e t e r s
WUpdate ( 1 , 1 :NGE) = WUpdate ( 1 ,NGE+1:2∗NGE) ;
83 OUpdate ( 1 , 1 :NGN) = OUpdate ( 1 ,NGN+1:2∗NGN) ;

i f DOGAIN == 1
GUpdate ( 1 , 1 :NGN) = GUpdate ( 1 ,NGN+1:2∗NGN) ;
else
GUpdate = G;
88 end

We can try out our new version of GradientUpdateLag which includes line
search and see how it does. Note it still takes quite a while, but without line search the
iteration count can reach 50,000 or more and you reach about 50 % recognition and
stall. The example before used the odd feedbacks of 11 -> 3 and 7 -> 7 which we will
now remove for this second example. Also, we remove the output node 10 as it is not
very biological. Other than that, the setup is roughly the same, so we only show the
construction of the original graph in the results below. We start with a ten step run using
gradient scaling and line search. We print the optimal line search step size to show you
how it is going. Then we do another and also print the initial recognition results.

Listing 18.100: A Sample Run


V = [1;2;3;4;5;6;7;8;9;10;11;12];
E = {[1;2] ,[2;3] ,[2;4] ,[2;5] ,[3;6] ,[4;7] ,[5;8] ,[2;7] ,[1;7] ,...
[6;9] ,[7;9] ,[8;9] ,[9;10] ,[10;11] ,[11;12] ,[12;2]};
v = v e r t i c e s (V) ;
5 e = e d g e s (E) ;
G = graphs ( v , e ) ;
KG = i n c i d e n c e F B (G) ;
%c o n s t r u c t l a g g e d g r a p h a s b e f o r e
%s e t t h e IN and OUT s o we can i n i t i a l i z e t h e node f u n c t i o n s
10 IN = { 1 , 1 3 } ;
OUT = { 6 , 7 , 8 , 1 8 , 1 9 , 2 0 } ;
% remove t h e e x t r a o u p u t v a l u e
X = linspace (0 ,1 ,101) ;
U = [];
15 for i = 1:101
i f X( i ) >= 0 && X( i ) < 0 . 2 5
U = [U, [ 1 ; 0 ; 0 ] ] ;
e l s e i f X( i ) >= 0 . 2 5 && X( i ) < 0 . 5 0
U = [U, [ 0 ; 1 ; 0 ] ] ;
20 else
U = [U, [ 0 ; 0 ; 1 ] ] ;
end
end
%I n i t i a l energy
25 [ E , yI , YI , OI , EI ] = . . .
e n e r g y 2 ( sigma , LaggedW , Input , IN ,OUT, Target ,W, O, G, BG,BEG) ;
E =
149.276
%T r a i n f o r 10 s t e p s w i t h l a m b d a = . 1
30 [W, G, O, Energy , Grad , ScaledGrad , Norm ] = . . .
c h a i n f f n t r a i n l a g ( sigma , sigmaprime , LaggedW , Input , IN , . . .
Target ,OUT,W, O, G, BG, BEG, FG, FEG , 0 . 0 0 1 , . . .
, 1 0 ,NGN,NGE,NFB) ;
lambdastar = 0.0572
35 lambdastar = 0.0585

lambdastar = 0.0600
lambdastar = 0.0620
lambdastar = 0.0645
lambdastar = 0.0678
40 lambdastar = 0.0724
lambdastar = 0.0790
lambdastar = 0.0891
lambdastar = 0.1057
E s t a r t = 1 4 9 . 2 7 6 Es to p 1 2 8 . 1 8 3 0
45
%a n o t h e r 100 s t e p s
[W, G, O, Energy , Grad , ScaledGrad , Norm ] = . . .
c h a i n f f n t r a i n l a g ( sigma , sigmaprime , LaggedW , Input , IN , . . .
Target ,OUT,W, O, G, BG, BEG, FG, FEG , 0 . 1 , . . .
50 1 0 0 ,NGN,NGE,NFB) ;
E s t a r t = 1 2 1 . 7 9 2 1 Es top = 2 0 . 4 4 1 2

%a n o t h e r 100 s t e p s : e n e r e g y o s c i l l a t e s some
[W, G, O, Energy , Grad , ScaledGrad , Norm ] = . . .
55 c h a i n f f n t r a i n l a g ( sigma , sigmaprime , LaggedW , Input , IN , . . .
Target ,OUT,W, O, G, BG, BEG, FG, FEG , 0 . 1 , . . .
1 0 0 ,NGN,NGE,NFB) ;
Estart = 20.4315 Es to p = 2 0 . 1 7 9 1

60 %a n o t h e r 100 s t e p s : e n e r g y o s c i l l a t e s some
%c u t s c a l e t o . 0 1
[W, G, O, Energy , Grad , ScaledGrad , Norm ] = . . .
c h a i n f f n t r a i n l a g ( sigma , sigmaprime , LaggedW , Input , IN , . . .
Target ,OUT,W, O, G, BG, BEG, FG, FEG , 0 . 0 1 , . . .
65 1 0 0 ,NGN,NGE,NFB) ;
Estart = 20.1533 Es to p = 2 0 . 0 6 5 2

%c h e c k r e c o g n i t i o n
[ E , yI , YI , OI , EI ] = . . .
70 e n e r g y 2 ( sigma , LaggedW , Input , IN ,OUT, Target ,W, O, G, BG,BEG) ;
[ E r r o r , sumE , R, Z , S u c c e s s , C l a s s i f i e d ] = G e t R e c o g n i t i o n (OUT, YI , Target
,0.4 ,0.6) ;
C l a s s i f i e d , sumE
C l a s s i f i e d = 39.6040
sumE = 2 1 . 4 0 6 2

Now run for an additional 11,000 steps after which we check recognition and we see
we are about 50 %.

Listing 18.101: Our example run continued


1 C l a s s i f i e d , sumE
Classified =
49.5050
sumE =
18.1875

This is clearly not very good. We are using an excessive number of iterations and still not getting good recognition. Let's revisit the gradient update algorithm and the line search and see if we can improve them.

18.9 Improved Gradient Descent

Let's tackle the training code by making it more modular. We are going to use the functions GradientUpdateLagTwo to handle the gradient computations and chainffntrainlagtwo for the iterative training. We rewrote the update code as seen below; some portions are left out as they are identical to before. We set up the gradient vectors based on whether we are doing gradient scaling or not with a simple block.

Listing 18.102: Set up gradient vectors


ScaleTol = 0.01;
if dogradscaling == 1
  [WScale,OScale,GScale,UseGrad] = GetScaledGrad(DOGAIN,ScaleTol,W,O,G,DEDW,DEDO,DEDG);
else
  UseGrad = BaseGrad;
end
NormGrad = sqrt(sum(UseGrad.*UseGrad));

We use the new function GetScaledGrad to find the scaled gradient vector. This
function contains all the code we used before, but by pulling it out we make the gradient
update code easier to follow and debug.

Listing 18.103: Finding the scaled gradient


function [WScale,OScale,GScale,UseGrad] = GetScaledGrad(DOGAIN,ScaleTol,W,O,G,DEDW,DEDO,DEDG)
%
WScale = ones(1,length(W));
OScale = ones(1,length(O));
GScale = ones(1,length(G));
ADEDW = max(abs(DEDW));
ADEDO = max(abs(DEDO));
ADEDG = max(abs(DEDG));
for i = 1:length(W)
  if abs(DEDW(i)) > ScaleTol
    WScale(i) = 1/ADEDW;
  end
end
for i = 1:length(O)
  if abs(DEDO(i)) > ScaleTol
    OScale(i) = 1/ADEDO;
  end
end
for i = 1:length(G)
  if abs(DEDG(i)) > ScaleTol
    GScale(i) = 1/ADEDG;
  end
end
if DOGAIN == 1
  SW = diag(WScale)*DEDW';
  SO = diag(OScale)*DEDO';
  SG = diag(GScale)*DEDG';
  ScaledGrad = [SW' SO' SG'];
else
  SW = diag(WScale)*DEDW';
  SO = diag(OScale)*DEDO';
  ScaledGrad = [SW' SO'];
end
UseGrad = ScaledGrad;

end

Once we have found the appropriate gradient, we must find the updates. If we don’t
do a line search, we simply do a usual descent step using our descent vector. We have
placed the descent code into two new functions: DoDescentStepGradScaling
which handles the scaled gradient case and DoDescentStepRegular which uses
just a straight gradient without scaling.

Listing 18.104: Doing a descent step


gradtol = 0.05;
% now do descent step
if dogradscaling == 1
  [WUpdate,OUpdate,GUpdate] = DoDescentStepGradScaling(DOGAIN,NormBase,NormGrad,W,O,G,...
                                WScale,OScale,GScale,DEDW,DEDO,DEDG,gradtol,r);
else
  [WUpdate,OUpdate,GUpdate,NormDescent] = ...
      DoDescentStepRegular(DOGAIN,NormGrad,W,O,G,DEDW,DEDO,DEDG,gradtol,r);
end

It is much easier to see the structure of the code now as all of the complicated computations
are placed in their own modules. These modules are lifted straight from the old update
code. First, look at the scaled descent.

Listing 18.105: The Scaled Gradient Descent Step


function [WUpdate,OUpdate,GUpdate,NormDescent] = ...
    DoDescentStepGradScaling(DOGAIN,NormBase,NormGrad,W,O,G,WScale,OScale,GScale,...
                             DEDW,DEDO,DEDG,gradtol,scale)
%
%
if NormBase < gradtol
  WUpdate = W - DEDW*scale;
  OUpdate = O - DEDO*scale;
  if DOGAIN == 1
    GUpdate = G - DEDG*scale;
  else
    GUpdate = G;
  end
  NormDescent = NormBase;
else % NormBase is larger than 0.05
  WUpdate = W - scale*DEDW*diag(WScale)/NormGrad;
  OUpdate = O - scale*DEDO*diag(OScale);
  if DOGAIN == 1
    GUpdate = G - scale*DEDG*diag(GScale);   % scale the gain partials with GScale
  else
    GUpdate = G;
  end
  NormDescent = NormGrad;
end

end

Note the regular descent is much simpler. Before, all of this code was inline in the main update block, which made the gradient update code hard to read.

Listing 18.106: The Regular Gradient Descent Step


function [WUpdate,OUpdate,GUpdate,NormDescent] = ...
    DoDescentStepRegular(DOGAIN,NormGrad,W,O,G,DEDW,DEDO,DEDG,gradtol,scale)
%
if NormGrad < gradtol
  WUpdate = W - DEDW*scale;
  OUpdate = O - DEDO*scale;
  if DOGAIN == 1
    GUpdate = G - DEDG*scale;
  else
    GUpdate = G;
  end
  NormDescent = NormGrad;
else % NormGrad is larger than 0.05
  WUpdate = W - scale*DEDW/NormGrad;
  OUpdate = O - scale*DEDO/NormGrad;
  if DOGAIN == 1
    GUpdate = G - scale*DEDG/NormGrad;
  else
    GUpdate = G;
  end
  NormDescent = 1;
end

end

Finally, we want to tackle the line search code. The previous version did not take advantage of what we learned at each line search step; instead, it kept resetting the initial step size. For example, we could run 2000 training steps and every one of them would start the line search at whatever λ value we originally supplied, so nothing carried over from step to step. Now the update code returns two things when we use line search: first, the value of lambdastar it finds and second, a minimum allowed value for the line search start. If we reset the next iteration with the raw lambdastar, we could progressively shrink the calculated lambdastar to zero and effectively stop our progress. So instead we return a sensible starting value for the next line search in the variable lambdastart. The new line search looks the same in principle; the only difference is that we set lambdastart at the end using the value of lambdamin, which prevents the line search from starting with values that are too small.
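To see where the quantities A, B and C in the listing below come from, it helps to write the line search out in one dimension. Along the descent direction the energy is modeled by the quadratic E(λ) ≈ A + Bλ + Cλ², where A is the energy at the current parameters, B = −dot(BaseGrad, UseGrad) is the slope at λ = 0 and C is recovered from one trial evaluation at the full step r; the vertex of this parabola is λ* = −B/(2C). The sketch below runs the same arithmetic on a made up one dimensional energy Etoy whose true minimizer is 0.7; the toy function and its numbers are purely for illustration and are not part of the model code.

% Minimal sketch of the quadratic fit used by the line search below.
% Etoy is a made up one dimensional energy with minimizer lambda = 0.7.
Etoy = @(lambda) 3 + 2*(lambda - 0.7).^2;   % toy energy along the search line

r = 0.1;                        % trial step, the role played by r in the code
A = Etoy(0);                    % energy at the current parameters
B = 4*(0 - 0.7);                % slope at lambda = 0; the code uses -dot(BaseGrad,UseGrad)
C = (Etoy(r) - A - B*r)/r^2;    % curvature recovered from the trial evaluation
lambdastar = -B/(2*C)           % vertex of the parabola; prints 0.7000
if C < 0
  % the parabola opens downward so the vertex is a maximum; as in the
  % update code, we push past it by adding 2*r
  lambdastar = lambdastar + 2*r;
end

On the real model, the two energy evaluations are done with energy2 on the current and fully stepped parameters, exactly as in the listing that follows.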

Listing 18.107: The new update code


f u n c t i o n [ WUpdate , OUpdate , GUpdate , UseGrad , NormBase , NormGrad ,
lambdastar , lambdastart ] = . . .
GradientUpdateLag ( sigma , sigmaprime , g , y , Y,W, O, G, B , BE, F , FE , I , IN , D,OUT, r
,NGN,NGE,NFB)
%
% gain update t o g g l e , grad s c a l i n g t o g g l e , l i n e s e a r c h t o g g l e
5 DOGAIN = 0 ;
dogradscaling = 1;
dolinesearch = 1;

%s e t up w e i g h t and d e r i v a t i v e vectors
10 Nodes = g . v ;
[m, s i z e Y ] = s i z e ( Nodes ) ;
sizeW = l e n g t h (W) ;
S = length ( I ) ;
s i z e D = l e n g t h (D) ;
15 SN = l e n g t h ( IN ) ;
sizeOUT = l e n g t h (OUT) ;
xi = zeros (S , sizeY ) ;
DEDW = z e r o s ( 1 , sizeW ) ;
DEDO = z e r o s ( 1 , s i z e Y ) ;
20 DEDG = z e r o s ( 1 , s i z e Y ) ;

% f i n d the xi ’ s
. . . . code here

25 %f i n d t h e p a r t i a l s
. . . . . code here

%we h a v e f o u n d a l l the gradients


%now s e t t h e r i g h t pieces to 0
30 DEDW( 1 , 1 :NGE) = 0 ;
DEDO( 1 , 1 :NGN) = 0 ;
DEDG( 1 , 1 :NGN) = 0 ;

%g e t b a s e g r a d i e n t
35 i f DOGAIN == 1
BaseGrad = [DEDW DEDO DEDG ] ;
else
BaseGrad = [DEDW DEDO ] ;
end
40 NormBase = s q r t ( sum ( BaseGrad . ∗ BaseGrad ) ) ;

ScaleTol = 0.01;
i f d o g r a d s c a l i n g == 1
[ WScale , OScale , GScale , UseGrad ] = GetScaledGrad (DOGAIN, S c a l e T o l ,W, O,
G,DEDW,DEDO,DEDG) ;
45 else
UseGrad = BaseGrad ;
end
NormGrad = s q r t ( sum ( UseGrad . ∗ UseGrad ) ) ;

50 gradtol = 0.05;
%now do d e s c e n t s t e p
i f d o g r a d s c a l i n g == 1
[ WUpdate , OUpdate , GUpdate ] = D o D e s c e n t S t e p G r a d S c a l i n g (DOGAIN,
NormBase , NormGrad ,W, O, G , . . .
WScale , OScale , GScale ,DEDW,DEDO,DEDG, g r a d t o l , r ) ;
55 else
[ WUpdate , OUpdate , GUpdate , NormDescent ] = . . .
D o D e s c e n t S t e p R e g u l a r (DOGAIN, NormGrad ,W, O, G,DEDW,DEDO,DEDG, g r a d t o l ,
r) ;
end

60 %l i n e s e a r c h
lambdamin = 0 . 5 ;
i f d o l i n e s e a r c h == 1
[ E S t a r t , yI , YI , OI , EI ] = e n e r g y 2 ( sigma , g , I , IN ,OUT, D,W, O, G, B , BE) ;
[ E F u l l , yI , YI , OI , EI ] = e n e r g y 2 ( sigma , g , I , IN ,OUT, D, WUpdate , OUpdate
, GUpdate , B , BE) ;
65 A = EStart ;
B = −d o t ( BaseGrad , UseGrad ) ;
C = ( E F u l l − A − B∗ r ) / ( r ˆ 2 ) ;
l a m b d a s t a r = −B/ ( 2 ∗C) ;
i f (C<0)
70 % we a r e g o i n g t o a maximum on t h e l i n e s e a r c h ;
l a m b d a s t a r = l a m b d a s t a r +2∗ r ;
end
%do new o p t i m a l s t e p
i f d o g r a d s c a l i n g == 1
75 [ WUpdate , OUpdate , GUpdate , NormDescent ] = . . .
D o D e s c e n t S t e p G r a d S c a l i n g (DOGAIN, NormBase , NormGrad ,W, O, G, WScale ,
OScale , GScale , . . .
DEDW,DEDO,DEDG, g r a d t o l , l a m b d a s t a r ) ;
e l s e% n o t u s i n g g r a d s c a l i n g
[ WUpdate , OUpdate , GUpdate , NormDescent ] = . . .
80 D o D e s c e n t S t e p R e g u l a r (DOGAIN, NormGrad ,W, O, G,DEDW,DEDO,DEDG,
gradtol , lambdastar ) ;
end
% don ’ t r e t u r n t o o s m a l l a l a m b d a s t a r
i f l a m b d a s t a r < lambdamin
l a m b d a s t a r t = lambdamin ;
85 else
lambdastart = lambdastar ;
end
else
lambdastar = r ;
90 lambdastart = lambdastar ;
end

%c o p y u p d a t e s t o t h e o r i g i n a l b l o c k o f p a r a m e t e r s
WUpdate ( 1 , 1 :NGE) = WUpdate ( 1 ,NGE+1:2∗NGE) ;
95 OUpdate ( 1 , 1 :NGN) = OUpdate ( 1 ,NGN+1:2∗NGN) ;

i f DOGAIN == 1
GUpdate ( 1 , 1 :NGN) = GUpdate ( 1 ,NGN+1:2∗NGN) ;
else
GUpdate = G;
100 end

end

The new training code is very similar although some of the return variables are
different. Note we have added code to do a plot of our calculated line search values as
well as the energy using the subplot command in MatLab.

Listing 18.108: The new training code


f u n c t i o n [W, G, O, Energy , Grad , NormGrad , lambda , l a m b d a s t a r t ] = . . .
c h a i n f f n t r a i n l a g t w o ( sigma , sigmaprime , g , Input , IN , Target ,OUT,W, O, G, B , BE
, F , FE , lambda , NumIters ,NGN,NGE,NFB)
%
Energy = [ ] ;
5 [ E , y , Y, OI , EI ] = e n e r g y 2 ( sigma , g , Input , IN ,OUT, Target ,W, O, G, B , BE) ;

scale = [ ] ;
Energy = [ Energy E ] ;
l a m b d a s t a r t = lambda ;
10 f o r t = 1 : NumIters
[W, O, G, Grad , NormBase , NormGrad , lambda , l a m b d a s t a r t ] =
GradientUpdateLagTwo ( sigma , sigmaprime , . . .
g , y , Y,W, O, G, B , BE, F , FE , Input , IN , Target ,OUT, l a m b d a s t a r t ,NGN,NGE,
NFB) ;
[ E , y , Y, OI , EI ] = e n e r g y 2 ( sigma , g , Input , IN ,OUT, Target ,W, O, G, B , BE) ;
Energy = [ Energy E ] ;
15 s c a l e = [ s c a l e lambda ] ;
end
Energy ( 1 ) , Energy ( NumIters )
s u b p l o t ( 2 , 1 , 1 ) , p l o t ( Energy ) ;
subplot (2 ,1 ,2) , plot ( scale ) ;
20 end

Let’s try this out on a model. The model is similar to what we have been testing our code
on. We changed the training data a bit and altered the node function setup some. We’ll
let you compare the examples and find the changes.

Listing 18.109: Another example setup


V = [1;2;3;4;5;6;7;8;9;10;11;12];
E = {[1;2] ,[2;3] ,[2;4] ,[2;5] ,[3;6] ,[4;7] ,[5;8] ,[2;7] ,[1;7] ,...
[6;9] ,[7;9] ,[8;9] ,[9;10] ,[10;11] ,[11;12] ,[12;2]};
v = v e r t i c e s (V) ;
5 e = e d g e s (E) ;
G = graphs ( v , e ) ;
KG = i n c i d e n c e F B (G) ;
eG = G. e ;

10 %F i n d t h e f e e d b a c k e d g e s and s u b t r a c t them
FBedge = ExtractFBEdges (KG) ;
NewFFedges = ExtractNewFFEdgeInfo (G, FBedge ) ;

% subtract the feedback edges



15 W = r e m o v e e d g e s (G, FBedge ) ;

%d o u b l e t h e g r a p h
LaggedW = addgraph (W,W, NewFFedges ) ;
KLaggedW = i n c i d e n c e ( LaggedW ) ;
20
IN = { 1 , 1 3 } ;
OUT = { 6 , 7 , 8 , 1 8 , 1 9 , 2 0 } ;
NGN = l e n g t h (G. v ) ;
NGE = l e n g t h (G. e ) − l e n g t h ( FBedge ) ;
25 NFB = l e n g t h ( FBedge ) ;
N o d e S i z e = l e n g t h ( LaggedW . v ) ;
SL = − 0 . 2 ;
SH = 1 . 2 ;

30 %s e t u p t r a i n i n g
X = linspace (0 ,3 ,101) ;
U = [];
for i = 1:101
i f X( i ) >= 0 && X( i ) < . 3 3
35 U = [U, [ 1 ; 0 ; 0 ] ] ;
e l s e i f X( i ) >= . 3 3 && X( i ) < . 6 7
U = [U, [ 0 ; 1 ; 0 ] ] ;
else
U = [U, [ 0 ; 0 ; 1 ] ] ;
40 end
end

%r a n d o m l y p e r m u t e t h e order of the i n p u t and t a r g e t set .


P = randperm ( 1 0 1 ) ;
45 XX = X(P) ;
UU = [ ] ;
for i = 1:101
j = P( i ) ;
UU = [UU,U ( : , j ) ] ;
50 end

I n p u t = [XX’ , XX ’ ] ;
T a r g e t = UU’ ;

55 %s e t a l l node f u n c t i o n s a s u s u a l ; t h e c o p i e s w i l l
%b e t h e same a s t h e o r i g i n a l
[ sigma , s i g m a p r i m e ] = S i g m o i d I n i t 2 ( NodeSize , SL , SH , IN ,OUT) ;

IN = { 1 , 1 3 } ;
60 OUT = { 1 8 , 1 9 , 2 0 } ;

%r e s e t node s i z e t o t h a t o f G
E d g e S i z e = l e n g t h (G. e ) ;
OL = − 1 . 4 ;
65 OH = 1 . 4 ;
GL = 0 . 8 ;
GH = 1 . 2 ;
WL = − 2 . 1 ;
WH = 2 . 1 ;
70 O f f s e t = OL+(OH−OL) ∗ rand ( 1 ,NGN) ;
O = [ Offset , Offset ] ;
Gain = GL+(GH−GL) ∗ rand ( 1 ,NGN) ;
G = [ Gain , Gain ] ;
Weights = WL+(WH−WL) ∗ rand ( 1 , E d g e S i z e ) ;

75 W = [ Weights , Weights ] ;
[ BG, FG, BEG, FEG ] = B F s e t s ( LaggedW , KLaggedW ) ;
[ I n V a l s , OutVals ] = e v a l u a t i o n 3 ( sigma , LaggedW , I n p u t ( 1 , : ) , IN ,OUT,W, O, G,
BG,BEG) ;
%f i n d e r r o r
[ E , yI , YI , OI , EI ] = e n e r g y 2 ( sigma , LaggedW , Input , IN ,OUT, Target ,W, O, G, BG,
BEG) ;
80 E

We start the training and get the following results.

Listing 18.110: Starting the training


%i n i t i a l energy
E = 74.4614
%s e t s t a r t i n g l a m b d a v a l u e
rstart = 0.5;
5 %t r a i n f o r 200 s t e p s
[W, G, O, Energy , Grad , NormGrad , r , r s t a r t ] = . . .
c h a i n f f n t r a i n l a g t w o ( sigma , sigmaprime , LaggedW , Input , IN , . . .
Target ,OUT,W, O, G, BG, BEG, FG, FEG, r s t a r t , 2 0 0 ,NGN,NGE,NFB) ;
E s t a r t = 7 4 . 4 6 1 4 Estop = 5 . 8 9 5 1
10
%s i g n i f i c a n t p r o g e s s : c h e c k r e c o g n i t i o n
[ E , yI , YI , OI , EI ] = . . .
e n e r g y 2 ( sigma , LaggedW , Input , IN ,OUT, Target ,W, O, G, BG,BEG) ;
[ E r r o r , sumE , R, Z , S u c c e s s , C l a s s i f i e d ] = . . .
15 G e t R e c o g n i t i o n (OUT, YI , Target , 0 . 4 , 0 . 6 ) ;
C l a s s i f i e d , sumE
C l a s s i f i e d = 85.1485
sumE = 4 . 0 3 1 2

Only 200 steps and 85 % recognition!! We are doing much better with the new line
search code.

Listing 18.111: Some more training


%650 more s t e p s
2 [ E , yI , YI , OI , EI ] = e n e r g y 2 ( sigma , LaggedW , Input , IN ,OUT, Target ,W, O, G, BG,
BEG) ;
[ E r r o r , sumE , R, Z , S u c c e s s , C l a s s i f i e d ] = G e t R e c o g n i t i o n (OUT, YI , Target
,0.4 ,0.6) ;
C l a s s i f i e d , sumE
C l a s s i f i e d = 91.0891
sumE = 3 . 7 1 8 7
7
%200 more s t e p s
[ E , yI , YI , OI , EI ] = e n e r g y 2 ( sigma , LaggedW , Input , IN ,OUT, Target ,W, O, G, BG,
BEG) ;
[ E r r o r , sumE , R, Z , S u c c e s s , C l a s s i f i e d ] = G e t R e c o g n i t i o n (OUT, YI , Target
,0.4 ,0.6) ;
C l a s s i f i e d , sumE
12 C l a s s i f i e d = 91.0891
sumE = 3 . 6 8 7 5

We might be able to get to 100 % but this is enough to show you how the training
works. Finally, to see how our model works on new data, consider the following test.
We set up new input and corresponding target data. These inputs are created using a

different linspace command so the inputs will not be the same as before. You can
see recognition is still fine.

Listing 18.112: Testing the model


%s e t u p t e s t i n g
X = linspace (0 ,3 ,65) ;
U = [];
for i = 1:65
5 i f X( i ) >= 0 && X( i ) < . 3 3
U = [U, [ 1 ; 0 ; 0 ] ] ;
e l s e i f X( i ) >= . 3 3 && X( i ) < . 6 7
U = [U, [ 0 ; 1 ; 0 ] ] ;
else
10 U = [U, [ 0 ; 0 ; 1 ] ] ;
end
end
%r a n d o m l y p e r m u t e t h e o r d e r o f t h e i n p u t and t a r g e t set .
P = randperm ( 6 5 ) ;
15 XX = X(P) ;
UU = [ ] ;
for i = 1:65
j = P( i ) ;
UU = [UU,U ( : , j ) ] ;
20 end

I n p u t = [XX’ , XX ’ ] ;
T a r g e t = UU’ ;

25 [ E , yI , YI , OI , EI ] = e n e r g y 2 ( sigma , LaggedW , Input , IN ,OUT, Target ,W, O, G, BG,


BEG) ;
[ E r r o r , sumE , R, Z , S u c c e s s , C l a s s i f i e d ] = G e t R e c o g n i t i o n (OUT, YI , Target
,0.4 ,0.6) ;
Classified
C l a s s i f i e d = 89.23

Note that on the training data we classified 91.0891 %, which is 92 out of the original 101 samples, and on the testing data we got 89.23 %, which is 58 out of 65 correct. For many purposes,
this level of recognition is pretty good.
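If you want to check those counts yourself, converting the Classified percentages back into raw counts is just a rounding step. The two lines below are only a sanity check, not part of the training or testing code; the sample sizes 101 and 65 are the ones used above.

% Convert the reported recognition percentages back to raw counts.
trainCount = round(91.0891/100*101)   % 92 of the 101 training inputs
testCount  = round(89.23/100*65)      % 58 of the 65 testing inputs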

Chapter 19
Address Based Graphs

In order to create complicated directed graph structures for our neural modeling,
we need a more sophisticated graph object. Create a new folder called Graphs
and in it create the usual @graphs, @vertices and @edges subfolders. We
will then modify the code we had in the previous @edges and @vertices to
reflect our new needs. Let’s look at how we might build a cortical column. Each
can in a cortical column is a six layer structure which consists of the OCOS,
FFP and Two/Three building blocks. The OCOS will have addresses [0; 0; 0; 0;
0; 1 − 7] as there are 7 neurons. The FFP is simpler with addresses [0; 0; 0; 0; 0; 1 −
2] as there are just 2 neurons. The six neuron Two/Three circuit will have 6
addresses, [0; 0; 0; 0; 0; 1 − 6]. To distinguish these neurons in different circuits
from one another, we add a unique integer in the fifth component. The addresses
are now OCOS ([0; 0; 0; 0; 1; 1 − 7]), FFP ([0; 0; 0; 0; 2; 1 − 2]) and Two/Three
([0; 0; 0; 0; 3; 1 − 6]). So a single can would look like what is shown in Eq. 19.1.

can L1 . . . . y . . FFP, y’s Address [0; 0; 0; 0; 2; 1 − 2]


L2 z . . z . . z Two/Three, z’s Address [0; 0; 0; 0; 3; 1 − 6]
L3 . . z . . . .
L4 z x . x . x z OCOS x’s Address [0; 0; 0; 0; 1; 1 − 7] (19.1)
. x . x . x .
L5 . . . . y . .
L6 . . . x . . .

We are not showing the edges here, for convenience. We can then assemble cans
into columns by stacking them vertically. This leads to the structure in Eq. 19.2. The


fourth component of each address now indicates the can number: 1, 2 or 3. We also
have to add a link between cans that provides the vertical processing.

can 3 L1 y FFP y’s [0; 0; 0; 3; 2; 1 − 2]


L2 z z z Two/Three z’s [0; 0; 0; 3; 3; 1 − 6]
L3 z
L4 z x x x z OCOS x’s [0; 0; 0; 3; 1; 1 − 7]
x x x
L5 y
L6 x 1 interconnection
can 2 L1 y FFP y’s [0; 0; 0; 2; 2; 1 − 2]
L2 z z z Two/Three z’s [0; 0; 0; 2; 3; 1 − 6]
L3 z
L4 z x x x z OCOS x’s [0; 0; 0; 2; 1; 1 − 7]
(19.2)
x x x
L5 y
L6 x 1 interconnection
can 1 L1 y FFP y’s [0; 0; 0; 1; 2; 1 − 2]2 edges
L2 z z z Two/Three z’s [0; 0; 0; 1; 3; 1 − 6]7 edges
L3 z
L4 z x x x z OCOS x’s [0; 0; 0; 1; 1; 1 − 7]7 edges
x x x
L5 y
L6 x
Th t Thalamus t's

We can label the address of this column as [0; 0; 1; ·; ·; ·]. We could then use this as
a model of cortex or assemble a rectangular sheet of columns. The resulting model
of cortex would then be addressed as [0; 1; ·; ·; ·; ·]. Eventually, we will have at
least 7 modules in our small brain model. We will use the following address scheme.

Sensory Model One [0; 1; ·; ·; ·; ·]


Sensory Model Two [0; 2; ·; ·; ·; ·]
Associative Cortex [0; 3; ·; ·; ·; ·]
Motor Cortex [0; 4; ·; ·; ·; ·]
Thalamus [0; 5; ·; ·; ·; ·]
MidBrain [0; 6; ·; ·; ·; ·]
Cerebellum [0; 7; ·; ·; ·; ·]

Then the small brain model itself would have the address [1; ·; ·; ·; ·; ·]. To use
the brain model, we add an input and output module and construct a full model as
follows:

Input [1; ·; ·; ·; ·; ·]
Brain [2; ·; ·; ·; ·; ·]
Output [3; ·; ·; ·; ·; ·]

We can also construct an asymmetric brain model consisting of a left and right half
brain connected by a corpus callosum model. This would be addressed as follows:

Input [1; ·; ·; ·; ·; ·]
Left Brain [2; ·; ·; ·; ·; ·]
Corpus Callosum [3; ·; ·; ·; ·; ·]
Right Brain [4; ·; ·; ·; ·; ·]
Thalamus [5; ·; ·; ·; ·; ·]
MidBrain [6; ·; ·; ·; ·; ·]
Cerebellum [7; ·; ·; ·; ·; ·]
Output [8; ·; ·; ·; ·; ·]

which can be expanded into a more detailed listing easily.

Input [1; ·; ·; ·; ·; ·]
Left Brain [2; ·; ·; ·; ·; ·]
Sensory Model One [2; 1; ·; ·; ·; ·]
Sensory Model Two [2; 2; ·; ·; ·; ·]
Associative Cortex [2; 3; ·; ·; ·; ·]
Motor Cortex [2; 4; ·; ·; ·; ·]
Corpus Callosum [3; ·; ·; ·; ·; ·]
Right Brain [4; ·; ·; ·; ·; ·]
Sensory Model One [4; 1; ·; ·; ·; ·]
Sensory Model Two [4; 2; ·; ·; ·; ·]
Associative Cortex [4; 3; ·; ·; ·; ·]
Motor Cortex [4; 4; ·; ·; ·; ·]
Thalamus [5; ·; ·; ·; ·; ·]
MidBrain [6; ·; ·; ·; ·; ·]
Cerebellum [7; ·; ·; ·; ·; ·]
Output [8; ·; ·; ·; ·; ·]
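To make the addressing scheme concrete, here is a small sketch of how a fundamental building block could be stamped with one of the module addresses above. It uses the addlocationtonodes and addlocationtoedges methods developed later in this chapter, and it assumes OCOS is the raw building block graph with node addresses [0; 0; 0; 0; 0; 1 − 7] before any location has been added; the particular offset vector is just an example and not part of any model we build here.

% Sketch only: place a raw OCOS block into can 2 of column 3 of
% Sensory Cortex One in the left brain of the asymmetric model above,
% so its nodes pick up addresses of the form [2;1;3;2;1;*].
locationLeftS1 = [2;1;3;2;1;0];
OCOSLeftS1 = addlocationtonodes(OCOS,locationLeftS1);
OCOSLeftS1 = addlocationtoedges(OCOSLeftS1,locationLeftS1);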

For example, if we used a single column with three cans for a cortex module, we
could add a simple thalamus model using one reversed OCOS circuit as shown in
Eq. 19.3. In this picture, we have addressed the thalamus as [0; 5; 0; 1; 1; 1 − 7] but
depending on how we set up the full model, this could change.

can 3 L1 y FFP y’s 3 [0; 0; 0; 3; 2; 1 − 2]


L2 z z z Two/Three z’s 3 [0; 0; 0; 3; 3; 1 − 6]
L3 z
L4 z x x xz OCOS x’s 3 [0; 0; 0; 3; 1; 1 − 7]
x x x
L5 y
L6 x 1 interconnection
can 2 L1 y FFP y’s 2 [0; 0; 0; 2; 2; 1 − 2]
L2 z z z Two/Three z’s 2 [0; 0; 0; 2; 3; 1 − 7]
L3 z
L4 z x x xz OCOS x’s 2 [0; 0; 0; 2; 1; 1 − 7]
x x x
(19.3)
L5 y
L6 x 1 interconnection
can 1 L1 y FFP y’s 1 [0; 0; 0; 1; 2; 1 − 2] 2 edges
L2 z z z Two/Three z’s 1 [0; 0; 0; 1; 3; 1 − 6] 7 edges
L3 z
L4 z x x xz OCOS x’s 1 [0; 0; 0; 1; 1; 1 − 7] 7 edges
x x x
L5 y
L6 x 2 interconnections
thalamus t Thalamus t’s [0; 5; 0; 1; 1; 1 − 7] 6 edges
t t t
t t t

We see can 1 is one OCOS, one FFP and one Two/Three for 15 neurons and so
here, a three can column has 45 neurons.
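The neuron count is easy to verify; the two lines below are only a sanity check of the arithmetic.

% OCOS + FFP + Two/Three neurons per can, and three cans per column.
neuronsPerCan    = 7 + 2 + 6            % 15 neurons in one can
neuronsPerColumn = 3*neuronsPerCan      % 45 neurons in a three can column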

19.1 Graph Class Two

We need to extend our graph class implementation so that we can use six dimen-
sional vectors as addresses. We also want to move seamlessly back and forth from
the global node numbers needed for the incidence matrix and the Laplacian and the
full address information. Let’s start with the vertices or nodes of the graph.

19.1.1 Vertices Two

We have a new constructor as vertices are now six dimensional addresses.

Listing 19.1: Address based vertices


function V = vertices (a)
%
% c r e a t e n o d e s from t h e s t r u c t u r e a
%
5 % a h a s t h e form
% { [ n5 ; n4 ; n3 ; n2 ; n1 ; n0 ] , [ ] , [ ] , . . . , [ ] }
% where
% n0 = l o c a l node number i e OCOS, FFP , 2/3
% n1 = w h i c h l o c a l module
10 % n2 = s u b s u b s u b m o d u l e number i e can
% n3 = s u b s u b m o d u l e number i e column
% n4 = s u b m o d u l e number ie sensory cortex
% n5 = module number i e s m a l l b r a i n model
%
15
%
% s
%
s = struct () ;
20 t = struct () ;
n = length ( a ) ;
f o r i =1:n
b = a{ i } ;
t . number = i ;
25 t . a d d r e s s = a{ i } ;
s . node5 = b ( 1 ) ;
s . node4 = b ( 2 ) ;
s . node3 = b ( 3 ) ;
s . node2 = b ( 4 ) ;
30 s . node1 = b ( 5 ) ;
s . node0 = b ( 6 ) ;
V. v ( i ) = s ;
V. n ( i ) = t ;
end
35 V = c l a s s (V, ’ vertices ’ ) ;
end

There are more options in the subsref.m code.


Listing 19.2: Address based vertices subsref
f u n c t i o n o u t = s u b s r e f (V, s )
%
% give access to private elements
% of the v e r t i c e s o b j e c t
5 %
switch s . type
case ”.”
f l d = s . subs ;
i f ( strcmp ( f l d , ” v ” ) )
10 n = l e n g t h (V. v ) ;
o u t = [V. v ( 1 ) . node5 ;V. v ( 1 ) . node4 ;V. v ( 1 ) . node3 ; . . .
V. v ( 1 ) . node2 ;V. v ( 1 ) . node1 ;V. v ( 1 ) . node0 ] ;
f o r i =2:n
o u t = [ o u t [V. v ( i ) . node5 ;V. v ( i ) . node4 ;V . v ( i ) . node3 ; . . .
15 V. v ( i ) . node2 ;V. v ( i ) . node1 ;V. v ( i ) . node0 ] ] ;
end
e l s e i f ( strcmp ( f l d , ” n ” ) )
n = l e n g t h (V. n ) ;
f o r i =1:n

20 o u t ( i ) . number = V. n ( i ) . number ;
out ( i ) . a d d r e s s = V. n ( i ) . a d d r e s s ;
end
else
e r r o r (” i n v a l i d property f o r v e r t i c e s o b j e c t ”) ;
25 end
otherwise
e r r o r (” @ v e r t i c e s / s u b s r e f : i n v a l i d s u b s c r i p t type f o r v e r t i c e s ”)
;
end
end

We can add a node one at a time. Again, this inefficiency arises from the fact that we
can’t simply glue the incoming data to the end of the existing node list.

Listing 19.3: Add a single node with address based vertices add
1 f u n c t i o n W = add (V, u )
%
% node i s a d d e d t o v e r t e x l i s t
%
s = struct () ;
6 W = V. v ;
n = l e n g t h (W) ;
f o r i =1:n
s . node5 = W( i ) . node5 ;
s . node4 = W( i ) . node4 ;
11 s . node3 = W( i ) . node3 ;
s . node2 = W( i ) . node2 ;
s . node1 = W( i ) . node1 ;
s . node0 = W( i ) . node0 ;
o u t { i } = [ s . node5 ; s . node4 ; s . node3 ; s . node2 ; s . node1 ; s . node0 ] ;
16 end
s . node5 = u ( 1 ) ;
s . node4 = u ( 2 ) ;
s . node3 = u ( 3 ) ;
s . node2 = u ( 4 ) ;
21 s . node1 = u ( 5 ) ;
s . node0 = u ( 6 ) ;
o u t {n+1} = [ s . node5 ; s . node4 ; s . node3 ; s . node2 ; s . node1 ; s . node0 ] ;
W = v e r t i c e s ( out ) ;
end

Given that we can't concatenate, it would be more efficient to add lists of nodes simultaneously.

Listing 19.4: Add a node list address based vertices addv


f u n c t i o n W = addv (V, n o d e s )
%
% g r o u p o f n o d e s a r e a d d e d t o node list
%
5 s = struct () ;
W = V. v ;
n = l e n g t h (W) ;
[m, s i z e n o d e s ] = s i z e ( n o d e s ) ;
f o r i =1:n
10 s . node5 = W( i ) . node5 ;
s . node4 = W( i ) . node4 ;
s . node3 = W( i ) . node3 ;
s . node2 = W( i ) . node2 ;
s . node1 = W( i ) . node1 ;

15 s . node0 = W( i ) . node0 ;
o u t { i } = [ s . node5 ; s . node4 ; s . node3 ; s . node2 ; s . node1 ; s . node0 ] ;
end
for i = 1: sizenodes
s . node5 = n o d e s { i } ( 1 , 1 ) ;
20 s . node4 = n o d e s { i } ( 2 , 1 ) ;
s . node3 = n o d e s { i } ( 3 , 1 ) ;
s . node2 = n o d e s { i } ( 4 , 1 ) ;
s . node1 = n o d e s { i } ( 5 , 1 ) ;
s . node0 = n o d e s { i } ( 6 , 1 ) ;
25 o u t {n+i } = [ s . node5 ; s . node4 ; s . node3 ; s . node2 ; s . node1 ; s . node0 ] ;
end
W = v e r t i c e s ( out ) ;
end
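As a quick illustration of add versus addv for the new vertices class, the snippet below extends a small vertices object first by a single node and then by a list of nodes; the addresses used here are arbitrary examples in the [n5; n4; n3; n2; n1; n0] format and are not part of any model.

% Sketch: build a small vertices object and extend it.
V = vertices({[0;0;0;0;0;1],[0;0;0;0;0;2]});
V = add(V,[0;0;0;0;0;3]);                      % one node at a time
V = addv(V,{[0;0;0;0;0;4],[0;0;0;0;0;5]});     % a list of nodes at once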

19.1.2 Edges Two

We now have edges which are pairs of six dimensional addresses. Here is the con-
structor.

Listing 19.5: Address based edges


function E = edges ( a )
2 %
% c r e a t e e d g e s from t h e v e c t o r o f p a i r s a
% o f t h e form [ i n , o u t ]
%
%
7 % The e d g e s come i n a s a p a i r o f a d d r e s s e s
% o f t h e form
% { [ n5 ; n4 ; n3 ; n2 ; n1 ; n0 ] i n , [ n5 ; n4 ; n3 ; n2 ; n1 ; n0 ] o u t }
%
s = struct () ;
12 t = struct () ;
n = length ( a ) ;
f o r i =1:n
s . i n = a{ i } ( : , 1 ) ;
s . o u t = a{ i } ( : , 2 ) ;
17 u = [ s . in , s . out ] ;
E . e{ i } = u;
t . number = i ;
t . edge = u;
E . n{ i } = t;
22 end
E = c l a s s ( E , ’ edges ’ ) ;
end

The overloaded subsref.m function is similar to what we did before.



Listing 19.6: Address based edges subsref


f u n c t i o n out = s u b s r e f (E , s )
%
% give access to private elements
% of edges
5 %
switch s . type
case ”.”
f l d = s . subs ;
i f ( strcmp ( f l d , ” e ” ) )
10 n = l e n g t h (E . e ) ;
o u t {1} = [ E . e { 1 } ( : , 1 ) ,E . e { 1 } ( : , 2 ) ] ;
f o r i =2:n
o u t { i } = [ E . e { i } ( : , 1 ) ,E . e { i } ( : , 2 ) ] ;
end
15 e l s e i f ( strcmp ( f l d , ” n ” ) )
n = l e n g t h (E . n ) ;
f o r i =1:n
o u t ( i ) . number = i ;
o u t ( i ) . e d g e = [ E . e { i } ( : , 1 ) ,E . e { i } ( : , 2 ) ] ;
20 end
else
e r r o r (” i n v a l i d property f o r edges o b j e c t ”) ;
end
otherwise
25 e r r o r ( ” @edges / s u b s r e f : i n v a l i d s u b s c r i p t t y p e f o r edges ”) ;
end
end

We can add a single edge at a time.

Listing 19.7: Add a single edge address based edge add


f u n c t i o n W = add ( E , u , v )
%
3 % edge i s added t o edge l i s t
%
s = struct () ;
F = E. e ;
n = l e n g t h (F) ;
8 for i =1:n
s . i n = F{ i } ( : , 1 ) ;
s . o u t = F{ i } ( : , 2 ) ;
out{ i } = [ s . in , s . out ] ;
end
13 o u t {n+1} = [ u , v ] ;
W = edges ( out ) ;
end

However, it is more efficient to add a list of edges at one time.

Listing 19.8: Add an edge list address based edges addv


f u n c t i o n W = addv ( E , l i n k s )
%
% group o f e d g e s are added t o edge list
%
5 s = struct () ;
F = E. e ;
[m, s i z e e d g e s ] = s i z e ( F ) ;
[m, s i z e l i n k s ] = s i z e ( l i n k s ) ;
out = {};

10 f o r i =1: s i z e e d g e s
s . i n = F{ i } ( : , 1 ) ;
s . o u t = F{ i } ( : , 2 ) ;
out{ i } = [ s . in , s . out ] ;
end
15 f o r i = 1: sizelinks
s . in = l i n k s { i }(: ,1) ;
s . out = l i n k s { i } ( : , 2 ) ;
o u t { s i z e e d g e s+i } = [ s . i n , s . o u t ] ;
end
20 W = e d g e s ( o u t ) ;
end

19.1.3 Graph Class

The graph class is quite similar to what we had before but we have added some
options.
• g.v returns the vertices object as before. The difference is that the vertices are
now six dimensional addresses.
• g.e returns the usual edge object. Here, the difference is that the edge pairs are
now pairs of six dimensional addresses.
• g.n returns the nodes data structure obtained from the nodes = V.n expres-
sion. This gives us essentially a structure where nodes(i) is a structure whose
number field is the global node number and whose address field is the node's six
dimensional address.
• g.l returns the edges data structure obtained from the links = E.n expres-
sion. This gives us a similar structure for the edges: the number field of links(i) is
the global edge number and the edge field is the pair of six dimensional addresses
for the in and out nodes of that edge.
For example, if we had the code snippet

Listing 19.9: Setting global locations


vOCOS = { [ 0 ; 0 ; 0 ; 0 ; 0 ; 1 ] , [ 0 ; 0 ; 0 ; 0 ; 0 ; 2 ] , [ 0 ; 0 ; 0 ; 0 ; 0 ; 3 ] , . . .
[0;0;0;0;0;4] ,[0;0;0;0;0;5] ,[0;0;0;0;0;6] ,...
[0;0;0;0;0;7]};
eOCOS = { [ [ 0 ; 0 ; 0 ; 0 ; 0 ; 1 ] , [ 0 ; 0 ; 0 ; 0 ; 0 ; 2 ] ] , . . .
5 [[0;0;0;0;0;1] ,[0;0;0;0;0;3]] ,...
[[0;0;0;0;0;1] ,[0;0;0;0;0;4]] ,...
[[0;0;0;0;0;2] ,[0;0;0;0;0;5]] ,...
[[0;0;0;0;0;3] ,[0;0;0;0;0;6]] ,...
[[0;0;0;0;0;4] ,[0;0;0;0;0;7]] ,...
10 [[0;0;0;0;0;1] ,[0;0;0;0;0;6]]};
VOCOS = v e r t i c e s (vOCOS) ;
EOCOS = e d g e s (eOCOS) ;
OCOS = g r a p h s (VOCOS, EOCOS) ;

Then, here is what we would find for the various parts of the graph object OCOS.
The OCOS.v data is a matrix of addresses. Each column is a node address. In the
code above, we first construct an OCOS graph whose nodes are

Listing 19.10: Initial OCOS node addresses


vOCOS = OCOS . v
vOCOS =

0 0 0 0 0 0 0
5 0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
1 2 3 4 5 6 7

Note that our OCOS circuit now does not contain a node 8 as that is a thala-
mus neuron and we will model it separately later. Then, we use the methods
addlocationtonodes to relabel the node addresses and addlocationtoedges
to relabel the edge addresses with the fundamental building block address
[0; 0; 0; 0; 1; ·]. This gives

Listing 19.11: OCOS addresses after updated location


locationOCOS = [ 0 ; 0 ; 0 ; 0 ; 1 ; 0 ] ;
OCOS=a d d l o c a t i o n t o n o d e s (OCOS, locationOCOS ) ;
OCOS=a d d l o c a t i o n t o e d g e s (OCOS, locationOCOS ) ;
vOCOS = OCOS . v
5 vOCOS =

0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
10 0 0 0 0 0 0 0
1 1 1 1 1 1 1
1 2 3 4 5 6 7

For example, the address of the third node is

Listing 19.12: Updated node 3 address


vOCOS(:,3)
ans =
   0
   0
   0
   0
   1
   3

which reflects what we said before. The OCOS circuit is a basic building block
whose address is [0; 0; 0; 0; 1; ·]. This neuron is the third one in that circuit and so
its address is [0; 0; 0; 0; 1; 3]. Since we want to maintain a list of global nodes, the
next data structure maintains a table whose first column is the global node number
and whose second column is the address of that node.

Listing 19.13: OCOS vertices data


1 nOCOS = OCOS . n
nOCOS =
{
1 x7 s t r u c t a r r a y c o n t a i n i n g the fields :

6 number
address
}

Hence, to access the third node, we type

Listing 19.14: Accessing node 3 address


nOCOS(3)
ans =
{
  number = 3
  address =

     0
     0
     0
     0
     1
     3

Then we can ask what the global node number is as well as the six dimensional
address.

Listing 19.15: Find OCOS global node numbers


1 lOCOS = OCOS . l
lOCOS =
{
1 x7 s t r u c t a r r a y c o n t a i n i n g the fields :

6 number
edge
}

lOCOS ( 3 )
11 ans =
scalar structure c o n t a i n i n g the fields :
number = 3
edge =
0 0
16 0 0
0 0
0 0
1 1
1 4
21 }

The OCOS.l command returns a data structure whose fields are the global edge
number and an edge. Recall an edge is between the in node and the out node. So,
here edge is a pair of six dimensional addresses with the first address being the
address of the in node and the second one being the address of the out node. For example,
the information on the third edge is

Listing 19.16: OCOS global node 3 addresses


lOCOS ( 3 ) . number
ans = 3

To extract the edges, we use the .edge command.

Listing 19.17: OCOS global node 3 addresses


lOCOS ( 3 ) . e d g e
ans =
0 0
0 0
5 0 0
0 0
1 1
6 3

The in and out parts of the edge correspond to the first and second column, respec-
tively.

Listing 19.18: Accessing OCOS edge 3 in and out data


lOCOS ( 3 ) . e d g e ( : , 1 )
ans =
0
0
5 0
0
1
6
lOCOS ( 3 ) . e d g e ( : , 2 )
10 ans =
0
0
0
0
15 1
3

With all the above said, we can now look at the constructor and see how we imple-
mented this.

Listing 19.19: Address based graph constructor graph


f u n c t i o n g = g r a p h s (V, E)
%
% c o n s t r u c t o r for graph g
4 %
% V = vertices object
% E = edges o b j e c t
%
g . v = V;
9 g . e = E;
n o d e s = V. n ;
g . n = nodes ;
links = E. n ;
g . l = links ;
14 g = c l a s s ( g , ’ graphs ’ ) ;
end

We overload subsref.m as follows. There are the usual cases.

Listing 19.20: Address based graph constructor graph subsref


f u n c t i o n out = s u b s r e f ( g , s )
%
% give access to private elements
% of the graph
5 %
switch s . type
case ”.”
f l d = s . subs ;
i f ( strcmp ( f l d , ” v ” ) )
10 Nodes = g . v ;
o u t = Nodes . v ;
e l s e i f ( strcmp ( f l d , ” e ” ) )
Edges = g . e ;
o u t = Edges . e ;

15 e l s e i f ( strcmp ( f l d , ” n ” ) )
out = g . n ;
e l s e i f ( strcmp ( f l d , ” l ” ) )
out = g . l ;
else
20 e r r o r (” i n v a l i d property f o r graph o b j e c t ” ) ;
end
otherwise
e r r o r (” @ ve r t i c e s / s u b s r e f : invalid s u b s c r i p t type f o r graphs ”) ;
end
25 end

19.2 Class Methods Two

First, let’s look at how we add the mask or global address to a given graph.

19.2.1 Add Location Methods

We begin with the code to add a mask or location to each node and each edge of a graph.

Listing 19.21: Address based addlocationtonodes


function g = addlocationtonodes (g , vector )
%
%
%
5 nodes = g . v ;
V = nodes . v ;
[ n , n o d e s i z e ] = s i z e (V) ;
f o r i =1: n o d e s i z e
B{ i } = V ( : , i )+v e c t o r ;
10 end
W = v e r t i c e s (B) ;
g . v = W;
n o d e s 2 = W. n ;
g . n = nodes2 ;
15 end

We then add locations to edges.

Listing 19.22: Address based addlocationtoedges


function g = addlocationtoedges (g , vector )
%
%
%
5 u = g.e;
E = u.e;
[ n , e d g e s i z e ] = s i z e (E) ;
for i = 1: edgesize
B{ i } = [ E{ i } ( : , 1 ) + v e c t o r , E{ i } ( : , 2 ) + v e c t o r ] ;
10 end
F = e d g e s (B) ;
g . e = F;
links = F.n;
g . l = links ;
15 end

19.2.2 Add Edge Methods

We can add one edge as follows.

Listing 19.23: Address based addedge


f u n c t i o n W = addedge ( g , u , v )
%
% a d j o i n an e d g e t o an e x i s t i n g graph
%
5 E = add ( g . e , u , v ) ;
W = g r a p h s ( g . v , E) ;
end

We can also add a list of edges.

Listing 19.24: Address based addedgev


f u n c t i o n W = addedgev ( g , l i n k s )
%
3 % a d j o i n an e d g e t o an e x i s t i n g graph
%
E = addv ( g . e , l i n k s ) ;
W = g r a p h s ( g . v , E) ;
end

19.2.3 Add Node Methods

We can add a single node.

Listing 19.25: Address based addnode


f u n c t i o n W = addnode ( g , j )
%
3 % a d j o i n a node t o an e x i s t i n g graph
%
V = add ( g . v , j ) ;
W = g r a p h s (V, g . e ) ;
end

We can also add a list of nodes.

Listing 19.26: Address based addnodev


f u n c t i o n W = addnodev ( g , n o d e s )
%
3 % a d j o i n n o d e s t o an e x i s t i n g g r a p h
%
F = addv ( g . v , n o d e s ) ;
W = graphs (F , g . e ) ;
end

19.2.4 Finding the Incidence Matrix

This is now harder than you might think as the incidence matrix requires us to use global node and edge numbers while all of our addresses are six dimensional vectors. The basic idea is that once we create our graph with six dimensional addresses, we want to create a data structure B which links the pairs (global node number, node address) somehow. We will do this by creating a vector B. We come up with a function φ which assigns to each vector address a a unique integer φ(a). We then set the entry B(φ(a)) in this vector to the value of the global node number that is linked to this address. So for example, say the global node of a = [0; 2; 3; 0; 1; 2] is 8 and φ(a) = 46. Then we set B(46) = 8. The other entries of B are given the value 0. So it would be nice if we kept the size of B small and had as few 0's in it as possible. Hence, B(φ(a)) is the global node number corresponding to the address a. A standard way to do this in other programming languages is to create what is called an associative array C which would allow us to use the vector address as a key whose value is the corresponding node i; hence, we could write C([0; 2; 3; 0; 1; 2]) if that address is in the graph and this would return the global node for that address. This is called a reverse lookup on the six dimensional address. Now MatLab/Octave does not give us an associative container that can use a six dimensional vector directly as a key. However, once we have created our function φ (called a hashing function), we can calculate the vector B and write a function to do the reverse lookups. Now in our graph data structure for a graph g, nodes = g.n gives us a structure where nodes(i).number gives the global node number i and nodes(i).address is the corresponding vector address. The function link2global will take a six dimensional address and find the global node number. It has the following code.

Listing 19.27: Link2global


function node = link2global(nodes,address)
%
% find global node number for a given address
% in the data structure nodes where
% nodes(i).address(j) is the value of the jth slot of
% the vector address.
%
[sizeaddress,n] = size(address);
[m,sizeA] = size(nodes);
b = zeros(sizeaddress,1);
% loop through the data structure nodes.
% The only way we can match address is if
% all the entries of b sum to zero.
for i = 1:sizeA
  for j = 1:sizeaddress
    b(j) = (address(j) - nodes(i).address(j))^2;
  end
  if sum(b) == 0
    node = i;
    break;
  end
end

end
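For example, on the stamped OCOS graph built later in this chapter, whose node addresses are [0; 0; 0; 0; 1; 1 − 7], a reverse lookup looks like the lines below; this is only a usage sketch.

% Usage sketch: find the global node number for a given address.
nodes = OCOS.n;
node  = link2global(nodes,[0;0;0;0;1;3])   % returns 3, the global node number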

The most straightforward way to build the incidence matrix is then to do a loop as
follows.

Listing 19.28: Straightforward incidenceOld


1 function K = incidenceOld (g)
%
% g . v i s the v e r t i c e s o b j e c t of the graph
% g . e i s the edge o b j e c t of the graph
%
6 % g e t node a s s o c i a t i v e a r r a y from t h e v e r t i c e s
V = g.v;
nodes = g . n ;

% get link associative a r r a y from t h e edges


11 E = g.e;
edges = g . l ;

% get sizes
[ row , s i z e V ] = s i z e ( n o d e s ) ;
16 [ row , s i z e E ] = s i z e ( e d g e s ) ;
K = z e r o s ( sizeV , s i z e E ) ;

f o r i = 1: sizeV
for j = 1: sizeE
21 % a s s o c i a t i v e a r r a y from V g i v e s c o r r e s p o n d e n c e
% g l o b a l node number −> a c t u a l h i e r a r c h i c a l a d d r e s s
a = n o d e s ( i ) . number ;
b = nodes ( i ) . a d d r e s s ;
% g e t g l o b a l e d g e number
26 c = e d g e s ( j ) . number ;
% g e t a d d r e s s p a i r f o r t h i s g l o b a l e d g e number
d = edges ( j ) ;
%
% t h i s i s e x p e n s i v e as in g e n e r a l
31 % i t i s a l i n e a r search
%
u = l i n k 2 g l o b a l ( nodes , d ( : , 1 ) ) ;
v = l i n k 2 g l o b a l ( nodes , d ( : , 2 ) ) ;
i f u == i
36 K( i , j ) = 1 ;
e l s e i f v == i
K( i , j ) = −1;
end
end
41 end

end

Let’s count this out. For a typical small brain model of 500 nodes and 3000 edges, a
link2global call takes about 500 or less iterations. We need two calls for each
edge. So we have about 1000 iterations for each edge. But we have 3000 edges, so
that puts us up to 3 million iterations to loop through the edges for each node. Since
there are 500 nodes, this leaves us with an iteration count of 1.5 billion. So when we
use this method on a small brain model, it takes a long time to return with the
incidence matrix!
A better way is to use the method Address2Global which loops through the nodes of the graph and assigns to each node's six dimensional address its unique global node number via our choice of hash function φ. One way to do this is to take the six dimensional address, convert it to a string and then convert that string to an integer.
This creates a list of unique integers. This conversion is what is called a hashing
function. We then set up a large vector B as follows. We initialize B to all zeroes.
If a six dimensional address corresponded to the unique integer N and global node
number I, we would set B(N) = I. For example, if the address is [0; 2; 3; 2; 1; 2]—
this neuron is in sensory module 2, column 3, can 2 and is neuron 2 in the fundamental
OCOS building block. This address converts to the string ’023212’ which then
converts to some integer; the simplest conversion would be to set the integer to be
23212. However, we can do other conversions too. In this simplest conversion choice,
if the global node of this neuron was, say, 67, we would then set B(23212) = 67. This
uses storage to offset the search cost; in fact, the size of B becomes enormous and we
will find out we can actually run out of memory! But this method is straightforward
to code and it is instructive to see how we can refine our ideas and make them better
with reflection! The code for this hashing function is given below. As mentioned,
this is clearly inefficient as there are mostly zeros in this vector. The loop cost here
is the size of the node list—for a small brain model about 500.

Listing 19.29: Address2Global


f u n c t i o n [ Vertex , Address , B ] = A d d r e s s 2 G l o b a l ( g )
%
% t h i s function assigns to the unique
% i n t e g e r l i n k c o r r e s p o n d i n g t o a node a d d r e s s
5 % i t s g l o b a l node number .
% g . v i s the v e r t i c e s o b j e c t of the graph
%
% g e t node a s s o c i a t i v e a r r a y from t h e v e r t i c e s
V = g.v;
10 nodes = g . n ;
[ row , s i z e V ] = s i z e ( n o d e s ) ;
%
Vertex = z e r o s ( 1 , s i z e V ) ;
Ad d r ess = z e r o s ( 1 , s i z e V ) ;
15 f o r i =1: s i z e V
p = n o d e s ( i ) . number ;
V e r t e x ( 1 , i ) = n o d e s ( i ) . number ;
A = nodes ( i ) . a d d r e s s ;
a = str2num ( A d d r e s s 2 S t r i n g (A ) ) ;
20 Ad d r ess ( 1 , i ) = a ;
end
b i g g e s t = max ( Addres s ) ;
B = zeros (1 , biggest ) ;
f o r i = 1: sizeV
25 B( A d d r e s s ( 1 , i ) ) = V e r t e x ( 1 , i ) ;
end
end

The code above uses the utility function Address2String. This converts a six dimensional address into a string. Now we have to be careful here. Here is an example: suppose we have 3 nodes with addresses [0, 2, 32, 1, 1, 1], [0, 2, 3, 2, 11, 1] and [0, 2, 3, 21, 1, 1]. Then all 3 distinct addresses give the same string '0232111', and hence the same integer, which is a hash collision. Hence, the code in Address2String can generate hash collisions.

Listing 19.30: Address2String


function S = Address2String(A)
%
n = length(A);
S = [num2str(A(1))];
for i = 2:n
  S = [S, num2str(A(i))];
end

end
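To see the collision concretely, you can run Address2String on the three addresses from the example above; all of them produce the same string, which is exactly the problem we need to fix.

% All three distinct addresses hash to the same string '0232111'.
s1 = Address2String([0;2;32;1;1;1])
s2 = Address2String([0;2;3;2;11;1])
s3 = Address2String([0;2;3;21;1;1])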

We can fix this by finding the maximum number of nodes for each level. Our
addresses have the form [n5 , n4 , n3 , n2 , n1 , n0 ] which in Matlab is encoded as
[n6,n5,n4,n3,n2,n1]. We can easily find the maximum number of possi-
bilities in each address entry. We use this function.

Listing 19.31: GetAddressMaximums


function [M,T] = GetAddressMaximums(G)
%
% G is the graph
%
nG = G.v;
[m,n] = size(nG);
M = zeros(m,1);
for i = 1:m
  M(i) = max(nG(i,:));
end
T = zeros(m,1);
T(1) = M(1)+1;
for i = 2:m
  T(i) = (M(i)+1)*T(i-1);
end

Then, once we have these numbers, we add an appropriate offset to the hash
calculations. For example, if the n0 address had a maximum value of 22, we
would add 22 + 1 to all the entries in the n1 slot. If the n1 slot has a maximum value
of 9, we would add (22 + 1)(9 + 1) to all the entries in the n2 slot. This will ensure that each
address is unique. We then use these altered addresses to compute the hash value. We
will still use the original Address2String but we will now alter the addresses
we send in. However, using the string conversions generates very large addresses.
So let’s try it again. Instead of using the string conversions, we will use the address
maximums in each slot directly with the function FindGlobalAddress which
calculates the number of unique addresses we need more directly. This code is shown
below

Listing 19.32: FindGlobalAddresses


function a = FindGlobalAddress(A,gT)
%
n = length(A);
a = A(1);
for i = 2:n
  a = a + A(i)*gT(i-1);
end

end
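As a worked check of the mixed radix computation, take the slot products gT = [1; 1; 1; 1; 2; 16] that GetAddressMaximums produces for the OCOS example later in this chapter. The address of OCOS node 3 then hashes to 7, which matches the Address vector shown in that example.

% Worked check of FindGlobalAddress on the OCOS slot products.
gT = [1;1;1;1;2;16];
A  = [0;0;0;0;1;3];               % address of OCOS node 3
a  = FindGlobalAddress(A,gT)      % 0 + 0*1 + 0*1 + 0*1 + 1*1 + 3*2 = 7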

The code for the new incidence matrix calculation then has this form.

Listing 19.33: Incidence function form


1 function K = incidence (g)

end

Then we get the address maximums for each slot.

Listing 19.34: Get address slot maximums


% g e t maximum v a l u e s f o r e a c h a d d r e s s component
2 [ gM, gT ] = GetAddressMaximums ( g ) ;

Next, we get size information and set up a blank incidence matrix.

Listing 19.35: Set up blank incidence matrix


%
% g . v i s the v e r t i c e s o b j e c t of the graph
3 % g . e i s the edge o b j e c t of the graph
%
% g e t node a s s o c i a t i v e a r r a y from t h e v e r t i c e s
V = g.v;
nodes = g . n ;
8
% get link associative a r r a y from t h e edges
E = g.e;
edges = g . l ;

13 % get sizes
[ row , s i z e V ] = s i z e ( n o d e s ) ;
[ row , s i z e E ] = s i z e ( e d g e s ) ;
K = z e r o s ( sizeV , s i z e E ) ;

Then, we loop through the edges and convert all the six dimensional address pairs to
integers as we described. For a small brain model, this takes about 3000 iterations.
These integer pairs are stored in the structure out.

Listing 19.36: Convert address to unique hash values


% setup incidence matrix
%
d = edges ( 1 ) . edge ;
%
5 u = F i n d G l o b a l A d d r e s s ( d ( : , 1 ) , gT ) ;
v = F i n d G l o b a l A d d r e s s ( d ( : , 2 ) , gT ) ;
o u t {1} = [ u , v ] ;
f o r i =2: s i z e E
d = edges ( i ) . edge ;
10 u = F i n d G l o b a l A d d r e s s ( d ( : , 1 ) , gT ) ;
v = F i n d G l o b a l A d d r e s s ( d ( : , 2 ) , gT ) ;
out{ i } = [ u , v ] ;
end

We then call the Address2Global function which sets up the link between node addresses and global node numbers. This is stored in the vector B; i.e. B(Ai) = Vi where Ai is the hashed integer computed from node i's address and Vi is the corresponding global node number. The code also uses the address slot maximums.

Listing 19.37: Address2Global New


f u n c t i o n [ Vertex , Address , B ] = A d d r e s s 2 G l o b a l ( g )
%
% t h i s function assigns to the unique
% i n t e g e r l i n k c o r r e s p o n d i n g t o a node a d d r e s s
5 % i t s g l o b a l node number .
% g . v i s the v e r t i c e s o b j e c t of the graph
% g e t maximum v a l u e s f o r e a c h a d d r e s s component
[ gM, gT ] = GetAddressMaximums ( g ) ;
%
10 % g e t node a s s o c i a t i v e a r r a y from t h e v e r t i c e s
V = g.v;
nodes = g . n ;
[ row , s i z e V ] = s i z e ( n o d e s ) ;
%
15 Vertex = z e r o s ( 1 , s i z e V ) ;
Ad d ress = z e r o s ( 1 , s i z e V ) ;
f o r i =1: s i z e V
V e r t e x ( 1 , i ) = n o d e s ( i ) . number ;
A = nodes ( i ) . a d d r e s s ;
20 a = F i n d G l o b a l A d d r e s s (A, gT ) ;
Ad d r ess ( 1 , i ) = a ;
end
b i g g e s t = max ( Addres s ) ;
B = zeros (1 , biggest ) ;
25 f o r i = 1: sizeV
B( Ad dr e ss ( 1 , i ) ) = V e r t e x ( 1 , i ) ;
end

end

This costs about 500 more iterations for a small brain model. We call this function
as follows:

Listing 19.38: Calling Address2Global


1 [ Vertex , Address , B ] = A d d r e s s 2 G l o b a l ( g ) ;

Finally, we set up the incidence matrix. This is always a long double loop which
costs 500 × 3000 iterations for a small brain model even without the reverse lookup
problem. The vector B contains our needed reverse lookup information.

Listing 19.39: Setting up the incidence matrix and closing out


f o r i = 1: sizeV
for j = 1: sizeE
d = out{ j } ;
u = out{ j } ( 1 , 1 ) ;
5 v = out{ j } ( 1 , 2 ) ;
i f B( 1 , u ) == i
K( i , j ) = 1 ;
e l s e i f B( 1 , v ) == i
K( i , j ) = −1;
10 end
end
end

end

The full code for the incidence function is then

Listing 19.40: Address based incidence


function K = incidence (g)
%
% g . v i s the v e r t i c e s o b j e c t of the graph
% g . e i s the edge o b j e c t of the graph
5 %
% g e t maximum v a l u e s f o r e a c h a d d r e s s component
[ gM, gT ] = GetAddressMaximums ( g ) ;
%
% g e t node a s s o c i a t i v e a r r a y from t h e v e r t i c e s
10 V = g.v;
nodes = g . n ;

% get link associative a r r a y from t h e edges


E = g.e;
15 edges = g . l ;

% get sizes
[ row , s i z e V ] = s i z e ( n o d e s ) ;
[ row , s i z e E ] = s i z e ( e d g e s ) ;
20 K = z e r o s ( sizeV , s i z e E ) ;
%
% setup incidence matrix
%
d = edges ( 1 ) . edge ;
25 u = F i n d G l o b a l A d d r e s s ( d ( : , 1 ) , gT ) ;
v = F i n d G l o b a l A d d r e s s ( d ( : , 2 ) , gT ) ;
o u t {1} = [ u , v ] ;
f o r i =2: s i z e E
d = edges ( i ) . edge ;

30 u = F i n d G l o b a l A d d r e s s ( d ( : , 1 ) , gT ) ;
v = F i n d G l o b a l A d d r e s s ( d ( : , 2 ) , gT ) ;
out{ i } = [ u , v ] ;
end

35 [ Vertex , Address , B ] = A d d r e s s 2 G l o b a l ( g ) ;

f o r i = 1: sizeV
for j = 1: sizeE
% a s s o c i a t i v e a r r a y from V g i v e s c o r r e s p o n d e n c e
40 % g l o b a l node number −> a c t u a l h i e r a r c h i c a l a d d r e s s
d = out{ j } ;
u = out{ j } ( 1 , 1 ) ;
v = out{ j } ( 1 , 2 ) ;
i f B( 1 , u ) == i
45 K( i , j ) = 1 ;
e l s e i f B( 1 , v ) == i
K( i , j ) = −1;
end
end
50 end

end

For example, here is a simple OCOS incidence calculation. We begin by building


the OCOS graph.

Listing 19.41: First, set up the OCOS graph


vOCOS = { [ 0 ; 0 ; 0 ; 0 ; 0 ; 1 ] , [ 0 ; 0 ; 0 ; 0 ; 0 ; 2 ] , [ 0 ; 0 ; 0 ; 0 ; 0 ; 3 ] , . . .
[0;0;0;0;0;4] ,[0;0;0;0;0;5] ,[0;0;0;0;0;6] ,...
3 [0;0;0;0;0;7]};
eOCOS = { [ [ 0 ; 0 ; 0 ; 0 ; 0 ; 1 ] , [ 0 ; 0 ; 0 ; 0 ; 0 ; 2 ] ] , . . .
[[0;0;0;0;0;1] ,[0;0;0;0;0;3]] ,...
[[0;0;0;0;0;1] ,[0;0;0;0;0;4]] ,...
[[0;0;0;0;0;2] ,[0;0;0;0;0;5]] ,...
8 [[0;0;0;0;0;3] ,[0;0;0;0;0;6]] ,...
[[0;0;0;0;0;4] ,[0;0;0;0;0;7]] ,...
[[0;0;0;0;0;1] ,[0;0;0;0;0;7]]};
VOCOS = v e r t i c e s (vOCOS) ;
EOCOS = e d g e s (eOCOS) ;
13 OCOS = g r a p h s (VOCOS, EOCOS) ;
locationOCOS = [ 0 ; 0 ; 0 ; 0 ; 1 ; 0 ] ;
OCOS=a d d l o c a t i o n t o n o d e s (OCOS, locationOCOS ) ;
OCOS=a d d l o c a t i o n t o e d g e s (OCOS, locationOCOS ) ;

Next, although it is part of the incidence code, we can explicitly find the
associative array B to see what it looks like.

Listing 19.42: The Address2Global Calculation


[ Vertex , Address , B ] = A d d r e s s 2 G l o b a l (OCOS) ;
gM =
0
0
5 0
0
1
7
gT =

10 1
1
1
1
2
15 16
row = 1
sizeV = 7
b i g g e s t = 15

We see the associative array B has 15 entries (the value of biggest). There are only
two slots in the address vectors that have nonzero entries: slots 5 and 6 with corresponding
maxima 1 and 7. The algorithm we have described for assigning a global number to
each address then finds the associative array B. Note we have

Listing 19.43: The actual B matrix


> Vertex =

1 2 3 4 5 6 7

5 > Ad d r ess
Ad d r ess =

3 5 7 9 11 13 15
> B
10 B =

0 0 1 0 2 0 3 0 4 0 5 0 6 0 7

This is how it works. If nodes = OCOS.n, then nodes(1).number is 1, the global node number. This is stored as Vertex(1). Then nodes(1).address is the node's address, which is [0; 0; 0; 0; 1; 1]; this is converted into an integer via our hashing function and stored as Address(1), here 3. Thus B(3) = 1 and we can see how we assign these unique integers to the global node numbers. The actual incidence matrix is then calculated to be

Listing 19.44: Sample OCOS incidence calculation


KOCOS = i n c i d e n c e (OCOS)
KOCOS =
1 1 1 0 0 0 1
−1 0 0 1 0 0 0
5 0 −1 0 0 1 0 0
0 0 −1 0 0 1 0
0 0 0 −1 0 0 0
0 0 0 0 −1 0 −1
0 0 0 0 0 −1 0
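A quick sanity check on any incidence matrix we build is that each edge contributes exactly one +1 and one −1, so every column must sum to zero. The check below is just that and is not part of the class code.

% Each column of the incidence matrix should sum to zero.
columnSums = sum(KOCOS)             % a row of seven zeros for the OCOS graph
allZero    = all(columnSums == 0)   % returns 1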

19.2.5 Get the Laplacian

Once we know how to find the incidence matrix, we can find the Laplacian of the
graph easily.

Listing 19.45: Address based laplacian


function L = laplacian(g)
%
% g is a graph having
%   vertices  g.v
%   edges     g.e
%
K = incidence(g);
L = K*K';
end
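For instance, on the OCOS example above we can get the Laplacian directly and check that it is symmetric, as K*K' always is; this is only a usage sketch.

% Usage sketch: Laplacian of the OCOS graph.
LOCOS = laplacian(OCOS);
issym = isequal(LOCOS,LOCOS')   % returns 1 since K*K' is symmetric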

19.3 Evaluation and Update Strategies in Graphs

At any neuron i in our graphs, the neurons which interact with it via a synaptic contact
are in the backward set for neuron i, B(i). Hence, the input to neuron i is the sum

$$\sum_{j \in B(i)} E_{j \to i} \, Y(j)$$

where Ej→i is the value we assign to the edge between j and i in our graph and Y (j) is the
output we assign to neuron j. We also know the neurons that neuron i sends its output
signal to are in its forward set F(i). As usual, both of these sets are readily found by
looking at the incidence matrix for the graph. We can do this in the code BFsets
which is a new graph method placed in the @graphs directory. This will be more
complicated code than the version we have for global node and link numbers. In it,
we find the backward and forward sets for each neuron in both the six dimensional
address form and the global node form. In a given row of the incidence matrix, the
positive 1's tell us the edges corresponding to the forward links and the negative
1’s give us the backward edges. So we look at each row of the incidence matrix and
calculate the needed nodes for the backward and forward sets.
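For instance, using the incidence matrix KOCOS computed earlier, the backward and forward edges
of a node can be read off with simple find calls. This fragment only illustrates the row scan the
paragraph above describes; BFsets below does the same scan while also translating the edges back
into source and target nodes in both address and global form.

i = 6;
forwardedges  = find(KOCOS(i,:) ==  1);   % edges leaving node 6: none here
backwardedges = find(KOCOS(i,:) == -1);   % edges entering node 6: edges 5 and 7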

Listing 19.46: BFsets


function [BackA,BackGlobal,ForwardA,ForwardGlobal,...
          BackEdgeGlobal,ForwardEdgeGlobal] = BFsets(g,Kg)
%
% g is the graph
% Kg is the incidence matrix of the graph
% g.e is the edge object of the graph
% g.l is the edge array of the graph
% BackA is the Backward sets with addresses
% BackGlobal is the Backward sets with global node numbers
% ForwardA is the Forward sets with addresses
% ForwardGlobal is the Forward sets with global node numbers
% BackEdgeGlobal is the Backward edges with global numbers
% ForwardEdgeGlobal is the Forward edges with global numbers
%
% get maximum values for each address component
[gM,gT] = GetAddressMaximums(g);

% get link associative array from the edges
E = g.e;
edges = g.l;

% get sizes
[row,sizeE] = size(edges);
%
% setup out matrix
%
d = edges(1).edge;
%u = str2num(Address2String(d(:,1)));
%v = str2num(Address2String(d(:,2)));
u = FindGlobalAddress(d(:,1),gT);
v = FindGlobalAddress(d(:,2),gT);
out{1} = [u,v];
for i = 2:sizeE
  d = edges(i).edge;
  %u = str2num(Address2String(d(:,1)));
  %v = str2num(Address2String(d(:,2)));
  u = FindGlobalAddress(d(:,1),gT);
  v = FindGlobalAddress(d(:,2),gT);
  out{i} = [u,v];
end

[Vertex,Address,B] = Address2Global(g);

[Kgrows,Kgcols] = size(Kg);

BackA = {};
ForwardA = {};
BackGlobal = {};
ForwardGlobal = {};
BackEdgeGlobal = {};
for i = 1:Kgrows
  BackA{i} = [];
  ForwardA{i} = [];
  BackGlobal{i} = [];
  ForwardGlobal{i} = [];
  BackEdgeGlobal{i} = [];
  ForwardEdgeGlobal{i} = [];
  for j = 1:Kgcols
    d = edges(j).edge;
    u = d(:,1);
    v = d(:,2);
    a = out{j}(1,1);
    b = out{j}(1,2);
    if Kg(i,j) == 1
      ForwardA{i} = [ForwardA{i},v];
      ForwardGlobal{i} = [ForwardGlobal{i},B(1,b)];
      ForwardEdgeGlobal{i} = [ForwardEdgeGlobal{i},j];
    elseif Kg(i,j) == -1
      BackA{i} = [BackA{i},u];
      BackGlobal{i} = [BackGlobal{i},B(1,a)];
      BackEdgeGlobal{i} = [BackEdgeGlobal{i},j];
    end
  end
end
end

Let’s see how this works with a simple example. We again build an OCOS module
with 7 nodes and 7 edges.

Listing 19.47: Build an OCOS Graph


vOCOS = {[0;0;0;0;0;1],[0;0;0;0;0;2],[0;0;0;0;0;3],...
         [0;0;0;0;0;4],[0;0;0;0;0;5],[0;0;0;0;0;6],...
         [0;0;0;0;0;7]};
eOCOS = {[[0;0;0;0;0;1],[0;0;0;0;0;2]],...
         [[0;0;0;0;0;1],[0;0;0;0;0;3]],...
         [[0;0;0;0;0;1],[0;0;0;0;0;4]],...
         [[0;0;0;0;0;2],[0;0;0;0;0;5]],...
         [[0;0;0;0;0;3],[0;0;0;0;0;6]],...
         [[0;0;0;0;0;4],[0;0;0;0;0;7]],...
         [[0;0;0;0;0;1],[0;0;0;0;0;6]]};
VOCOS = vertices(vOCOS);
EOCOS = edges(eOCOS);
OCOS = graphs(VOCOS,EOCOS);
locationOCOS = [0;0;0;0;1;0];
OCOS = addlocationtonodes(OCOS,locationOCOS);
OCOS = addlocationtoedges(OCOS,locationOCOS);
KOCOS = incidence(OCOS);
[BackA,BackGlobal,ForwardA,ForwardGlobal,...
 BackEdgeGlobal,ForwardEdgeGlobal] = BFsets(OCOS,KOCOS);

Let’s generate the graphical file for the OCOS graph.

Listing 19.48: Build OCOS.dot


incToDot(KOCOS,8,6,1.0,'OCOS.dot');

Now create the graphical file which we show in Fig. 19.1.

Listing 19.49: Create AddressOCOS.pdf


dot -Tpdf -o AddressOCOS.pdf OCOS.dot

Now the backward global information here is B{1} = {}, B{2} = {1}, B{3} = {1},
B{4} = {1}, B{5} = {2}, B{6} = {3, 1} and B{7} = {4}. This information is stored
in BackGlobal which looks like

Fig. 19.1 Typical OCOS graph

Listing 19.50: BackGlobal


BackGlobal
BackGlobal =
{
  [1,1] = [](0x0)
  [1,2] = 1
  [1,3] = 1
  [1,4] = 1
  [1,5] = 2
  [1,6] =

     3   1

  [1,7] = 4
}

To access B{i} in code, we would then use

Listing 19.51: Access BackGlobal Entries


BackGlobal{2}
ans = 1
BackGlobal{6}
ans =
   3   1

and so on. The forward sets are similar and the other data structures give the same
information but list everything using the full vector addresses. Thus, BackA{6}
gives the same information as BackGlobal{6} but gives the addresses instead of
global node numbers.

Listing 19.52: BackA{6}


BackA{6}
ans =
   0   0
   0   0
   0   0
   0   0
   1   1
   3   1

The simplest evaluation strategy then is to let each neuron have an output value
determined by the usual simple sigmoid function with code in sigmoid.m. We use
this sigmoid to do a simple graph evaluation. We then evaluate the sigmoid functions
at each node using the current synaptic interaction values just as we did in the global
graph code.
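The file sigmoid.m itself is not reproduced in this chapter. A minimal version consistent with
the formula Y = 0.5(1 + tanh((x − o)/g)) used later, with the offset and gain applied by the
caller, would be the following sketch:

function y = sigmoid(x)
% simple sigmoid: maps the reals into (0,1)
  y = 0.5*(1 + tanh(x));
end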

Listing 19.53: A Simple Evaluation Loop


function [NodeVals] = evaluation(Y,W,B,BE)
%
% g is the graph
% Y is node vector
% B is the global backward node set information
% BE is the global backward edge set information
%

% get size
sizeV = length(Y);

for i = 1:sizeV
  % get backward node information for neuron i
  BF = B{i};
  % get backward edge information for neuron i
  BEF = BE{i};
  lenB = length(BF);
  lenBEF = length(BEF);
  sum = 0.0;
  for j = 1:lenBEF
    link = BEF(j);
    pre  = BF(j);
    sum = sum + W(link)*Y(pre);
  end
  NodeVals(i) = sigmoid(sum);
end

A simple edge update function is then given in the file below. This uses what is
called a Hebbian update to change the scalar weight associated with each edge. If
the value of the post neuron i is high and the value of the edge Ej→i is also high,
the value of this edge is increased by a multiplier such as 1.05 or 1.1. The code to
implement it is the same as the global graph code and is not shown.
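Since that code is not shown here, the sketch below only indicates the sort of loop it contains:
for each neuron i we walk its backward edges and scale up any weight whose product with the post
value exceeds a tolerance. The function name, the multiplier and the tolerance are assumptions;
this is not the global graph code itself.

function Wts = HebbianUpdateSynapticValueSketch(Y,W,B,BE)
% Y is the node value vector, W the edge weight vector
% B (the backward node sets) is accepted to mirror the real call but not needed here
% BE holds the global backward edge sets
  Wts = W;
  scale = 1.05;    % assumed Hebbian multiplier
  Tol = 0.25;      % assumed correlation tolerance
  sizeV = length(Y);
  for i = 1:sizeV
    BEF = BE{i};
    for j = 1:length(BEF)
      link = BEF(j);
      % strengthen the edge when the post neuron value and the
      % current edge value are both sufficiently large
      if Y(i)*W(link) > Tol
        Wts(link) = scale*W(link);
      end
    end
  end
end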
A simple example is shown below for the evaluation step for a brain model such
as we will build in Chap. 20. This discussion is a little out of place, but we think it is
worth the risk. The next chapter shows how we build a human brain model consisting
of many modules linked together and in subsequent chapters we also build models
of a cuttlefish and pigeon brain. Hence the function brain here is a generic placeholder
for any such brain model we build. Of course, to make this really sink in,
you’ll have to read the next chapter also and go back and forth a bit to get it straight!
So to set up the evaluation, first, we set up a node value vector Y the size of the
nodes and a weight value vector W the size of edges of the brain model. Here we
will also use an offset and gain vector. For the evaluation, we still assume the neuron
or node value is given by the equation

$$\sum_{j \in B(i)} E_{j \rightarrow i} \, Y(j)$$

where we now posit that each edge Ej→i is simply a value called a weight, Wji and
the neuron evaluation uses a sigmoid calculation. If x is the input into a neuron and if
o is the offset of the neuron and g is its gain, we find the neuronal output is given by

$$Y = 0.5\left(1 + \tanh\left(\frac{x - o}{g}\right)\right).$$

So we will alter the evaluation code a bit to allow us to do this more interesting
sigmoid calculation.

Listing 19.54: A Evaluation Loop With Offsets and Gains


function [NodeVals] = evaluation(g,Y,W,O,G,B,BE)
%
% g is the graph
% Y is node vector
% W is edge values
% O is the offset vector
% G is the gain vector
% B is the global backward node set information
% BE is the global backward edge set information
%

% get size
sizeV = length(Y);

for i = 1:sizeV
  % get backward node information for neuron i
  BF = B{i};
  % get backward edge information for neuron i
  BEF = BE{i};
  lenB = length(BF);
  lenBEF = length(BEF);
  sum = 0.0;
  for j = 1:lenBEF
    link = BEF(j);
    pre  = BF(j);
    sum = sum + W(link)*Y(pre);
  end
  X = (sum - O(i))/G(i);
  Y(i) = sigmoid(X);
end

NodeVals = Y;

end

We need to set up the offset vector and gain vector to be the size of the graph's
vertices and initialize them in some fashion. We fill the weight vector randomly
with numbers from −1 to 1. Then we compute the needed backward set information
and use it to perform an evaluation step. Then we do a synaptic weight Hebbian
update. At this point, we are just showing the basic ideas of this sort of computation.
We still have to restrict certain edges to have only negative values as they corre-
spond to inhibitory synaptic contacts. Also, we have not yet implemented the cable
equation update step afforded us with the graph’s laplacian. That is to come. The
function HebbianUpdateSynapticValue is the same as the one we used in
GraphsGlobal because it is written using only global graph information which
has been extracted from the address based graph. Here is how we would evaluation
the values (i.e. neuron values) at each node in a brain model. We are jumping ahead
of course, as we have not yet discussed how to build such a brain model, but this
code snippet will give you the idea. We start by building our brain model.

Listing 19.55: Build A Brain Model


[NodeSizes,EdgeSizes,Brain] = buildbrain(1,1,1);
Build SensoryOne
Neurons 1 to 45 are Sensory One
Links 1 to 61 are Sensory One
Build SensoryTwo
Neurons 46 to 90 are Sensory Two
Links 62 to 122 are Sensory Two
Build AssociativeCortex
Neurons 91 to 270 are Associative Cortex
Links 123 to 372 are Associative Cortex
Build MotorCortex
Neurons 271 to 315 are Motor Cortex
Links 373 to 433 are Associative Cortex
Build Thalamus
Neurons 316 to 329 are Thalamus
Links 434 to 447 are Thalamus
Build MidBrain
have dopamine edges
add serotonin edges
add norepinephrine edges
Neurons 330 to 335 are MidBrain
Links 448 to 450 are MidBrain
Build Cerebellum
Neurons 336 to 380 are Cerebellum
Links 451 to 511 are Cerebellum

Then we construct the brain model’s incidence matrix.

Listing 19.56: Construct the Brain Incidence Matrix


KBrain = incidence(Brain);
[m,n] = size(KBrain)
m = 380
n = 707

Next, get the backward and forward information for this graph.

Listing 19.57: Get Forward and Backward Information


[BA,BGlobal,FA,FGlobal,BEGlobal,FEGlobal] = ...
    BFsets(Brain,KBrain);

Now, initialize the output vector Y, the offset and gain vectors, O and G and the
edge weight vector W. Here we initialize W to be randomly chosen in the range [−1, 1],
the offset O to be randomly in the range [−1, 1] and the gain G to be randomly in
the range [0.45, 0.95].

Listing 19.58: Initializing Y, O, G and W


Y = zeros(1,380);
O = -1+2*rand(1,380);
G = 0.5*(0.9 + rand(1,380));
% W has one entry per edge; the brain model has 707 edges
W = -1+2*rand(1,707);

We start with

Listing 19.59: First 5 Initial Node Values


Y(1:5)
ans =

   0   0   0   0   0

Now evaluate the graph for the first time.

Listing 19.60: First Graph Evaluation


Y = evaluation(Brain,Y,W,O,G,BGlobal,BEGlobal);

We have done the first evaluation and now the node values have changed.

Listing 19.61: First 5 Node Values


Y(1:5)
ans =

   0.466747   0.729127   0.451896   0.178188   0.066558

Now do the first Hebbian Update which updates the weight values on the edges.
Follow that by another evaluation to see what happened.

Listing 19.62: The First Hebbian Update


W = HebbianUpdateSynapticValue(Y,W,BGlobal,BEGlobal);
Y = evaluation(Brain,Y,W,O,G,BGlobal,BEGlobal);

We can see how the new weight values have affected the nodal computations!

Listing 19.63: First 5 Nodal Values After One Hebbian Update


Y(1:5)
ans =

   0.446827   0.704026   0.391186   0.433918   0.051564

19.4 Adding Inhibition

Now let’s consider how we might train a given graph to have certain target node
values for specified inputs. We will illustrate this with a very simple OCOS graph
and one input at global node 1 of value 0.3. The specified output is in nodes 5, 6
and 7 of value 0, 1 and 0. We will continue to use simple sigmoid node evaluation
engines for now. We encode the input and target information as follows. First the
inputs:

Listing 19.64: Setting Up the Input Data


Input = [0.3;0;0;0;0;0;0]
Input =

   0.30000
   0.00000
   0.00000
   0.00000
   0.00000
   0.00000
   0.00000

Then the targets:

Listing 19.65: Setting Up Training Data


Target = struct();
DesiredOutput = {};
Target.in = 5; Target.value = 0.0;
DesiredOutput{1} = [Target.in, Target.value];
Target.in = 6; Target.value = 1.0;
DesiredOutput{2} = [Target.in, Target.value];
Target.in = 7; Target.value = 0.0;
DesiredOutput{3} = [Target.in, Target.value];

We usually call this sort of information Training Data. We can check how we did
with a few simple tests.

Listing 19.66: Testing the Training Data


DesiredOutput{1}
ans =

   5   0

DesiredOutput{1}(1)
ans = 5
DesiredOutput{1}(2)
ans = 0
DesiredOutput{2}(1)
ans = 6
DesiredOutput{2}(2)
ans = 1
DesiredOutput{3}(1)
ans = 7
DesiredOutput{3}(2)
ans = 0

Next, set up the OCOS graph.

Listing 19.67: Setup OCOS Graph


vOCOS = {[0;0;0;0;0;1],[0;0;0;0;0;2],[0;0;0;0;0;3],...
         [0;0;0;0;0;4],[0;0;0;0;0;5],[0;0;0;0;0;6],...
         [0;0;0;0;0;7]};
eOCOS = {[[0;0;0;0;0;1],[0;0;0;0;0;2]],...
         [[0;0;0;0;0;1],[0;0;0;0;0;3]],...
         [[0;0;0;0;0;1],[0;0;0;0;0;4]],...
         [[0;0;0;0;0;2],[0;0;0;0;0;5]],...
         [[0;0;0;0;0;3],[0;0;0;0;0;6]],...
         [[0;0;0;0;0;4],[0;0;0;0;0;7]],...
         [[0;0;0;0;0;1],[0;0;0;0;0;6]]};
VOCOS = vertices(vOCOS);
EOCOS = edges(eOCOS);
OCOS = graphs(VOCOS,EOCOS);
locationOCOS = [0;0;0;0;1;0];
OCOS = addlocationtonodes(OCOS,locationOCOS);
OCOS = addlocationtoedges(OCOS,locationOCOS);

We then find the incidence matrix, the backward and forward data and initialize all
the graph values.

Listing 19.68: Incidence Matrix, Backward and Forward Data and Initialization
KOCOS = incidence(OCOS);
[BA,BGlobal,FA,FGlobal,BEGlobal,FEGlobal] = BFsets(OCOS,KOCOS);
Y = zeros(1,7);
O = -1+2*rand(1,7);
G = 0.5*(0.9 + rand(1,7));
W = rand(1,7)
W =
   0.527088   0.959165   0.048946   0.374755   0.322442   0.698353   0.184643
% so weights are all positive. Now we set the inhibitory edges.
W(4) = -W(4);
W(5) = -W(5);
W(6) = -W(6);
W =
   0.45076   0.81485   0.45086  -0.79840  -0.10357  -0.74836   0.59966

We are setting edges 4, 5 and 6 to be inhibitory as we know this is what the neuro-
biology tells us. We do this by manually changing these edge weights to be negative
once they are randomly set. We set W to be values randomly chosen between 0 and
1 and then set the inhibitory values. Now we have to do a preliminary evaluation.
This uses a new function evalwithinput.

Listing 19.69: Preliminary Evaluation With Input


Y = evalwithinput(OCOS,Y,W,O,G,Input,BGlobal,BEGlobal);
Y =
   0.85963   0.67221   0.46610   0.19435   0.63086   0.94294   0.92217

The new function is like the old evaluation except it uses input data to seed each
nodal calculation. Here is the updated function: the only thing that changed is that
each sum is initialized to an input value rather than zero.

Listing 19.70: Evaluation With Input Data


function [NodeVals] = evalwithinput(g,Y,W,O,G,Input,B,BE)
%
% g is the graph
% Y is node vector
% W is edge values
% O is the offset vector
% G is the gain vector
% Input is the input vector
% B is the global backward node set information
% BE is the global backward edge set information
%

% get size
sizeV = length(Y);

for i = 1:sizeV
  % get backward node information for neuron i
  BF = B{i};
  % get backward edge information for neuron i
  BEF = BE{i};
  lenB = length(BF);
  lenBEF = length(BEF);
  sum = Input(i);
  for j = 1:lenBEF
    link = BEF(j);
    pre  = BF(j);
    sum = sum + W(link)*Y(pre);
  end
  X = (sum - O(i))/G(i);
  Y(i) = sigmoid(X);
end

NodeVals = Y;

end

Now we need to alter the edge values of the graph so that we can map our input
data to our desired target data. We do this with the modified Hebbian training

algorithm shown in HebbianUpdateErrorSignal. This function performs a
single update step, but we still have to write the code!
Listing 19.71: Hebbian Training With an Error Signal
Inhibitory = [2;3;4];
scale = 1.1;
Tol = .36;
TargTol = .25;
W = HebbianUpdateErrorSignal(Y,O,G,W,BGlobal,BEGlobal,Inhibitory,...
    DesiredOutput,scale,Tol,TargTol)

We need to set some tolerances before we do the Hebbian updates: scale is the
usual multiplier we see in a Hebbian update, Tol is the tolerance that determines
if an edge weight should be updated because of a correlation with the post nodal
value and TargTol is the fraction of the error update we use. The new function
is listed below. This function is a bit complicated as we are now handling updates
differently depending on whether the neuron is inhibitory or excitatory and whether
the neuron is an input or a target. We start by taking the given input and target data and
use it to set up a data structure which holds this information. The ValueAndType
structure holds three things: the neuron value, value; whether it is inhibitory or
not, inhibitory = 1 or inhibitory = 0; and whether it is a target or not,
target = 1 or target = 0. We loop through the neurons and initialize the
ValueAndType data structure and then store it in a cell array Neuron for access
later.

Listing 19.72: Initialized the Neuron Data


sizeV = length(Y);
sizeE = length(W);
sizeT = length(DesiredOutput);
sizeI = length(Inhibitory);
ValueAndType = struct();
Neuron = {};
for i = 1:sizeV
  ValueAndType.value = Y(i);
  ValueAndType.inhibitory = 0;
  ValueAndType.target = 0;
  for k = 1:sizeI
    if i == Inhibitory(k);
      ValueAndType.inhibitory = 1;
    end
  end
  for k = 1:sizeT
    if i == DesiredOutput{k}(1);
      ValueAndType.target = 1;
      ValueAndType.value = DesiredOutput{k}(2);
    end
  end
  v = ValueAndType.value;
  inh = ValueAndType.inhibitory;
  tar = ValueAndType.target;
  Neuron{i} = [v,inh,tar];
end

The update loop is therefore more complicated. In skeleton form, we have



Listing 19.73: Update loop in skeleton form


for i = 1:sizeV
  % get backward node information for neuron i
  BF = B{i};
  % get backward edge information for neuron i
  BEF = BE{i};
  lenB = length(BF);
  lenBEF = length(BEF);
  value = Neuron{i}(1);
  inhibitory = Neuron{i}(2);
  target = Neuron{i}(3);
  % see if we have a target neuron
  TargetMatch = 0;
  IsTarget = 0;
  if i == target
    IsTarget = 1;
  end
  % not a target neuron
  if IsTarget == 0
    for j = 1:lenBEF
      % do usual backward edge evaluation loop
      % if Y post is inhibitory do one thing
      if inhibitory == 1
        ...
      end
      % if Y post is excitatory do another thing
      if inhibitory == 0
        ...
      end
      Wts(link) = weight;
    end % backward links loop for non targets
  end
  % a target neuron
  if IsTarget == 1
    for j = 1:lenBEF
      % do the usual backward edge evaluation loop
      % if Y post is inhibitory do one thing
      if inhibitory == 1
        ...
      end
      % Y post is excitatory do another thing
      if inhibitory == 0
        ...
      end
      Wts(link) = weight;
    end % backwards links loop for targets
  end % is it a target
end % neuron loop


If we have a target node i, we have the usual evaluation

$$x_i = \sum_{j \in B(i)} E_{j \rightarrow i} Y(j)$$

and the target value is given by

$$Y(i) = 0.5\left(1 + \tanh\left(\frac{x(i) - o(i)}{g(i)}\right)\right)$$

for the neuron's offset o(i) and gain g(i). The neuron node calculation function is thus given by

$$\sigma(x, o, g) = 0.5\left(1 + \tanh\left(\frac{x - o}{g}\right)\right),$$

which has inverse

$$\sigma^{-1}(y, o, g) = o + \frac{g}{2} \, \ln\left(\frac{y}{1 - y}\right),$$

where y = σ(x, o, g). Now if the target value is 1 or 0, this will be unachievable
as the inverse will return ∞ and −∞, respectively. So in this case, we replace the
inputs by a suitable 1 − ε and ε for ease of computation. For a given target, we have
σ^{−1}(Y(i)) = (x(i) − o(i))/g(i). This leads to x(i) = o(i) + g(i) σ^{−1}(Y(i)) and so we have

$$\sum_{j \in B(i)} E_{j \rightarrow i} Y(j) = o(i) + g(i)\, \sigma^{-1}(Y(i)).$$

We then apply Hebbian ideas and choose which of the summands in this sum to
update. For example, if the Ej→i Y(i) is sufficiently large, we note

$$E_{j \rightarrow i} Y(j) \approx o(i) + g(i)\, \sigma^{-1}(Y(i))$$

and so we could update using

$$E_{j \rightarrow i} \approx \frac{o(i) + g(i)\, \sigma^{-1}(Y(i))}{Y(j)}.$$

But Y(j) could be zero and we probably don't want to use the full update. So we use

$$E_{j \rightarrow i} \approx \frac{o(i) + g(i)\, \sigma^{-1}(Y(i))}{Y(j) + \frac{1}{2}}.$$

The equation above shows how we handle the target updates with slight adjustments
for the inhibitory case.
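Before using it in the update, it is worth checking numerically that the anonymous function
SigInv in the listing below really does invert the node sigmoid: with the offset and gain of a
particular neuron we should recover the input x. This little check is ours and is not part of
the listing.

Sig    = @(x,o,g) 0.5*(1 + tanh((x-o)./g));
SigInv = @(y,o,g) (o + (g/2)*log((y)./(1-y)));
x = 0.7; o = 0.2; g = 0.6;
SigInv(Sig(x,o,g),o,g)       % returns 0.7000 up to roundoff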

Listing 19.74: Target Updates


% set sigmoid inverse function
SigInv = @(y,o,g) (o + (g/2)*log((y)./(1-y)));
% a target neuron
if IsTarget == 1
  targetinverse = O(i) + G(i)*SigInv(value,O(i),G(i))
  for j = 1:lenBEF
    link = BEF(j);
    pre  = BF(j);
    post = i;
    hebb = Y(post)*W(link);
    weight = W(link);
    inhibitory = Neuron{post}(2);
    % Y post is inhibitory
    if inhibitory == 1
      if hebb < -Tol
        weight = TargTol*targetinverse/(Y(pre)+.1);
      end
    end
    % Y post is excitatory
    if inhibitory == 0
      if hebb > Tol
        weight = TargTol*targetinverse/(Y(pre)+.1);
      end
    end
    Wts(link) = weight;
  end % backwards links loop for targets
end % is it a target

The full code is below.

Listing 19.75: HebbianUpdateErrorSignal


function Wts = HebbianUpdateErrorSignal(Y,O,G,W,B,BE,Inhibitory,...
               DesiredOutput,scale,Tol,TargTol)
%
% B is the Backward Global node information
% BE is the Backward Global Edge information
% W is the edge vector
% Y is the new value of edge
% Inhibitory is a vector of the nodes that are inhibitory
% DesiredOutput is a cell of structs
%   each struct is of type Target.in = neuron index
%   and Target.value = neuron value)
%   DesiredOutput{2}(1) gives the index for the second cell entry
%   DesiredOutput{2}(2) gives the value for the second cell entry
% scale is the weight rescaling factor
% Tol is the desired Hebbian tolerance
% TargTol is the desired Hebbian tolerance for target weight updates
%
% setup sigmoid inverse.
SigInv = @(y,o,g) (o + (g/2)*log((y)./(1-y)));

% get size
sizeV = length(Y);
sizeE = length(W);
sizeT = length(DesiredOutput);
sizeI = length(Inhibitory);
ValueAndType = struct();
Neuron = {};
for i = 1:sizeV
  ValueAndType.value = Y(i);
  ValueAndType.inhibitory = 0;
  ValueAndType.target = 0;
  for k = 1:sizeI
    if i == Inhibitory(k);
      ValueAndType.inhibitory = 1;
    end
  end
  for k = 1:sizeT
    if i == DesiredOutput{k}(1);
      ValueAndType.target = 1;
      ValueAndType.value = DesiredOutput{k}(2);
    end
  end
  v = ValueAndType.value;
  inh = ValueAndType.inhibitory;
  tar = ValueAndType.target;
  Neuron{i} = [v,inh,tar];
end

for i = 1:sizeV
  % get backward node information for neuron i
  BF = B{i};
  % get backward edge information for neuron i
  BEF = BE{i};
  lenB = length(BF);
  lenBEF = length(BEF);
  value = Neuron{i}(1);
  inhibitory = Neuron{i}(2);
  target = Neuron{i}(3);
  % see if we have a target neuron
  TargetMatch = 0;
  IsTarget = 0;
  if i == target
    IsTarget = 1;
  end
  % not a target neuron
  if IsTarget == 0
    for j = 1:lenBEF
      link = BEF(j);
      pre  = BF(j);
      post = i;
      hebb = Y(post)*W(link);
      weight = W(link);
      inhibitory = Neuron{post}(2);
      % Y post is inhibitory
      if inhibitory == 1
        if hebb < -Tol
          weight = -scale*abs(W(link));
        end
      end
      % Y post is excitatory
      if inhibitory == 0
        if hebb > Tol
          weight = scale*abs(W(link));
        end
      end
      Wts(link) = weight;
    end % backward links loop for non targets
  end
  % a target neuron
  if IsTarget == 1
    targetinverse = O(i) + G(i)*SigInv(value,O(i),G(i));
    for j = 1:lenBEF
      link = BEF(j);
      pre  = BF(j);
      post = i;
      hebb = Y(post)*W(link);
      weight = W(link);
      inhibitory = Neuron{post}(2);
      % Y post is inhibitory
      if inhibitory == 1
        if hebb < -Tol
          weight = TargTol*targetinverse/(Y(pre)+.1);
        end
      end
      % Y post is excitatory
      if inhibitory == 0
        if hebb > Tol
          weight = TargTol*targetinverse/(Y(pre)+.1);
        end
      end
      Wts(link) = weight;
    end % backwards links loop for targets
  end % is it a target
end % neuron loop

end

Listing 19.76: A Sample One Step Training Session


%
% Inputs
%
Input = [0.3;0;0;0;0;0;0];
%
% Targets
%
Target = struct();
DesiredOutput = {};
Target.in = 5; Target.value = 0.0;
DesiredOutput{1} = [Target.in, Target.value];
Target.in = 6; Target.value = 1.0;
DesiredOutput{2} = [Target.in, Target.value];
Target.in = 7; Target.value = 0.0;
DesiredOutput{3} = [Target.in, Target.value];
%
Inhibitory = [2;3;4];
%
% Build OCOS
%
vOCOS = {[0;0;0;0;0;1],[0;0;0;0;0;2],[0;0;0;0;0;3],...
         [0;0;0;0;0;4],[0;0;0;0;0;5],[0;0;0;0;0;6],...
         [0;0;0;0;0;7]};
eOCOS = {[[0;0;0;0;0;1],[0;0;0;0;0;2]],...
         [[0;0;0;0;0;1],[0;0;0;0;0;3]],...
         [[0;0;0;0;0;1],[0;0;0;0;0;4]],...
         [[0;0;0;0;0;2],[0;0;0;0;0;5]],...
         [[0;0;0;0;0;3],[0;0;0;0;0;6]],...
         [[0;0;0;0;0;4],[0;0;0;0;0;7]],...
         [[0;0;0;0;0;1],[0;0;0;0;0;6]]};
VOCOS = vertices(vOCOS);
EOCOS = edges(eOCOS);
OCOS = graphs(VOCOS,EOCOS);
locationOCOS = [0;0;0;0;1;0];
OCOS = addlocationtonodes(OCOS,locationOCOS);
OCOS = addlocationtoedges(OCOS,locationOCOS);
%
% find incidence matrix
%
KOCOS = incidence(OCOS);
%
% find backward and forward data
%
[BA,BGlobal,FA,FGlobal,BEGlobal,FEGlobal] = BFsets(OCOS,KOCOS);
%
% Initialization
%
Y = zeros(1,7);
O = -1+2*rand(1,7);
G = 0.5*(0.9 + rand(1,7));
W = rand(1,7);
% so weights are all positive. Now we set the inhibitory edges.
W(4) = -W(4);
W(5) = -W(5);
W(6) = -W(6);
W =
   0.92507   0.18987   0.68864  -0.58308  -0.16078  -0.23091   0.69714
%
% Do first evaluation using inputs
%
Y = evalwithinput(OCOS,Y,W,O,G,Input,BGlobal,BEGlobal);
Y =
   0.3740340   0.9484705   0.1426123   0.6130007   0.0063026   0.2021721   0.1505239
scale = 1.1;
Tol = .05;
TargTol = .20;
W = HebbianUpdateErrorSignal(Y,O,G,W,BGlobal,BEGlobal,Inhibitory,...
    DesiredOutput,scale,Tol,TargTol);
W =
   0.92507   0.18987   0.68864  -0.58308  -0.16078  -0.23091   0.76685
Y = evalwithinput(OCOS,Y,W,O,G,Input,BGlobal,BEGlobal);
Y =
   0.3740340   0.9484705   0.1426123   0.6130007   0.0063026   0.2126961   0.1505239
Now if we wish to do these updates for multiple steps, we need a training function.
A simple one is shown in HebbianTrainingWithError whose code is given
below.

Listing 19.77: Multistep Training


function [Y,YSeries,W,WSeries] = HebbianTrainingWithError(Graph,...
  KGraph,T,Y,W,O,G,Input,Inhibitory,DesiredOutput,scale,Tol,TargTol)
% Graph = incoming graph
% T is number of iterations
% Y is the node values
% W is the edge weights
% O is the node offsets
% G is the node gains
% Inhibitory is the inhibitory neurons
% Input is the input vector
% DesiredOutput is a cell of structs
%   each struct is of type Target.in = neuron index
%   and Target.value = neuron value)
%   DesiredOutput{2}(1) gives the index for the second cell entry
%   DesiredOutput{2}(2) gives the value for the second cell entry
% scale is the weight rescaling factor
% Tol is the desired Hebbian tolerance
% TargTol is the desired Hebbian tolerance for target weight updates
% scale is the Hebbian update multiplier
% Tol is the Hebbian tolerance
% TargTol is the target multiplier
%
[BA,BGlobal,FA,FGlobal,BEGlobal,FEGlobal] = BFsets(Graph,KGraph);
%
%Y = evalwithinput(Graph,Y,W,O,G,Input,BGlobal,BEGlobal);
%
WSeries = W';
YSeries = Y';
for t = 1:T-1
  % use Hebbian update to reset link weights
  Wnew = HebbianUpdateErrorSignal(Y,O,G,W,BGlobal,BEGlobal,Inhibitory,...
         DesiredOutput,scale,Tol,TargTol);
  % reevaluate the graph outputs
  Ynew = evalwithinput(Graph,Y,Wnew,O,G,Input,BGlobal,BEGlobal);
  W = Wnew;
  Y = Ynew;
  YSeries = [YSeries Y'];
  WSeries = [WSeries W'];
end

Here is how it works in a practice session. We will assume all the initialization from
the previous sample session and just show the training. Here we use 151 steps: setting
the iteration count to T = 152 actually does 151 steps.

Listing 19.78: Training for 151 Steps


scale = 1.1;
Tol = .05;
TargTol = .01;
[Y,YS,W,WS] = HebbianTrainingWithError(OCOS,KOCOS,152,Y,W,O,G,...
    Input,Inhibitory,DesiredOutput,scale,Tol,TargTol);

After training, we find



Listing 19.79: Y after 151 steps


W =
   9.2507e-01   1.8987e-01   6.8864e-01  -5.8308e-01  -1.6078e-01  -2.3091e-01   1.3646e+06
Y =
   0.3740340   0.9484705   0.1426123   0.6130007   0.0063026   1.0000000   0.1505239

and we are close to achieving our targets: remember we do not use the target values
of 0 and 1. However, we are making no effort at controlling the sizes of the edge
weights as you can see from the value of W(7)!
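One simple way to watch this happen, and to rein it in, is to use the WSeries history returned
by the training function and clip the trained weights afterwards. This is only a sketch of one
possible control, not part of HebbianTrainingWithError; the clipping bound Wmax is an arbitrary
choice.

% look at how the weight on edge 7 grew over the 151 steps
plot(WS(7,:));
% a crude control: clip the trained weights back into [-Wmax, Wmax]
Wmax = 5;
Wclipped = max(min(W,Wmax),-Wmax);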
Chapter 20
Building Brain Models

We are interested in modeling the brains and other neural systems of a variety of
creatures. This text has given us the tools to begin to do this and in this chapter,
we will show how to build a human brain model using address based graphs in a
modular fashion. Some of our modules will have a lot of structure and some will
be just sketches. Our intent is to furnish you with a guide to how you could use
these ideas to build a model of any neural system of interest. The general process
is to look at the literature of the animal in question and try to find the neural circuit
diagrams from which a graph model can be built. This is not an easy task and with
the help of students, we have built preliminary models of honeybee, spider, pigeon
and squid/cuttlefish brains. It is a very interesting journey and in the next volume we
will explore it much further.
Now to build a small brain from component modules, we need code to build the
neural modules and link them together. We do this in several steps. The first one is
to assemble a cortex module. We have already discussed this quite a bit. The plan is
to build a cortical can out of OCOS, FFP and Two/Three circuits, assemble three or
more cans into a cortical column and then assemble multiple columns into a cortical
sheet. The code to do this is very modular.

20.1 Build A Cortex Module

A cortex module is built from cans assembled into columns which are then organized
into one or two dimensional sheets. So the process of building a cortex module is
pretty intense! However, we can use the resulting module as an isocortex building
block as we know all cortex prior to environmental imprinting is the same. Hence,
we can build a cortex module for visual cortex, auditory cortex etc and simply label
their respective global vector addresses appropriately.


Fig. 20.1 Two components of a can circuit. a The OCOS circuit. b The FFP circuit

20.1.1 Build A Cortical Can

We will build a cortical can using the OCOS, FFP and Two/Three building blocks.
After we build the OCOS, FFP and Two/Three blocks we set their location addresses
as we discussed. OCOS is [0; 0; 0; 0; 1; ·], FFP is [0; 0; 0; 0; 2; ·] and Two/Three is
[0; 0; 0; 0; 3; ·]. We use the function buildcan.

Listing 20.1: Structure of buildcan


function [OCOS,FFP,TwoThree,CanOne] = buildcan()

end

First, we build the OCOS, FFP and Two/Three circuit blocks, seen in Figs. 20.1a, b
and 20.2a, and set their location masks as discussed above. We have simplified the
FFP so that it consists of only 2 nodes: we just specify there is node 1 in the can
above which connects to node 1 of the OCOS module. Again, note we are not using
the thalamus node in the OCOS circuit.

Listing 20.2: Build the OCOS, FFP and Two/Three circuit blocks
vOCOS = { [ 0 ; 0 ; 0 ; 0 ; 0 ; 1 ] , [ 0 ; 0 ; 0 ; 0 ; 0 ; 2 ] , [ 0 ; 0 ; 0 ; 0 ; 0 ; 3 ] , . . .
2 [0;0;0;0;0;4] ,[0;0;0;0;0;5] ,[0;0;0;0;0;6] ,...
[0;0;0;0;0;7]};
eOCOS = { [ [ 0 ; 0 ; 0 ; 0 ; 0 ; 1 ] , [ 0 ; 0 ; 0 ; 0 ; 0 ; 2 ] ] , . . .
[[0;0;0;0;0;1] ,[0;0;0;0;0;3]] ,...
[[0;0;0;0;0;1] ,[0;0;0;0;0;4]] ,...
7 [[0;0;0;0;0;2] ,[0;0;0;0;0;5]] ,...
[[0;0;0;0;0;3] ,[0;0;0;0;0;6]] ,...
[[0;0;0;0;0;4] ,[0;0;0;0;0;7]] ,...
[[0;0;0;0;0;1] ,[0;0;0;0;0;6]]};
VOCOS = v e r t i c e s (vOCOS) ;
12 EOCOS = e d g e s (eOCOS) ;
OCOS = g r a p h s (VOCOS, EOCOS) ;
locationOCOS = [ 0 ; 0 ; 0 ; 0 ; 1 ; 0 ] ;
OCOS=a d d l o c a t i o n t o n o d e s (OCOS, locationOCOS ) ;
OCOS=a d d l o c a t i o n t o e d g e s (OCOS, locationOCOS ) ;
17
vFFP = {[0;0;0;0;0;1] ,[0;0;0;0;0;2]};
eFFP = {[[0;0;0;0;0;1] ,[0;0;0;0;0;2]]};
VFFP = v e r t i c e s ( vFFP ) ;
EFFP = e d g e s ( eFFP ) ;

22 FFP = g r a p h s (VFFP, EFFP) ;


locationFFP = [ 0 ; 0 ; 0 ; 0 ; 2 ; 0 ] ;
FFP=a d d l o c a t i o n t o n o d e s (FFP , l o c a t i o n F F P ) ;
FFP=a d d l o c a t i o n t o e d g e s (FFP , l o c a t i o n F F P ) ;

27 vTwoThree = {
[0;0;0;0;0;1] ,[0;0;0;0;0;2] ,[0;0;0;0;0;3] ,...
[0;0;0;0;0;4] ,[0;0;0;0;0;5] ,[0;0;0;0;0;6]};
eTwoThree = {
[[0;0;0;0;0;5] ,[0;0;0;0;0;1]] ,[[0;0;0;0;0;6] ,[0;0;0;0;0;3]] ,...
32 [[0;0;0;0;0;1] ,[0;0;0;0;0;2]] ,[[0;0;0;0;0;1] ,[0;0;0;0;0;4]] ,...
[[0;0;0;0;0;3] ,[0;0;0;0;0;2]] ,[[0;0;0;0;0;3] ,[0;0;0;0;0;4]] ,...
[[0;0;0;0;0;4] ,[0;0;0;0;0;2]]};
VTwoThree = v e r t i c e s ( vTwoThree ) ;
ETwoThree = e d g e s ( eTwoThree ) ;
37 TwoThree = g r a p h s ( VTwoThree , ETwoThree ) ;
locationTwoThree = [ 0 ; 0 ; 0 ; 0 ; 3 ; 0 ] ;
TwoThree=a d d l o c a t i o n t o n o d e s ( TwoThree , l o c a t i o n T w o T h r e e ) ;
TwoThree=a d d l o c a t i o n t o e d g e s ( TwoThree , l o c a t i o n T w o T h r e e ) ;

Then, we add all the individual nodes together into the graph CanOne.

Listing 20.3: Construct CanOne


%
% CanOne S t e p 1
% add t h e n o d e s o f FFP t o OCOS t o make CanOne S t e p 1
%
5 n o d e s = FFP . v ;
[ n , n o d e s i z e ] = s i z e ( nodes ) ;
CanOne = addnode (OCOS, n o d e s ( : , 1 ) ) ;
for i = 2: nodesize
CanOne = addnode ( CanOne , n o d e s ( : , i ) ) ;
10 end
%
% CanOne S t e p 2
% add t h e n o d e s o f TwoThree t o CanOne S t e p 1
% t o make CanOne S t e p 2
15 %
n o d e s = TwoThree . v ;
[ n , n o d e s i z e ] = s i z e ( nodes ) ;
for i = 1: nodesize
CanOne = addnode ( CanOne , n o d e s ( : , i ) ) ;
20 end

Now that CanOne has all the nodes, we add the edges. CanOne already has the
OCOS edges, so we just need to add the FFP and Two/Three edges.

Listing 20.4: Add the FFP and Two/Three edges


% Now add a l l t h e e d g e s from t h e c o m p o n e n t s t o CanOne
% add e d g e s from FFP
%
l i n k s = FFP . e ;
5 CanOne = addedgev ( CanOne , l i n k s ) ;
% now add e d g e s from TwoThree
l i n k s = TwoThree . e ;
CanOne = addedgev ( CanOne , l i n k s ) ;

Next, we connect the components.



Fig. 20.2 The last component of the can circuit and the can circuit itself. a Two-three circuit. b A can circuit

Listing 20.5: Connect the components


links = {};
links{1} = [[0;0;0;0;2;2],[0;0;0;0;1;1]];
links{2} = [[0;0;0;0;3;2],[0;0;0;0;1;1]];
links{3} = [[0;0;0;0;1;6],[0;0;0;0;3;2]];
CanOne = addedgev(CanOne,links);

We can visualize the can easily now. We build the can, its incidence matrix and then
generate the needed dot file. We then can see the assembled components in a can in
Fig. 20.2b.
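The commands for this step are not listed explicitly; following the same pattern used for the
column and cortex modules below, they would look like the sketch here. The file names Can.dot
and BrainCan.pdf are our choices, not fixed by the text.

[OCOS,FFP,TwoThree,CanOne] = buildcan();
KCanOne = incidence(CanOne);
incToDot(KCanOne,6,6,1.0,'Can.dot');

followed in another window by dot -Tpdf -o BrainCan.pdf Can.dot.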

20.1.2 Build A Cortical Column

Once we can build a can, we can glue cans together to build columns. We will
default to a three can size for a column. The function is

Listing 20.6: Structure of buildcolumn


function [CanOne,CanTwo,CanThree,Column] = buildcolumn()

end

First, we build three cans and set their location masks: CanOne is [0; 0; 0; 1; ·; ·],
CanTwo is [0; 0; 0; 2; ·; ·] and CanThree is [0; 0; 0; 3; ·; ·].

Listing 20.7: Building three cans


[OCOS,FFP,TwoThree,CanOne]   = buildcan();
[OCOS,FFP,TwoThree,CanTwo]   = buildcan();
[OCOS,FFP,TwoThree,CanThree] = buildcan();
%
locationCanOne   = [0;0;0;1;0;0];
locationCanTwo   = [0;0;0;2;0;0];
locationCanThree = [0;0;0;3;0;0];
%
CanOne = addlocationtonodes(CanOne,locationCanOne);
CanOne = addlocationtoedges(CanOne,locationCanOne);
%
CanTwo = addlocationtonodes(CanTwo,locationCanTwo);
CanTwo = addlocationtoedges(CanTwo,locationCanTwo);
%
CanThree = addlocationtonodes(CanThree,locationCanThree);
CanThree = addlocationtoedges(CanThree,locationCanThree);

Next, we add the cans to make a column. This uses the add a node at a time approach
which is inefficient, but makes it clearer what we are doing.

Listing 20.8: Adding the cans to make a column


%
% add CanOne , CanTwo and CanThree n o d e s
% t o make a Column
4 %
% Column S t e p 1
% add t h e n o d e s o f CanTwo t o CanOne t o make Column S t e p 1
%
n o d e s = CanTwo . v ;
9 [ n , n o d e s i z e ] = s i z e ( nodes ) ;
Column = addnode ( CanOne , n o d e s ( : , 1 ) ) ;
for i = 2: nodesize
Column = addnode ( Column , n o d e s ( : , i ) ) ;
end
14 %
n o d e s = CanThree . v ;
[ n , n o d e s i z e ] = s i z e ( nodes ) ;
Column = addnode ( Column , n o d e s ( : , 1 ) ) ;
for i = 2: nodesize
19 Column = addnode ( Column , n o d e s ( : , i ) ) ;
end

Now that Column has all the nodes of the three cans, we add in all the can edges.
Since Column contains all of CanOne, we only have to add the edges of CanTwo
and CanThree. This also uses the inefficient add an edge at a time approach.

Listing 20.9: Add the other can edges


%
% Now add a l l t h e e d g e s from t h e c o m p o n e n t s t o Column
% e d g e s from CanOne a r e a l r e a d y t h e r e
%
5 % now add e d g e s from CanTwo
%
l i n k s = CanTwo . e ;
[n , edgesize ] = size ( links ) ;
for i = 1: edgesize
10 Column = addedge ( Column , l i n k s { i } ( : , 1 ) , l i n k s { i } ( : , 2 ) ) ;
end
%
% now add e d g e s from CanThree
%
15 l i n k s = CanThree . e ;
[n , edgesize ] = size ( links ) ;
for i = 1: edgesize
Column = addedge ( Column , l i n k s { i } ( : , 1 ) , l i n k s { i } ( : , 2 ) ) ;
end

We then add the connections between the cans.

Listing 20.10: Add connections between cans


%
% add connecting edges between components
% neuron 1 in can 2 OCOS to neuron 1 in Can 1 FFP
%   [0;0;0;2;1;1] -> [0;0;0;1;2;1]
% neuron 1 in can 3 OCOS to neuron 1 in Can 2 FFP
%   [0;0;0;3;1;1] -> [0;0;0;2;2;1]
%
% neuron 2 in Can 1 TwoThree to neuron 1 in can 2 OCOS
%   [0;0;0;1;3;2] -> [0;0;0;2;1;1]
% neuron 2 in Can 2 TwoThree to neuron 1 in can 3 OCOS
%   [0;0;0;2;3;2] -> [0;0;0;3;1;1]
%
Column = addedge(Column,[0;0;0;2;1;1],[0;0;0;1;2;1]);
Column = addedge(Column,[0;0;0;3;1;1],[0;0;0;2;2;1]);
Column = addedge(Column,[0;0;0;1;3;2],[0;0;0;2;1;1]);
Column = addedge(Column,[0;0;0;2;3;2],[0;0;0;3;1;1]);

We can see a typical column in Fig. 20.3. We first generate the dot file and then the
associated graphic as follows:

Listing 20.11: Generate the column dot file and graphic


[CanOne,CanTwo,CanThree,Column] = buildcolumn();
KColumn = incidence(Column);
incToDot(KColumn,6,6,1.0,'Column.dot');

And generate the graphic in another window with



Fig. 20.3 A column circuit

Listing 20.12: Generating the graphic with dot


dot -Tpdf -o BrainColumn.pdf Column.dot

20.1.3 Build A Cortex Sheet

To build the model of cortex, we then glue together as many cortex columns as we
want. We use two choices: a type of ’single’ which means just one column is used
and a type of ’sheet’ which means we want a cortex model that is a rectangular
sheet of r rows and c columns. We do this with a switch statement. The function is
then

Listing 20.13: The structure of the buildcortex function


f u n c t i o n Corte x = b u i l d c o r t e x ( r , c , t y p e )
%
switch type
case (” s i n g l e ”)
5 % h e r e t h e r e i s o n l y one column
%
% s o o n l y one column name i s n e e d e d
% so j u s t c a l l i t Cortex
%
10 % b u i l d column = C o r t e x
...
case (” sheet ”)
% h e r e t h e c o l u m n s a r e i n a non d e g e n e r a t e sheet ;
% i e a t l e a s t two c o l u m n s .

15 % names s t a r t a t t h e b o t t o m row : for convenience


% s a y r = 2 and c = 3
%
% Column11 , . . . , Column13
% Column21 , . . . , Column23
20 %
...
end
end

The ’single’ case is easy. We build a single column and set its location to
[0; 0; 1; ·; ·; ·].

Listing 20.14: The single case


case (” s i n g l e ”)
% h e r e t h e r e i s o n l y one c o l u m n s
%
% s o o n l y one column name i s n e e d e d
5 % so j u s t c a l l i t Cortex
%
% b u i l d column = C o r t e x
[ c1 , c2 , c2 , C orte x ] = b u i l d c o l u m n ( ) ;
% set location
10 location = [ 0 ; 0 ; 1 ; 0 ; 0 ; 0 ] ;
% add l o c a t i o n t o n o d e s o f C o r t e x
Cortex=a d d l o c a t i o n t o n o d e s ( Cortex , l o c a t i o n ) ;
% add l o c a t i o n t o e d g e s o f C o r t e x
Cortex=a d d l o c a t i o n t o e d g e s ( Cortex , l o c a t i o n ) ;
15 %

In the case of a ’sheet’, we simply build as many columns as the architecture


requires. We build names for the columns in the sheet as we go so we can refer to
them.

Listing 20.15: Constructing the column case


% h e r e t h e c o l u m n s a r e i n a non d e g e n e r a t e s h e e t ;
% i e a t l e a s t two c o l u m n s .
% names s t a r t a t t h e b o t t o m row : f o r c o n v e n i e n c e
% s a y r = 2 and c = 3
5 %
% Column11 , . . . , Column13
% Column21 , . . . , Column23
%
base = 'Column';
10 % loop through sheet
for a = 1: r
for b = 1: c
u = ( a −1)∗ c+b ;
% c r e a t e name s t r i n g
15 name{u} = [ ba se , num2str ( a ) , num2str ( b ) ] ;
% b u i l d column
[ c1 , c2 , c2 , name{u } ] = b u i l d c o l u m n ( ) ;
% set location
location = [0;0; u ; 0 ; 0 ; 0 ] ;
20 % add n o d e s t o column
name{u}=a d d l o c a t i o n t o n o d e s ( name{u } , l o c a t i o n ) ;
% add e d g e s t o column
name{u}= a d d l o c a t i o n t o e d g e s ( name{u } , l o c a t i o n ) ;
end
25 end

We then build the cortex model by gluing the columns together. We start by adding
nodes.

Listing 20.16: Adding nodes to the cortex module


%
% We h a v e now c o n s t r u c t e d an a r r a y o f r x c
% columns . Now we a s s e m b l e i n t o a c o r t i c a l module .
%
5 % add Column n o d e s i n name{u} t o make t h e C o r t e x module
%
% Cortex Step 1
% add t h e n o d e s o f name{2} t o name{1} t o make Column S t e p 1
%
10 n o d e s = name { 2 } . v ;
[ n , n o d e s i z e ] = s i z e ( nodes ) ;
Cortex = addnode ( name { 1 } , n o d e s ( : , 1 ) ) ;
for i = 2: nodesize
C o r t e x = addnode ( Cortex , n o d e s ( : , i ) ) ;
15 end
%
% now add t h e o t h e r c o l u m n s
%
f o r u = 3 : r∗c
20 n o d e s = name{u } . v ;
[ n , n o d e s i z e ] = s i z e ( nodes ) ;
for i = 1: nodesize
Cortex = addnode ( Cortex , n o d e s ( : , i ) ) ;
end
25 end

Now add the edges. The cortex module already has the edges of the first column, so
we only add the edges from column two on. We use the more efficient add a list of
edges here.

Listing 20.17: Adding edges to the cortex module


%
% Now add a l l t h e e d g e s from t h e c o m p o n e n t s t o C o r t e x
% e d g e s from name{1} a r e a l r e a d y t h e r e
%
5 % now add e d g e s from t h e o t h e r c o l u m n s
%
f o r u = 2 : r∗c
l i n k s = name{u } . e ;
Cortex = addedgev ( Cortex , l i n k s ) ;
10 end
%

Then, we add the inter column connections. We do this by building a list of edges to
add called links{}. The code shows the old single edge command commented
out above the addition to the list of edges.

Listing 20.18: Add the inter column connections


% now add i n t e r c o l u m n c o n n e c t i o n s
%
l i n k s = {};
i = 0;
5 for a = 1: r
f o r b = 1 : c −1
u = ( a −1)∗ c+b ;
i = i +1;
%C o r t e x = a d d e d g e ( C o r t e x , [ 0 ; 0 ; u ; 1 ; 3 ; 6 ] , [ 0 ; 0 ; u + 1 ; 1 ; 3 ; 5 ] ) ;
10 links{i} = [[0;0; u ;1;3;6] ,[0;0; u+1;1;3;5]];
end
end
for a = 1: r
f o r b = 1 : c −1
15 u = ( a −1)∗ c+b ;
i = i +1;
%C o r t e x = a d d e d g e ( C o r t e x , [ 0 ; 0 ; u ; 2 ; 3 ; 6 ] , [ 0 ; 0 ; u + 1 ; 2 ; 3 ; 5 ] ) ;
links{i} = [[0;0; u ;2;3;6] ,[0;0; u+1;2;3;5]];
end
20 end
for a = 1: r
f o r b = 1 : c −1
u = ( a −1)∗ c+b ;
i = i +1;
25 %C o r t e x = a d d e d g e ( C o r t e x , [ 0 ; 0 ; u ; 3 ; 3 ; 6 ] , [ 0 ; 0 ; u + 1 ; 3 ; 3 ; 5 ] ) ;
links{i} = [[0;0; u ;3;3;6] ,[0;0; u+1;3;3;5]];
end
end
Cor tex = addedgev ( Cortex , l i n k s ) ;

We can graph a typical 2 × 2 cortex by generating its dot file. We can see this in
Fig. 20.4. We build the cortex, its incidence matrix and generate the dot file with
these lines:

Listing 20.19: The cortex dot file and the graphical image
Cortex = buildcortex(2,2,'sheet');
KCortex = incidence(Cortex);
incToDot(KCortex,6,6,1,'Cortex.dot');

And, as usual, generate the graphic in another window with

Listing 20.20: Generating the Cortex figure with dot


dot -Tpdf -o BrainCortex.pdf Cortex.dot

Fig. 20.4 A cortex circuit

20.2 Build A Thalamus Module

In this sample code, we build a thalamus module by gluing together reversed OCOS
blocks. Ours will have just two, but the pattern is easily repeated. We can build much
better models of thalamus by following the ideas in (Sherman and Guillery 2006)
and their modification in (Sherman and Guillery 2013). But that will be for another
time. Right now we are building a simple placeholder for the thalamus functions in
the full brain model. Since these networks are reversed OCOS’s (ROCOS’s), they
form a new fundamental block. The function is as follows:

Listing 20.21: Structure of the buildthalamus function


function Thalamus = buildthalamus(thalamusSize)

end

First, we build the reverse OCOS node and edge objects.



Listing 20.22: Build the reverse OCOS nodes and edges


%
2 % Use s i z e R e v e r s e OCOS m o d u l e s
%
vROCOS = { [ 0 ; 0 ; 0 ; 0 ; 0 ; 1 ] , [ 0 ; 0 ; 0 ; 0 ; 0 ; 2 ] , [ 0 ; 0 ; 0 ; 0 ; 0 ; 3 ] , . . .
[0;0;0;0;0;4] ,[0;0;0;0;0;5] ,[0;0;0;0;0;6] ,...
[0;0;0;0;0;7]};
7 eROCOS = {
[[0;0;0;0;0;4] ,[0;0;0;0;0;1]] ,[[0;0;0;0;0;5] ,[0;0;0;0;0;2]] ,...
[[0;0;0;0;0;6] ,[0;0;0;0;0;3]] ,...
[[0;0;0;0;0;7] ,[0;0;0;0;0;4]] ,...
[[0;0;0;0;0;7] ,[0;0;0;0;0;5]] ,...
[[0;0;0;0;0;7] ,[0;0;0;0;0;6]] ,...
12 [[0;0;0;0;0;7] ,[0;0;0;0;0;2]]};
VROCOS = v e r t i c e s (vROCOS) ;
EROCOS = e d g e s (eROCOS) ;

We then build the needed reverse OCOS blocks and assemble into one module. We
construct individual names for the reverse OCOS graph objects and set their location
masks to be [0; 0; 0; 0; 1, ·] to [0; 0; 0; 0; 7, ·]. The case of thalamusSize == 1
is easy. But if there are more than one, we have to assemble more carefully.

Listing 20.23: Building with one reversed OCOS block


if thalamusSize == 1
  Thalamus = graphs(VROCOS,EROCOS);
  location = [0;0;0;0;1;0];
  Thalamus = addlocationtonodes(Thalamus,location);
  Thalamus = addlocationtoedges(Thalamus,location);
else
  ....
end

In the more than one case, we do this. We create names for each thalamus module.
Then we create thalamus objects and set their addresses.

Listing 20.24: Building with more than one reversed OCOS blocks
base = 'ROCOS';
for i = 1:thalamusSize
  % create name string
  name{i} = [base, num2str(i)];
  name{i} = graphs(VROCOS,EROCOS);
  location = [0;0;0;0;i;0];
  name{i} = addlocationtonodes(name{i},location);
  name{i} = addlocationtoedges(name{i},location);
end

Next glue all the nodes together.



Listing 20.25: Glue the thalamus nodes together


1 %
% Now add a l l t h e n o d e s from t h e c o m p o n e n t s t o Thalamus
% n o d e s from name{1} a r e a l r e a d y t h e r e
%
n o d e s = name { 2 } . v ;
6 [ n , n o d e s i z e ] = s i z e ( nodes ) ;
Thalamus = addnode ( name { 1 } , n o d e s ( : , 1 ) ) ;
for i = 2: nodesize
Thalamus = addnode ( Thalamus , n o d e s ( : , i ) ) ;
end
11 %
% now add t h e o t h e r ROCOS’ s
%
for i = 3: thalamusSize
n o d e s = name{ i } . v ;
16 [ n , n o d e s i z e ] = s i z e ( nodes ) ;
for i = 1: nodesize
Thalamus = addnode ( Thalamus , n o d e s ( : , i ) ) ;
end
end

Then add the edges. Note our simple thalamus model consists of multiple reverse
OCOS modules.

Listing 20.26: Add the thalamus edges


%
% Now add a l l t h e e d g e s from t h e c o m p o n e n t s t o Thalamus
% e d g e s from name{1} a r e a l r e a d y t h e r e
%
5 % now add e d g e s from t h e o t h e r c o l u m n s
%
for i = 2: thalamusSize
l i n k s = name{ i } . e ;
Thalamus = addedgev ( Thalamus , l i n k s ) ;
10 end

At this point, we don’t have intermodule connections between the different ROCOS
modules so it truly is a simple model! We then build a thalamus graph as usual and
construct its dot file to visualize it.

Listing 20.27: Build a Thalamus Module with Two Pieces


Thalamus = buildthalamus(2);
KThalamus = incidence(Thalamus);
incToDot(KThalamus,6,6,1,'Thalamus.dot');
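As with the other modules, the PDF itself is then generated in another window; the output file
name here is our choice rather than one given in the text.

dot -Tpdf -o BrainThalamus.pdf Thalamus.dot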

We see this in Fig. 20.5.



Fig. 20.5 A two copy ROCOS Thalamus circuit

20.3 Build A MidBrain Module

The midbrain module controls how neurotransmitters are used in our brain model.
Again, this model will be quite simple just to illustrate the points. You can easily
imagine many ways to extend it and that will be necessary to build useful models
for various purposes. We build a simple midbrain model by assembling NumDop
dopamine, NumSer serotonin and NumNor norepinephrine neurons into two-neuron
sets. The function template is

Listing 20.28: Structure of the buildmidbrain function


function MidBrain = buildmidbrain(NumDop,NumSer,NumNor)

end

First, we build the neurotransmitter node and edge lists. Each neurotransmitter is
modeled as two nodes with a simple connection between them. First, we build nodes
and edges for each neurotransmitter.

Listing 20.29: Build nodes and edges for each neurotransmitter


%
2 % N e u r o t r a n s m i t t e r Modules
%
% B u i l d Dopamine n e u r o n s
N = NumDop ;
vDopNT = { } ;

7 f o r i = 1 :N
vDopNT{ i } = [ 0 ; 0 ; 0 ; 0 ; 0 ; i ] ;
end
f o r i = 1 :N
vDopNT{N+i } = [ 0 ; 0 ; 0 ; 0 ; 0 ; N+i ] ;
12 end
eDopNT = { } ;
f o r i = 1 :N
eDopNT{ i } = [ [ 0 ; 0 ; 0 ; 0 ; 0 ; i ] , [ 0 ; 0 ; 0 ; 0 ; 0 ; N+i ] ] ;
end
17
% Build Serotonin neurons
N = NumSer ;
vSerNT = { } ;
f o r i = 1 :N
22 vSerNT{ i } = [ 0 ; 0 ; 0 ; 0 ; 0 ; i ] ;
end
f o r i = 1 :N
vSerNT{N+i } = [ 0 ; 0 ; 0 ; 0 ; 0 ; N+i ] ;
end
27 eSerNT = { } ;
f o r i = 1 :N
eSerNT{ i } = [ [ 0 ; 0 ; 0 ; 0 ; 0 ; i ] , [ 0 ; 0 ; 0 ; 0 ; 0 ; N+i ] ] ;
end

32 % Build Norepinephrine neurons


N = NumNor ;
vNorNT = { } ;
f o r i = 1 :N
vNorNT{ i } = [ 0 ; 0 ; 0 ; 0 ; 0 ; i ] ;
37 end
f o r i = 1 :N
vNorNT{N+i } = [ 0 ; 0 ; 0 ; 0 ; 0 ; N+i ] ;
end
eNorNT = { } ;
42 f o r i = 1 :N
eNorNT{ i } = [ [ 0 ; 0 ; 0 ; 0 ; 0 ; i ] , [ 0 ; 0 ; 0 ; 0 ; 0 ; N+i ] ] ;
end

Then, we build vertices and edges objects and then their corresponding neurotransmitter objects.

Listing 20.30: Build neurotransmitter objects


VDopNT = vertices(vDopNT);
EDopNT = edges(eDopNT);
Dopamine = graphs(VDopNT, EDopNT);
location = [0;0;0;0;1;0];
Dopamine = addlocationtonodes(Dopamine, location);
Dopamine = addlocationtoedges(Dopamine, location);

VSerNT = vertices(vSerNT);
ESerNT = edges(eSerNT);
Serotonin = graphs(VSerNT, ESerNT);
location = [0;0;0;0;2;0];
Serotonin = addlocationtonodes(Serotonin, location);
Serotonin = addlocationtoedges(Serotonin, location);

VNorNT = vertices(vNorNT);
ENorNT = edges(eNorNT);
Norepinephrine = graphs(VNorNT, ENorNT);
location = [0;0;0;0;3;0];
Norepinephrine = addlocationtonodes(Norepinephrine, location);
Norepinephrine = addlocationtoedges(Norepinephrine, location);

Then, we finish by gluing the nodes together into a MidBrain object and add the
edges.

Listing 20.31: Glue neurotransmitter modules into the midbrain


nodes = Serotonin.v;
[n, nodesize] = size(nodes);
MidBrain = addnode(Dopamine, nodes(:,1));
for i = 2:nodesize
  MidBrain = addnode(MidBrain, nodes(:,i));
end

nodes = Norepinephrine.v;
[n, nodesize] = size(nodes);
for i = 1:nodesize
  MidBrain = addnode(MidBrain, nodes(:,i));
end

links = Serotonin.e;
disp('add serotonin edges');
MidBrain = addedgev(MidBrain, links);

links = Norepinephrine.e;
disp('add norepinephrine edges');
MidBrain = addedgev(MidBrain, links);

We build the MidBrain module and its dot file as usual.

Listing 20.32: Generate the midbrain and its dot file

MidBrain = buildmidbrain(2, 2, 2);
KMidBrain = incidence(MidBrain);
incToDot(KMidBrain, 6, 6, 1.0, 'MidBrain.dot');

We then plot the image and we can see the midbrain object in Fig. 20.6.

Fig. 20.6 A MidBrain circuit

20.4 Building the Brain Model

Our simple brain model will consist of two sensory modules, an associative cortex,
a motor cortex, a thalamus model, a midbrain model and a cerebellum model. We
are not modeling memory or any motor output functions. We can add these later but
this will be a nice simple model with a fair bit of structure. Note the model is full of feedback, and input and output nodes are scattered all through the model. So all of our hard work at trying to understand how to build input to output maps using derivative based tools and Hebbian techniques, as well as the conversion of a feedback graph into a lagged feedforward graph, is very relevant. Later, we will briefly introduce
another training technique that we think will be useful in efficiently building models
of cognitive function and dysfunction. The generic model then looks like what we
show in Fig. 20.7. We are not showing input/output models here.
The template we use to build the brain model is given below. It will return how
many nodes and edges are in the model as well as the graph that represents our model
of this brain.

Listing 20.33: Structure of the buildbrain function


function [NodeSizes, EdgeSizes, Brain] = buildbrain(NumDop, NumSer, NumNor)

end

where in the call to buildbrain we can choose how many of each neurotransmitter we want to use, and we return vectors of sizes for neurons and edges so that we can build a better visualization of this graph. There are seven modules here and each has its own number of neurons and edges.

Fig. 20.7 A simple brain model, not showing input and output connections. The edge connections are labeled E_{αβ} where α and β are the abbreviations for the various modules

So the global neuron and edge numbering scheme can be broken up into pieces relevant to each module. The vectors NodeSizes and EdgeSizes contain this information. In the original incToDot code, we just drew a graph simply using the global node and edge numbers. We will now automate the construction of the dot file differently. We will call each module its own cluster in the digraph and allow us to set individual colors for each module and for both internal links and intermodule links. Now let's discuss how we build our brain model. This will still not have an input or output section. First, we initialize the size counters.

Listing 20.34: Initialize counters


NodeSizes = [];
EdgeSizes = [];

We then build the SensoryOne cortical module and set its mask to [0; 1; ·; ·; ·; ·]. This allows us to set the node and edge sizes of the first module in the counters.

Listing 20.35: Build sensory cortex module one


disp('Build SensoryOne');
SensoryOne = buildcortex(1, 1, 'single');
location = [0;1;0;0;0;0];
SensoryOne = addlocationtonodes(SensoryOne, location);
SensoryOne = addlocationtoedges(SensoryOne, location);

N1 = length(SensoryOne.v);
E1 = length(SensoryOne.e);
NodeSizes(1) = N1;
EdgeSizes(1) = E1;
disp(['Neurons 1 to ', num2str(N1), ' are Sensory One']);
disp(['Links 1 to ', num2str(E1), ' are Sensory One']);

We then build the SensoryTwo cortical module, set its mask to [0; 2; ·; ·; ·; ·] and
set counters.

Listing 20.36: Build sensory cortex module two


disp('Build SensoryTwo');
SensoryTwo = buildcortex(1, 1, 'single');
location = [0;2;0;0;0;0];
SensoryTwo = addlocationtonodes(SensoryTwo, location);
SensoryTwo = addlocationtoedges(SensoryTwo, location);

N2 = length(SensoryTwo.v);
E2 = length(SensoryTwo.e);
NodeSizes(2) = N2;
EdgeSizes(2) = E2;
I = N1+1;
disp(['Neurons ', num2str(I), ' to ', num2str(N1+N2), ' are Sensory Two']);
disp(['Links ', num2str(E1+1), ' to ', num2str(E1+E2), ' are Sensory Two']);

We will model the associative cortex as a 2 × 2 sheet and build the object
AssociativeCortex with mask address [0; 3; ·; ·; ·; ·].

Listing 20.37: Build the associative cortex module


disp('Build AssociativeCortex');
AssociativeCortex = buildcortex(2, 2, 'sheet');
location = [0;3;0;0;0;0];
AssociativeCortex = addlocationtonodes(AssociativeCortex, location);
AssociativeCortex = addlocationtoedges(AssociativeCortex, location);

N3 = length(AssociativeCortex.v);
E3 = length(AssociativeCortex.e);
NodeSizes(3) = N3;
EdgeSizes(3) = E3;
I = N1+N2+1;
disp(['Neurons ', num2str(I), ' to ', num2str(N1+N2+N3), ' are Associative Cortex']);
disp(['Links ', num2str(E1+E2+1), ' to ', num2str(E1+E2+E3), ' are Associative Cortex']);

We choose to model the motor cortex as a single cortical column and set its mask to
[0; 4; ·; ·; ·; ·].

Listing 20.38: Build the motor cortex module


disp('Build MotorCortex');
MotorCortex = buildcortex(1, 1, 'single');
location = [0;4;0;0;0;0];
MotorCortex = addlocationtonodes(MotorCortex, location);
MotorCortex = addlocationtoedges(MotorCortex, location);

N4 = length(MotorCortex.v);
E4 = length(MotorCortex.e);
NodeSizes(4) = N4;
EdgeSizes(4) = E4;
I = N1+N2+N3+1;
disp(['Neurons ', num2str(I), ' to ', num2str(N1+N2+N3+N4), ' are Motor Cortex']);
disp(['Links ', num2str(E1+E2+E3+1), ' to ', num2str(E1+E2+E3+E4), ' are Motor Cortex']);

We then build the thalamus model, Thalamus with mask [0; 5; ·; ·; ·; ·].

Listing 20.39: Build the thalamus module


disp('Build Thalamus');
Thalamus = buildthalamus(2);
location = [0;5;0;0;0;0];
Thalamus = addlocationtonodes(Thalamus, location);
Thalamus = addlocationtoedges(Thalamus, location);

N5 = length(Thalamus.v);
E5 = length(Thalamus.e);
NodeSizes(5) = N5;
EdgeSizes(5) = E5;
I = N1+N2+N3+N4+1;
disp(['Neurons ', num2str(I), ' to ', num2str(N1+N2+N3+N4+N5), ' are Thalamus']);
disp(['Links ', num2str(E1+E2+E3+E4+1), ' to ', num2str(E1+E2+E3+E4+E5), ' are Thalamus']);

Next, we model the midbrain as we discussed, with NumDop, NumSer and NumNor copies, respectively, of the three different neurotransmitter neurons. We set the mask now to [0; 6; ·; ·; ·; ·].

Listing 20.40: Build the midbrain module


disp('Build MidBrain');
MidBrain = buildmidbrain(NumDop, NumSer, NumNor);
location = [0;6;0;0;0;0];
MidBrain = addlocationtonodes(MidBrain, location);
MidBrain = addlocationtoedges(MidBrain, location);

N6 = length(MidBrain.v);
E6 = length(MidBrain.e);
NodeSizes(6) = N6;
EdgeSizes(6) = E6;
I = N1+N2+N3+N4+N5+1;
disp(['Neurons ', num2str(I), ' to ', num2str(N1+N2+N3+N4+N5+N6), ' are MidBrain']);
disp(['Links ', num2str(E1+E2+E3+E4+E5+1), ' to ', num2str(E1+E2+E3+E4+E5+E6), ' are MidBrain']);

Finally, we model the cerebellum as a single cortical column for now and set its mask
to [0; 7; ·; ·; ·; ·].

Listing 20.41: Build the cerebellum module


disp('Build Cerebellum');
Cerebellum = buildcortex(1, 1, 'single');
location = [0;7;0;0;0;0];
Cerebellum = addlocationtonodes(Cerebellum, location);
Cerebellum = addlocationtoedges(Cerebellum, location);

N7 = length(Cerebellum.v);
E7 = length(Cerebellum.e);
NodeSizes(7) = N7;
EdgeSizes(7) = E7;
I = N1+N2+N3+N4+N5+N6+1;
disp(['Neurons ', num2str(I), ' to ', num2str(N1+N2+N3+N4+N5+N6+N7), ' are Cerebellum']);
disp(['Links ', num2str(E1+E2+E3+E4+E5+E6+1), ' to ', num2str(E1+E2+E3+E4+E5+E6+E7), ' are Cerebellum']);

We can then build a simple brain model object, Brain. We begin by gluing together
all the module nodes.

Listing 20.42: Glue brain modules together


% Brain Step 1
% add the nodes of SensoryTwo to SensoryOne to make Brain
%
nodes = SensoryTwo.v;
[n, nodesize] = size(nodes);
Brain = addnode(SensoryOne, nodes(:,1));
for i = 2:nodesize
  Brain = addnode(Brain, nodes(:,i));
end
%
% Brain Step 2
% add the nodes of AssociativeCortex to Brain
%
nodes = AssociativeCortex.v;
[n, nodesize] = size(nodes);
for i = 1:nodesize
  Brain = addnode(Brain, nodes(:,i));
end
%
% Brain Step 3
% add the nodes of MotorCortex to Brain
%
nodes = MotorCortex.v;
[n, nodesize] = size(nodes);
for i = 1:nodesize
  Brain = addnode(Brain, nodes(:,i));
end
% Brain Step 4
% add the nodes of Thalamus to Brain
%
nodes = Thalamus.v;
[n, nodesize] = size(nodes);
for i = 1:nodesize
  Brain = addnode(Brain, nodes(:,i));
end
% Brain Step 5
% add the nodes of MidBrain to Brain
%
nodes = MidBrain.v;
[n, nodesize] = size(nodes);
for i = 1:nodesize
  Brain = addnode(Brain, nodes(:,i));
end
% Brain Step 6
% add the nodes of Cerebellum to Brain
%
nodes = Cerebellum.v;
[n, nodesize] = size(nodes);
for i = 1:nodesize
  Brain = addnode(Brain, nodes(:,i));
end

Then we add in the edges. We already have the edges from Sensory Cortex One.

Listing 20.43: Add the brain module edges


%
% now add edges from components to Brain
%
% Now add all the edges from SensoryTwo
%
links = SensoryTwo.e;
[n, edgesize] = size(links);
for i = 1:edgesize
  Brain = addedge(Brain, links{i}(:,1), links{i}(:,2));
end
%
% Now add all the edges from AssociativeCortex
%
links = AssociativeCortex.e;
[n, edgesize] = size(links);
for i = 1:edgesize
  Brain = addedge(Brain, links{i}(:,1), links{i}(:,2));
end
%
% Now add all the edges from MotorCortex
%
links = MotorCortex.e;
[n, edgesize] = size(links);
for i = 1:edgesize
  Brain = addedge(Brain, links{i}(:,1), links{i}(:,2));
end
%
% Now add all the edges from Thalamus
%
links = Thalamus.e;
[n, edgesize] = size(links);
for i = 1:edgesize
  Brain = addedge(Brain, links{i}(:,1), links{i}(:,2));
end
%
% Now add all the edges from MidBrain
%
links = MidBrain.e;
[n, edgesize] = size(links);
for i = 1:edgesize
  Brain = addedge(Brain, links{i}(:,1), links{i}(:,2));
end
%
% Now add all the edges from Cerebellum
%
links = Cerebellum.e;
[n, edgesize] = size(links);
for i = 1:edgesize
  Brain = addedge(Brain, links{i}(:,1), links{i}(:,2));
end

The next part is harder to write down as we have to carefully add all the edges between neurons in the various components.

We are going to set up the connections now for all the modules as shown in Fig. 20.7. There are a lot of them and, in general, this is pretty time consuming and intellectually demanding to set up. In the text below, we go through all the steps. We show an intermediate step in Fig. 20.8.

Fig. 20.8 The two sensory cortex modules and the associative cortex module with intermodule links

This figure only shows the E_SIA and E_SIIA connections which we set up below. We are connecting the FFP circuits in all three cans of each sensory cortex to the Two/Three circuits in the associative cortex model. Our associative cortex consists of four columns, each having three cans. We connect the sensory cortex to neuron 5 of the Two/Three circuit in each column and each can of the associative cortex. So, for example, we connect [0; 1; 1; 3; 2; 1] to [0; 3; i; j; 3; 5] for i running from 1 to 4 (these are the associative cortex column indices) and for j running from 1 to 3 (these are the can indices). This gives 4 × 3 such connections. Now the address [0; 1; 1; 3; 2; 1] is for sensory cortex one, column one (there is only one column in each sensory cortex) and neuron 1 in the FFP circuit (that is the number 2 in the fifth entry of the address and the number 1 in the sixth entry). We have to do that for the other two cans in each sensory cortex. So we have a total of 3 × 12 such connections for each sensory cortex and two sensory cortexes, giving a total of 72 connections to code. It is hard to say how best to do this. We have experimented with graphical user interfaces which allow us to connect nodes by clicking on the pre node and the post node to establish the link, but if you look at Fig. 20.8, you can see the complexity of the drawing starts making this very hard to do. We still find it easier to code this sort of thing, but then our minds might be different from yours! So feel free to experiment yourself and find a convenient way to set these links up that you can live with!
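If typing each link by hand becomes too painful, link lists like these can also be generated in loops. The sketch below is one possible way to do it, assuming the addressing convention [0; cortex; column; can; circuit; neuron] used throughout this chapter; it produces the full 72 connection scheme described above, whereas the hand-coded lists that follow use only the first associative column.

% Sketch: generate sensory cortex --> associative cortex links in loops.
% Pre nodes are neuron 1 of the FFP circuit (circuit 2) in each sensory can;
% post nodes are neuron 5 of the Two/Three circuit (circuit 3) in each
% associative column and can.
slinks = {};
idx = 1;
for cortex = 1:2                  % the two sensory cortices
  for precan = 1:3                % cans in the single sensory column
    pre = [0; cortex; 1; precan; 2; 1];
    for col = 1:4                 % associative cortex columns
      for postcan = 1:3           % cans in each associative column
        post = [0; 3; col; postcan; 3; 5];
        slinks{idx} = [pre, post];
        idx = idx + 1;
      end
    end
  end
end
% slinks now holds 2*3*4*3 = 72 links; they could be added with
% Brain = addedgev(Brain, slinks);
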
Now, let's get started with all the setup. We start with the simplest interconnections. First, we set up 9 links from sensory cortex one to associative cortex.

Listing 20.44: Connections sensory cortex one and associative cortex


%
% add connections from sensory cortex one to associative cortex
% Sensory Cortex address [0;1;0;-;-;-]
% Sensory Cortex One has 1 Column so 3 cans
%   Column address [0;1;1;-;-;-]
%   Can address    [0;1;1;1;-;-]
%   Can address    [0;1;1;2;-;-]
%   Can address    [0;1;1;3;-;-]
% neuron 1 of ffp can 3 in sensory one to right neurons
% of each 2/3 circuit in associative cortex
% can 1, can 2, can 3
links{1} = [[0;1;1;3;2;1],[0;3;1;1;3;5]];
links{2} = [[0;1;1;3;2;1],[0;3;1;2;3;5]];
links{3} = [[0;1;1;3;2;1],[0;3;1;3;3;5]];
% neuron 1 of ffp can 2 in sensory cortex one to right neurons
% of each 2/3 circuit in associative cortex
% can 1, can 2, can 3
links{4} = [[0;1;1;2;2;1],[0;3;1;1;3;5]];
links{5} = [[0;1;1;2;2;1],[0;3;1;2;3;5]];
links{6} = [[0;1;1;2;2;1],[0;3;1;3;3;5]];
% neuron 1 of ffp can 1 in sensory one to right neurons
% of each 2/3 circuit in associative cortex
% can 1, can 2, can 3
links{7} = [[0;1;1;1;2;1],[0;3;1;1;3;5]];
links{8} = [[0;1;1;1;2;1],[0;3;1;2;3;5]];
links{9} = [[0;1;1;1;2;1],[0;3;1;3;3;5]];

Second, we encode the 9 links from sensory cortex two to associative cortex.

Listing 20.45: Connections sensory cortex two and associative cortex


%
% neuron 1 of ffp can 3 in sensory two to right neurons
% of each 2/3 circuit in associative cortex
% can 1, can 2, can 3
links{10} = [[0;2;1;3;2;1],[0;3;1;1;3;5]];
links{11} = [[0;2;1;3;2;1],[0;3;1;2;3;5]];
links{12} = [[0;2;1;3;2;1],[0;3;1;3;3;5]];
% neuron 1 of ffp can 2 in sensory two to right neurons
% of each 2/3 circuit in associative cortex
% can 1, can 2, can 3
links{13} = [[0;2;1;2;2;1],[0;3;1;1;3;5]];
links{14} = [[0;2;1;2;2;1],[0;3;1;2;3;5]];
links{15} = [[0;2;1;2;2;1],[0;3;1;3;3;5]];
% neuron 1 of ffp can 1 in sensory two to right neurons
% of each 2/3 circuit in associative cortex
% can 1, can 2, can 3
links{16} = [[0;2;1;1;2;1],[0;3;1;1;3;5]];
links{17} = [[0;2;1;1;2;1],[0;3;1;2;3;5]];
links{18} = [[0;2;1;1;2;1],[0;3;1;3;3;5]];

We then assign 9 links from associative cortex to motor cortex.

Listing 20.46: Connections associative cortex and motor cortex


%
% neuron 1 of ffp can 3 in associative cortex to right neurons
% of each 2/3 circuit in motor cortex
% can 1, can 2, can 3
links{19} = [[0;3;1;3;2;1],[0;4;1;1;3;5]];
links{20} = [[0;3;1;3;2;1],[0;4;1;2;3;5]];
links{21} = [[0;3;1;3;2;1],[0;4;1;3;3;5]];
% neuron 1 of ffp can 2 in associative cortex to right neurons
% of each 2/3 circuit in motor cortex
% can 1, can 2, can 3
links{22} = [[0;3;1;2;2;1],[0;4;1;1;3;5]];
links{23} = [[0;3;1;2;2;1],[0;4;1;2;3;5]];
links{24} = [[0;3;1;2;2;1],[0;4;1;3;3;5]];
% neuron 1 of ffp can 1 in associative cortex to right neurons
% of each 2/3 circuit in motor cortex
% can 1, can 2, can 3
links{25} = [[0;3;1;1;2;1],[0;4;1;1;3;5]];
links{26} = [[0;3;1;1;2;1],[0;4;1;2;3;5]];
links{27} = [[0;3;1;1;2;1],[0;4;1;3;3;5]];

We then add 10 links from thalamus to sensory cortex one and two.

Listing 20.47: Connections thalamus and sensory cortex one


%
% add thalamus to sensory cortex one can 1 OCOS
links{28} = [[0;5;0;0;1;4],[0;1;1;1;1;7]];
links{29} = [[0;5;0;0;1;5],[0;1;1;1;1;7]];
links{30} = [[0;5;0;0;1;6],[0;1;1;1;1;7]];
links{31} = [[0;5;0;0;1;2],[0;1;1;1;1;7]];
links{32} = [[0;5;0;0;1;2],[0;1;1;1;1;2]];
% add thalamus to sensory cortex two can 1 OCOS
links{33} = [[0;5;0;0;1;4],[0;2;1;1;1;7]];
links{34} = [[0;5;0;0;1;5],[0;2;1;1;1;7]];
links{35} = [[0;5;0;0;1;6],[0;2;1;1;1;7]];
links{36} = [[0;5;0;0;1;2],[0;2;1;1;1;7]];
links{37} = [[0;5;0;0;1;2],[0;2;1;1;1;2]];

Next are 10 more links from thalamus to associative cortex and motor cortex.

Listing 20.48: Connections thalamus to associative and motor cortex


% add thalamus to associative cortex can 1 OCOS
links{38} = [[0;5;0;0;1;4],[0;3;1;1;1;7]];
links{39} = [[0;5;0;0;1;5],[0;3;1;1;1;7]];
links{40} = [[0;5;0;0;1;6],[0;3;1;1;1;7]];
links{41} = [[0;5;0;0;1;2],[0;3;1;1;1;7]];
links{42} = [[0;5;0;0;1;2],[0;3;1;1;1;2]];
% add thalamus to motor cortex can 1 OCOS
links{43} = [[0;5;0;0;1;4],[0;4;1;1;1;7]];
links{44} = [[0;5;0;0;1;5],[0;4;1;1;1;7]];
links{45} = [[0;5;0;0;1;6],[0;4;1;1;1;7]];
links{46} = [[0;5;0;0;1;2],[0;4;1;1;1;7]];
links{47} = [[0;5;0;0;1;2],[0;4;1;1;1;2]];

Finally, we add connections from thalamus to the cerebellum.

Listing 20.49: Connections Thalamus to cerebellum


%
% now add connections from thalamus to cerebellum
%
links{48} = [[0;5;0;0;1;4],[0;7;1;3;2;1]];
links{49} = [[0;5;0;0;1;4],[0;7;1;2;2;1]];
links{50} = [[0;5;0;0;1;4],[0;7;1;1;2;1]];

Then add connections from cerebellum to motor cortex.

Listing 20.50: Connections cerebellum to motor cortex

%
% now add connections from cerebellum to motor cortex
%
% add cerebellum to motor cortex two/three can 3
links{51} = [[0;7;1;3;2;1],[0;4;1;3;3;5]];
links{52} = [[0;7;1;2;2;1],[0;4;1;3;3;5]];
links{53} = [[0;7;1;1;2;1],[0;4;1;3;3;5]];
% add cerebellum to motor cortex two/three can 2
links{54} = [[0;7;1;3;2;1],[0;4;1;2;3;5]];
links{55} = [[0;7;1;2;2;1],[0;4;1;2;3;5]];
links{56} = [[0;7;1;1;2;1],[0;4;1;2;3;5]];
% add cerebellum to motor cortex two/three can 1
links{57} = [[0;7;1;3;2;1],[0;4;1;1;3;5]];
links{58} = [[0;7;1;2;2;1],[0;4;1;1;3;5]];
links{59} = [[0;7;1;1;2;1],[0;4;1;1;3;5]];

This gives us a link list of 59 elements we add to the Brain object.

Listing 20.51: Add intermodule connections to brain


Brain = addedgev(Brain, links);

Next, we add in the dopamine links. This is done in a loop and, for ease of understanding, we set up all the dopamine links in the link list doplinks. First, we set the neurotransmitter sizes.

Listing 20.52: Set neurotransmitter sizes


sizeDopamine = NumDop;
sizeSerotonin = NumSer;
sizeNorepinephrine = NumNor;

Then, we set the dopamine links.

Listing 20.53: Set the dopamine connections


%
% add dopamine connections
%
sizeDopamine = NumDop;
sizeSerotonin = NumSer;
sizeNorepinephrine = NumNor;
for i = 1:sizeDopamine
  %N = 51+(i-1)*45;
  N = (i-1)*45;
  In = [0;6;0;0;1;i];
  % sensory cortex one
  doplinks{N+1} = [In, [0;1;1;1;3;5]];
  doplinks{N+2} = [In, [0;1;1;1;3;6]];
  doplinks{N+3} = [In, [0;1;1;2;3;5]];
  doplinks{N+4} = [In, [0;1;1;2;3;6]];
  doplinks{N+5} = [In, [0;1;1;3;3;5]];
  doplinks{N+6} = [In, [0;1;1;3;3;6]];
  % sensory cortex two
  doplinks{N+7} = [In, [0;2;1;1;3;5]];
  doplinks{N+8} = [In, [0;2;1;1;3;6]];
  doplinks{N+9} = [In, [0;2;1;2;3;5]];
  doplinks{N+10} = [In, [0;2;1;2;3;6]];
  doplinks{N+11} = [In, [0;2;1;3;3;5]];
  doplinks{N+12} = [In, [0;2;1;3;3;6]];
  % associative cortex
  doplinks{N+13} = [In, [0;3;1;1;3;5]];
  doplinks{N+14} = [In, [0;3;1;1;3;6]];
  doplinks{N+15} = [In, [0;3;1;2;3;5]];
  doplinks{N+16} = [In, [0;3;1;2;3;6]];
  doplinks{N+17} = [In, [0;3;1;3;3;5]];
  doplinks{N+18} = [In, [0;3;1;3;3;6]];
  % motor cortex
  doplinks{N+19} = [In, [0;4;1;1;3;5]];
  doplinks{N+20} = [In, [0;4;1;1;3;6]];
  doplinks{N+21} = [In, [0;4;1;2;3;5]];
  doplinks{N+22} = [In, [0;4;1;2;3;6]];
  doplinks{N+23} = [In, [0;4;1;3;3;5]];
  doplinks{N+24} = [In, [0;4;1;3;3;6]];
  % thalamus
  for j = 1:7
    u = (j-1)*3;
    Out1 = [0;5;0;0;j;1];
    Out2 = [0;5;0;0;j;2];
    Out3 = [0;5;0;0;j;3];
    doplinks{N+24+u+1} = [In, Out1];
    doplinks{N+24+u+2} = [In, Out2];
    doplinks{N+24+u+3} = [In, Out3];
  end
end
Brain = addedgev(Brain, doplinks);
[a, b] = size(doplinks);

We then handle serotonin links with the list serlinks.

Listing 20.54: Set the serotonin connections


for i = 1:sizeSerotonin
  N = (i-1)*45;
  In = [0;6;0;0;2;i];
  % sensory cortex one
  serlinks{N+1} = [In, [0;1;1;1;3;5]];
  serlinks{N+2} = [In, [0;1;1;1;3;6]];
  serlinks{N+3} = [In, [0;1;1;2;3;5]];
  serlinks{N+4} = [In, [0;1;1;2;3;6]];
  serlinks{N+5} = [In, [0;1;1;3;3;5]];
  serlinks{N+6} = [In, [0;1;1;3;3;6]];
  % sensory cortex two
  serlinks{N+7} = [In, [0;2;1;1;3;5]];
  serlinks{N+8} = [In, [0;2;1;1;3;6]];
  serlinks{N+9} = [In, [0;2;1;2;3;5]];
  serlinks{N+10} = [In, [0;2;1;2;3;6]];
  serlinks{N+11} = [In, [0;2;1;3;3;5]];
  serlinks{N+12} = [In, [0;2;1;3;3;6]];
  % associative cortex
  serlinks{N+13} = [In, [0;3;1;1;3;5]];
  serlinks{N+14} = [In, [0;3;1;1;3;6]];
  serlinks{N+15} = [In, [0;3;1;2;3;5]];
  serlinks{N+16} = [In, [0;3;1;2;3;6]];
  serlinks{N+17} = [In, [0;3;1;3;3;5]];
  serlinks{N+18} = [In, [0;3;1;3;3;6]];
  % motor cortex
  serlinks{N+19} = [In, [0;4;1;1;3;5]];
  serlinks{N+20} = [In, [0;4;1;1;3;6]];
  serlinks{N+21} = [In, [0;4;1;2;3;5]];
  serlinks{N+22} = [In, [0;4;1;2;3;6]];
  serlinks{N+23} = [In, [0;4;1;3;3;5]];
  serlinks{N+24} = [In, [0;4;1;3;3;6]];
  % thalamus
  for j = 1:7
    u = (j-1)*3;
    Out1 = [0;5;0;0;j;1];
    Out2 = [0;5;0;0;j;2];
    Out3 = [0;5;0;0;j;3];
    serlinks{N+24+u+1} = [In, Out1];
    serlinks{N+24+u+2} = [In, Out2];
    serlinks{N+24+u+3} = [In, Out3];
  end
end
Brain = addedgev(Brain, serlinks);

Finally, we handle norepinephrine links with the list norlinks.

Listing 20.55: Set the norepinephrine connections


%
% add norepinephrine connections
%
for i = 1:sizeNorepinephrine
  N = (i-1)*45;
  In = [0;6;0;0;3;i];
  % sensory cortex one
  norlinks{N+1} = [In, [0;1;1;1;3;5]];
  norlinks{N+2} = [In, [0;1;1;1;3;6]];
  norlinks{N+3} = [In, [0;1;1;2;3;5]];
  norlinks{N+4} = [In, [0;1;1;2;3;6]];
  norlinks{N+5} = [In, [0;1;1;3;3;5]];
  norlinks{N+6} = [In, [0;1;1;3;3;6]];
  % sensory cortex two
  norlinks{N+7} = [In, [0;2;1;1;3;5]];
  norlinks{N+8} = [In, [0;2;1;1;3;6]];
  norlinks{N+9} = [In, [0;2;1;2;3;5]];
  norlinks{N+10} = [In, [0;2;1;2;3;6]];
  norlinks{N+11} = [In, [0;2;1;3;3;5]];
  norlinks{N+12} = [In, [0;2;1;3;3;6]];
  % associative cortex
  norlinks{N+13} = [In, [0;3;1;1;3;5]];
  norlinks{N+14} = [In, [0;3;1;1;3;6]];
  norlinks{N+15} = [In, [0;3;1;2;3;5]];
  norlinks{N+16} = [In, [0;3;1;2;3;6]];
  norlinks{N+17} = [In, [0;3;1;3;3;5]];
  norlinks{N+18} = [In, [0;3;1;3;3;6]];
  % motor cortex
  norlinks{N+19} = [In, [0;4;1;1;3;5]];
  norlinks{N+20} = [In, [0;4;1;1;3;6]];
  norlinks{N+21} = [In, [0;4;1;2;3;5]];
  norlinks{N+22} = [In, [0;4;1;2;3;6]];
  norlinks{N+23} = [In, [0;4;1;3;3;5]];
  norlinks{N+24} = [In, [0;4;1;3;3;6]];
  % thalamus
  for j = 1:7
    u = (j-1)*3;
    Out1 = [0;5;0;0;j;1];
    Out2 = [0;5;0;0;j;2];
    Out3 = [0;5;0;0;j;3];
    norlinks{N+24+u+1} = [In, Out1];
    norlinks{N+24+u+2} = [In, Out2];
    norlinks{N+24+u+3} = [In, Out3];
  end
end
Brain = addedgev(Brain, norlinks);

We can now build a small brain model.

Listing 20.56: Build a simple brain model


[NodeSizes, EdgeSizes, Brain] = buildbrain(2, 2, 2);
Build SensoryOne
Neurons 1 to 45 are Sensory One
Links 1 to 58 are Sensory One
Build SensoryTwo
Neurons 46 to 90 are Sensory Two
Links 59 to 116 are Sensory Two
Build AssociativeCortex
Neurons 91 to 270 are Associative Cortex
Links 117 to 354 are Associative Cortex
Build MotorCortex
Neurons 271 to 315 are Motor Cortex
Links 355 to 412 are Motor Cortex
Build Thalamus
Neurons 316 to 329 are Thalamus
Links 413 to 426 are Thalamus
Build MidBrain
have dopamine edges
add serotonin edges
add norepinephrine edges
Neurons 330 to 341 are MidBrain
Links 427 to 432 are MidBrain
Build Cerebellum
Neurons 342 to 386 are Cerebellum
Links 433 to 490 are Cerebellum
Edges 491 to 873 are Intermodule connections
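Since the output above lists the global neuron and link ranges module by module, it is worth noting that the returned NodeSizes and EdgeSizes vectors let us recover this mapping programmatically. A minimal sketch, assuming the module ordering used in buildbrain:

% Sketch: map a global neuron number to its module using NodeSizes.
ModuleNames = {'SensoryOne','SensoryTwo','AssociativeCortex','MotorCortex', ...
               'Thalamus','MidBrain','Cerebellum'};
NodeOffsets = cumsum(NodeSizes);           % last global neuron number in each module
neuron = 320;                              % an example global neuron number
module = find(neuron <= NodeOffsets, 1);   % first module whose range contains it
disp(['Neuron ', num2str(neuron), ' lies in ', ModuleNames{module}]);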

Then get the incidence matrix and the graph’s dot file.

Listing 20.57: Generate brain incidence matrix and dot file


KBrain = incidence(Brain);
incToDot(KBrain, 6, 6, 1.0, 'Brain.dot');

We next generate the graphic.



Fig. 20.9 A Brain model

Listing 20.58: Generate the brain graphic


dot -Tpdf -o Brain.pdf Brain.dot

This generates the graph shown in Fig. 20.9. However, this visualization is less than ideal. It would be better to be able to color portions of the graph differently as well as organize modules into graph sub clusters. This can be done, although it is tedious. We have written such code, but how we do it depends a great deal on what type of brain model we are building, so it is very hard to automate. This kind of code is available on request; we will be happy to share our pain with you! Also, there are standard ways to generate the many files you can use to make a simple movie from your simulation. The dot code is set up to allow color changes based on link weight intensities and so forth. We have also not included that code here. Again, it is hard to make it generic; we tend to write such code as needed for a project at hand. Still, contact us if you want to talk about it!
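To give a flavor of what such cluster-based dot generation could look like, here is a minimal sketch that writes one cluster per module using the NodeSizes vector. The function name, label format and colors here are illustrative choices only, not the project code referred to above.

% Sketch: write a dot file with one cluster (subgraph) per brain module.
function brainToDotClusters(NodeSizes, ModuleNames, fname)
  fid = fopen(fname, 'w');
  fprintf(fid, 'digraph Brain {\n');
  offset = 0;
  for m = 1:length(NodeSizes)
    fprintf(fid, '  subgraph cluster_%d {\n', m);
    fprintf(fid, '    label = "%s"; color = blue;\n', ModuleNames{m});
    for n = 1:NodeSizes(m)
      fprintf(fid, '    %d;\n', offset + n);   % global node number
    end
    fprintf(fid, '  }\n');
    offset = offset + NodeSizes(m);
  end
  % edges would be written here from the incidence matrix, for example
  % fprintf(fid, '  %d -> %d [color=gray];\n', pre, post);
  fprintf(fid, '}\n');
  fclose(fid);
end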

References

S.M. Sherman, R. Guillery, Exploring the Thalamus and Its Role in Cortical Function (The MIT
Press, Cambridge, 2006)
S.M. Sherman, R. Guillery, Functional Connections of Cortical Areas: A New View of the Thalamus
(The MIT Press, Cambridge, 2013)
Part VII
Models of Cognition Dysfunction
Chapter 21
Models of Cognitive Dysfunction

To finish this text, let’s outline a blueprint for the creation of a model of cognitive
function and/or dysfunction which is relatively small computationally.

21.1 Cognitive Modeling

We are interested in doing this for several important reasons. First, in order to make
judicious statements about both cognitive dysfunction and policies that ameliorate
the problem, we require that there is a functioning model of normal brain function.
This is very hard to do, yet if we attempt it and develop a reasonable approximation,
we can then use the normal brain model to help us understand what happens if
neural modules and their interconnections change due to disease and trauma. Also,
the computational capability of stand alone robotic platforms is always less than
desired, so a model which can perhaps add higher level functions such as emotions
to decision making algorithms would be quite useful. Hence, we will deliberately
search out appropriate approximations to neural computation and deploy them in a
robust, yet small scale scalable architecture in the quest for our models. However,
the real reason is that in order to make progress on many different fronts in cognitive
research, we need an overarching model of neural computation and how it engenders high level things like associative learning and cognition. In our minds, this quest is quite
similar to understanding how to answer difficult questions such as “Which pathway
to cancer is dominant in a colon cancer model?” or “How do we understand the
spread of altruism throughout a population?” and so on. In our earlier sections, we
went through the arguments that can be brought to bear to bring insight into these
important questions and our intent is similar here. How can we phrase and answer
questions of this sort:
• “How is associative learning done?” Answering this sheds light on fundamental
problems in deep learning and how to build better algorithms to accomplish it. It
also is at the heart of understanding shape, handwriting and many other things.


• “What is a good painting?” How do we decide a painting is good in some sense as a composition?
• “How do we decide a musical composition is good?” There are many ways notes
can be strung together. We interpret some combinations as more pleasing than
others. Why?
• “What are emotions?” Are emotions by products of sufficiently complicated neural
wiring? Are they pragmatic ways to reduce search spaces in neural optimization
algorithms?
• “What is a normal brain?” This is similar to asking how we know in a simulation
if a simulated host dies—a question we asked in the West Nile Virus model of
infection. We want to explain survival curve data but explaining our simulation
results requires we understand how to flag a host’s history as indicative of death
or life at the end of the simulation. If we could design a model of a normal brain,
it would open the door to lesion studies and much more.
• “How do we generate good music or good art?” This is part of associative learning
really. If we could encode a bunch of examples of good music or art into a brain
architecture, how could this be used to take start sequences of notes to generate
new compositions that are interesting and labeled as good? Note this requires the
model can supply emotional labellings!
• “How do we understand the need for brain asymmetry?” If an organism has known
asymmetry in two halves of their brain, the question is why? What is the advantage
of that?
There are many more questions we are interested in. Note that building simulations at various levels of detail for how a neuron generates its axonal pulse, how second messenger triggers influence that output and so forth does not answer these types of questions. We will always make model error anyway, so we think we should focus
on the bare bones of the functional brain model. First, all brain models connect
computational nodes (typically neurons) to other computational nodes using edges
between the nodes. This architecture of nodes plus edges is really quite fixed. The
details of the nodal and edge processing can change a lot, but this connectivity
architecture remains the same. This means each brain model has an associated graph
of nodes with edges. The edges have a direction, so the graph must be a directed graph.
At the minimum, we need to generate directed graph models of brain function with at
least cortical, thalamus and midbrain modules having norepinephrine, serotonin and
dopamine neurotransmitter modulation. We have already discussed the full details of
how to generate neutral and emotionally labeled music and painting data in this text
in Chaps. 11 and 12. So let’s use it to discuss how we might build a model of normal
brain function. We won’t actually build this yet as that is a very detailed process
and it will be left for the next volume. However, in this text all the tools have been
developed and we are ready to begin the building of such models both for human
targets and a variety of small brained animals.
But let’s get back to modeling human cognitive dysfunction using the music
and painting data sets. Recall from Lang et al. (1998), it was shown that people
respond to emotionally tagged or affective images in a semi-quantitative manner.

Human volunteers were shown various images and their physiological responses
were recorded in two ways. One was a skin galvanic response and the other an fMRI
parameter. The data shows that the null responses are associated with images that
have no emotional tag. Further, the images cleanly map to distinct 2D locations in
skin response and fMRI space, the emotional grid, when the emotional contents of
the images differ. Hence, we will assume that if a database of music and paintings
were separated into states of anger, sadness, happiness and neutrality, we would
see a similar separation of response. We have designed emotionally labeled data
using a grammatical approach to composition, assembling data in what is known as a Würfelspiel matrix, which consists of P rows and three columns. In the first column are placed the nouns; in the third column, the objects; and in the second column, the verbs. Each sentence fragment S_{ijk} = Noun_i + Verb_j + Object_k constitutes a composed phrase and there are P^3 possible combinations.

$$A = \begin{bmatrix} \text{Noun}_0 & \text{Verb}_0 & \text{Object}_0 \\ \text{Noun}_1 & \text{Verb}_1 & \text{Object}_1 \\ \vdots & \vdots & \vdots \\ \text{Noun}_{P-1} & \text{Verb}_{P-1} & \text{Object}_{P-1} \end{bmatrix} \qquad (21.1)$$

We have used this technique in our development of emotionally tagged music and
painting data in Chaps. 11 and 12. For auditory data, we use musical fragments where
the nouns become opening phrases, the verbs transitions and the objects, the closing.
For the visual data, we use painting compositions made from foreground (the noun),
midground (the verb) and background (the object). Hence, for neutral data plus three
emotionally labeled sets of grammars, we have 4P^3 input sequences for our model of
visual and auditory cortex. We assume each musical and painting data presentation
corresponds to a two dimensional emotional grid location which we assume has nice
separation properties. We thus have a set of data which can be used to imprint a brain
model with correct or normal emotional responses.
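As a small concrete illustration of the Würfelspiel idea, the sketch below draws one of the P^3 compositions from a P × 3 cell array of fragments; the fragment strings are placeholders rather than our actual musical or painting data.

% Sketch: draw one composition from a P x 3 Wurfelspiel matrix.
A = { 'opening1', 'transition1', 'closing1'; ...
      'opening2', 'transition2', 'closing2'; ...
      'opening3', 'transition3', 'closing3' };
P = size(A, 1);
i = randi(P); j = randi(P); k = randi(P);    % independent row choices
S = [A{i,1}, ' + ', A{j,2}, ' + ', A{k,3}];  % the composed phrase S_ijk
disp(S);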
Each of our directed graphs also has node and edge functions associated with it, and these functions are time dependent, as what they do depends on first and second messenger triggers, the hardware structure of the output neuron and so forth. We therefore model the neural circuitry of a brain using a directed graph architecture consisting of computational nodes N and edge functions E which mediate the transfer of information between two nodes. Hence, if N_i and N_j are two computational nodes, then E_{i→j} would be the corresponding edge function that handles information transfer from node N_i to node N_j. For our purposes, we will assume here that the neural circuitry architecture we describe is fixed, although dynamic architectures can be handled as a sequence of directed graphs. We organize the directed graph using interactions between neural modules (visual cortex, thalamus etc.) which are themselves subgraphs of the entire circuit. Once we have chosen a directed graph to represent our neural circuitry, note the addition of a new neural module is easily handled by adding it and its connections to other modules as a subgraph addition.

Hence, at a given level of complexity, if we have the graph G(N, E) that encodes the
connectivity we wish to model, then the addition of a new module or modules simply
generates a new graph G  (N  , E ) for which there are straightforward equations for
explaining how G relates to G which are easy to implement. The update equations
for a given node then are given as an input/output pair. For the node Ni , let yi and Yi
denote the input and output from the node, respectively. Then we have

yi (t + 1) = Ii + Ej→i (t) Yj (t)
j∈B(i)
Yi (t + 1) = σi (t)(yi (t)).

where I_i is a possible external input, B(i) is the list of nodes which connect to the input side of node N_i and σ_i(t) is the function which processes the inputs to the node into outputs. This processing function is mutable over time t because second messenger systems are altering how information is processed at each time tick. Hence, our model consists of a graph G which captures the connectivity or topology of the brain model, on top of which is laid the instructions for information processing via the time dependent node and edge processing functions. A simple look at edge processing shows the nodal output, which is perhaps an action potential, is transferred without change to a synaptic connection where it initiates a spike in Ca++ ions which results in neurotransmitter release. The efficacy of this release depends on many things, but we can focus on four: r_u(i, j), the rate of reuptake of neurotransmitter in the connection between node N_i and node N_j; the rate r_d(i, j) at which the neurotransmitter is destroyed via an appropriate oxidase; the rate of neurotransmitter release, r_r(i, j); and the density of the neurotransmitter receptor, n_d(i, j). The triple (r_u(i, j), r_d(i, j), r_r(i, j)) ≡ T(i, j) determines a net increase or decrease of neurotransmitter concentration between the two nodes: r_r(i, j) − r_u(i, j) − r_d(i, j) ≡ r_net(i, j). The efficacy of a connection between nodes is then proportional to the product r_net(i, j) × n_d(i, j). Hence, each triple is a determining signature for a given neurotransmitter and the effectiveness of the neurotransmitter is proportional to the net neurotransmitter flow times the available receptor density. A very simple version of this is to simply assign the value of the edge processing function E_{i→j} to be the weight W_{i,j}, as is standard in a simple connectionist architecture. Of course, it is more complicated, as our graphs allow feedback easily by simply defining the appropriate edge connections. All of these parameters are also time dependent, so we could add a (t) to all of the above to indicate that, but we have not done so as we do not want too much clutter. We have worked out how to use approximations to nodal computation in Chaps. 6 and 9, and more details of how these approximations can be used in asynchronous computation environments are given in Peterson (2015).
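A minimal numerical sketch of this edge efficacy computation, with made-up rate values purely for illustration, might read:

% Sketch: edge efficacy from the neurotransmitter triple and receptor density.
rr = 1.0;    % release rate r_r(i,j)       (illustrative value)
ru = 0.3;    % reuptake rate r_u(i,j)
rd = 0.1;    % destruction rate r_d(i,j)
nd = 0.8;    % receptor density n_d(i,j)
rnet = rr - ru - rd;   % net neurotransmitter flow r_net(i,j)
W = rnet * nd;         % connection efficacy, proportional to flow times receptor density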

Table 21.1 Evaluation and Hebbian update algorithms

(a) Evaluation:
for(i = 0; i < N; i++) {
  if (i ∈ U)
    y_i = x_i + Σ_{j ∈ B(i)} E_{j→i} f_j
  else
    y_i = Σ_{j ∈ B(i)} E_{j→i} f_j
  f_i = σ_i(y_i, p)
}

(b) Hebbian Update:
for(i = 0; i < N; i++) {
  for (j ∈ B(i)) {
    y_p = f_i E_{j→i}
    if (y_p > ε)
      E_{j→i} = ζ E_{j→i}
  }
}

21.1.1 Training Algorithms

Recall, we assume there is a vector function f which assigns an output to each node. In the simplest case, this vector function takes the form f(i) = σ_i and, of course, these computations can be time dependent. Given an arbitrary input vector x, the DG computes node outputs via an iterative process. We let U denote the nodes which receive external inputs and B(i), the backward set of node N_i. The edge from node N_i to node N_j is assigned an edge function E_{i→j} which could be implemented as assigning a scalar value to this edge. Then, the evaluation is shown in Table 21.1a, where N is the number of nodes in the DG and the ith node function is given by the function σ_i which processes the current node input y_i.

We can also adjust parameters in the DG using the classic idea of a Hebbian update, which would be implemented as shown in Table 21.1b: choose a tolerance ε and, at each node in the backward set of the post neuron i, update the edge value by the factor ζ if the product of the post neuron node value f_i and the edge value E_{j→i} exceeds ε. The update algorithm then consists of the paired operations: sweep through the DG to do an evaluation and then use both Hebbian updates and the graph flow equations to adjust the edge values.
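A compact Matlab sketch of the paired evaluation and Hebbian sweep of Table 21.1 might look like the following; here E is an N × N matrix of edge values with E(j, i) the value of the edge from node j to node i, zero where no edge exists, and the inputs, offsets, gains and tolerances are placeholder choices.

% Sketch: one evaluation sweep followed by a Hebbian edge update.
N = 10;                            % number of nodes
E = 0.1*rand(N, N);                % E(j,i): edge value from node j to node i
x = zeros(N, 1); x(1:2) = 1;       % external inputs
U = [1 2];                         % nodes receiving external input
o = zeros(N, 1); g = ones(N, 1);   % sigmoid offsets and gains
f = zeros(N, 1);                   % node outputs
epsilon = 0.05; zeta = 1.05;       % Hebbian tolerance and multiplier

% evaluation sweep
for i = 1:N
  y = E(:, i)' * f;                % sum over the backward set of node i
  if any(U == i)
    y = y + x(i);                  % add the external input
  end
  f(i) = 0.5*(1 + tanh((y - o(i))/g(i)));
end

% Hebbian update: scale edges whose product with the post output exceeds epsilon
for i = 1:N
  for j = 1:N
    if E(j, i) ~= 0 && f(i)*E(j, i) > epsilon
      E(j, i) = zeta * E(j, i);
    end
  end
end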

21.2 Information

21.2.1 Laplacian Updates

For a given directed graph, the gradient of f is defined by $\nabla f = K^T f$ and the Laplacian, by $\nabla^2 f$. We posit the flow of information through G is given by the graph based partial differential equation $\nabla^2 f - \alpha \frac{\partial f}{\partial t} - \beta f = -I$, where I is the external input. This is similar to a standard cable equation. We interpret the $\frac{\partial f}{\partial t}$ using a standard forward difference $\Delta f$ which is defined at each iteration time t by $\Delta f(0) = 0$ and otherwise $\Delta f(t) = f(t) - f(t-1)$. This gives, using a finite difference for the $\frac{\partial f}{\partial t}$ term, Eq. 21.2, where we define the finite difference $\Delta f_n(t)$ as $f_n(t+1) - f_n(t)$.

$$\nabla^2 f - \alpha\, \Delta f - \beta f = -I. \qquad (21.2)$$

The update equation is then

$$KK^T f - \alpha\, \Delta f - \beta f = -I.$$

For iteration t, we then have the update equation

$$KK^T f(t+1) - \alpha\, \Delta f(t) - \beta f(t+1) = -I(t+1),$$

which gives

$$\big(KK^T - (\alpha + \beta)\,Id\big) f(t+1) + \beta f(t) = -I(t+1),$$

where Id is the appropriate identity matrix. Let $H_{\alpha,\beta}$ denote the operator $KK^T - (\alpha + \beta)\,Id$. We thus have the iterative equation

$$H_{\alpha,\beta}\, f(t+1) = -\beta f(t) - I(t+1),$$

or, since $H_{\alpha,\beta}$ is invertible,

$$f(t+1) = -H_{\alpha,\beta}^{-1} \beta f(t) - H_{\alpha,\beta}^{-1} I(t+1).$$

For convenience, let $\Phi(t+1) = -H_{\alpha,\beta}^{-1} \beta f(t) - H_{\alpha,\beta}^{-1} I(t+1)$. Then, we have

$$f(t+1) = \Phi(t+1).$$

Now let's switch to a more typical nodal processing formulation. Recall, each node N_i has an input y_i(t) which is processed by the node using the function σ_i as Y_i(t) = σ_i(y_i(t)). The node processing could also depend on parameters, but here we will only assume an offset and a gain are used in each computation. Thus, the node values f are equivalent to the node outputs Y. This allows us to write the update as

$$Y_i(t+1) = \sigma_i\big(y_i(t)\big) = \Phi_i(t+1),$$

or

$$y_i(t) = \sigma_i^{-1}\big(\Phi_i(t+1)\big).$$

For simplicity, let's assume a sigmoid nodal computation for each node:

$$\sigma_i(x) = 0.5\left(1 + \tanh\left(\frac{x - o}{g}\right)\right),$$

where o and g are traditional offset and gain parameters for the computation. It is straightforward to find

$$\sigma_i^{-1}\big(\Phi_i(t+1)\big) = O_i + \frac{g_i}{2}\, \ln\left(\frac{1 + \Phi_i(t+1)}{1 - \Phi_i(t+1)}\right),$$

where O_i is the offset to node i and g_i is the gain. We also know the graph evaluation equations for y_i then give

$$y_i(t) = \sum_{j \in B(i)} E_{j \to i}(t)\, Y_j(t) = O_i + \frac{g_i}{2}\, \ln\left(\frac{1 + \Phi_i(t+1)}{1 - \Phi_i(t+1)}\right),$$

where we don't need to add a nodal input as that is taken care of in the external input I. Pick the maximum $E_{j \to i}(t)\, Y_j(t)$ component and label its index $j_M$. Then set

$$E_{j_M \to i}(t)\, Y_{j_M}(t) = O_i + \frac{g_i}{2}\, \ln\left(\frac{1 + \Phi_i(t+1)}{1 - \Phi_i(t+1)}\right)$$

and solve for $E_{j_M \to i}(t)$ to find

$$E_{j_M \to i}(t) = \frac{1}{Y_{j_M}(t)}\left( O_i + \frac{g_i}{2}\, \ln\left(\frac{1 + \Phi_i(t+1)}{1 - \Phi_i(t+1)}\right)\right).$$

We could also decide to do this update for a larger subset of the indices contributing to the nodal output value, in which case several edge weights could be updated for each iteration. For example, this update could be undertaken for all synaptic edges that meet a standard Hebbian update tolerance criterion. Implementing this update model in Matlab is quite similar to what we have already shown in the Hebbian case.
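For concreteness, a small Matlab sketch of one step of this Laplacian update, using a placeholder incidence matrix and resetting a single edge weight per node through the sigmoid inverse, might read as follows. The graph, parameters and the clamp keeping the components of Φ inside (−1, 1) are assumptions made just to keep the sketch runnable.

% Sketch: one Laplacian-driven update step on a small placeholder graph.
n = 6;  m = 10;
K = sign(randn(n, m));               % placeholder incidence matrix
alpha = 0.5;  beta = 0.25;
H = K*K' - (alpha + beta)*eye(n);    % the operator H_{alpha,beta}
fold = 0.4*ones(n, 1);               % node values f(t)
I = 0.1*ones(n, 1);                  % external input I(t+1)
Phi = -H \ (beta*fold + I);          % f(t+1) = Phi(t+1)
Phi = max(min(Phi, 0.9), -0.9);      % keep inside (-1,1) for the log below (sketch only)

% reset one edge weight per node via the sigmoid inverse
O = zeros(n, 1); g = ones(n, 1);     % offsets and gains
E = 0.1*rand(n, n);                  % E(j,i): edge value from node j to node i
for i = 1:n
  [~, jM] = max(E(:, i).*fold);      % index of the maximal contribution to node i
  target = O(i) + (g(i)/2)*log((1 + Phi(i))/(1 - Phi(i)));
  E(jM, i) = target / fold(jM);      % solve for E_{jM -> i}(t)
end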

21.2.2 Module Updates

Given the update strategies above, it is worth mentioning that once we have a given graph structure for our model, we will probably want to update it by adding module interconnections, additional modules and so forth. We find this is our normal pathway to building a better model: our understanding of the neural computation we need to approximate changes as we study, and we then need to add additional complexity to our model. However, just to illustrate what can happen, let's look at what happens when we add a new subgraph to an existing graph. We want to see what happens in the update strategies. Hence, we assume we have a model $G_1(N_1, E_1)$ with incidence matrix $K_1$. We then add to that model the subgraph $G_2(N_2, E_2)$ with incidence matrix $K_2$. For convenience of exposition, let's assume the first graph's incidence matrix is 9 × 12 and the second graph's incidence matrix is 20 × 30. When we combine these neural modules, we must decide on how to connect the neurons of $G_1$ to the neurons of $G_2$. The combined graph has $N_1 + N_2$ nodes and $E_1 + E_2$ edges plus the additional edges from the connections between the modules. The combined graph has an incidence matrix K and the connections between $G_1$ and $G_2$ will give rise to sub matrices in K of the form

$$K = \begin{bmatrix} K_1\ (9 \times 12) & O_1\ (9 \times 30) & C\ (9 \times N) \\ O_2\ (20 \times 12) & K_2\ (20 \times 30) & D\ (20 \times N) \end{bmatrix}$$

where N is the number of intermodule edges we have added. The new incidence matrix is thus 29 × (42 + N). The matrix $K^T$ is then

$$K^T = \begin{bmatrix} K_1^T\ (12 \times 9) & O_2^T\ (12 \times 20) \\ O_1^T\ (30 \times 9) & K_2^T\ (30 \times 20) \\ C^T\ (N \times 9) & D^T\ (N \times 20) \end{bmatrix}$$

and the new Laplacian is

$$KK^T = \begin{bmatrix} K_1 & O_1 & C \\ O_2 & K_2 & D \end{bmatrix} \begin{bmatrix} K_1^T & O_2^T \\ O_1^T & K_2^T \\ C^T & D^T \end{bmatrix}.$$

After multiplying, we have (multiplies involving our various zero matrices $O_1$ and $O_2$ vanish)

$$KK^T = \begin{bmatrix} K_1 K_1^T + C C^T\ (9 \times 9) & C D^T\ (9 \times 20) \\ D C^T\ (20 \times 9) & K_2 K_2^T + D D^T\ (20 \times 20) \end{bmatrix}.$$

This can be rewritten as

$$KK^T = \begin{bmatrix} K_1 K_1^T & O \\ O & K_2 K_2^T \end{bmatrix} + \begin{bmatrix} C C^T & O \\ O & D D^T \end{bmatrix} + \begin{bmatrix} O & C D^T \\ D C^T & O \end{bmatrix}$$

for zero sub matrices of appropriate sizes in each matrix, all labeled O. Recall the Laplacian updating algorithm is given by

$$KK^T f - \alpha \frac{\partial f}{\partial t} - \beta f = -I,$$

which can be written in finite difference form, using f(t) for the value of the node values at time t, as

$$KK^T f(t) - \alpha\, \Delta f(t) - \beta f(t) = -I(t),$$

where I(t) is the external input at time t and $\Delta f(0) = 0$ and otherwise $\Delta f(t) = f(t) - f(t-1)$. Now divide the node vectors f and I into the components $f_I$, $f_{II}$, $I_I$ and $I_{II}$ to denote the nodes for subgraphs $G_1$ and $G_2$, respectively. Then, we see we can write the Laplacian for the full graph in terms of the components used to assemble the graph from its modules and module interconnections.

$$KK^T \begin{bmatrix} f_I \\ f_{II} \end{bmatrix} = \begin{bmatrix} K_1 K_1^T & O \\ O & K_2 K_2^T \end{bmatrix} \begin{bmatrix} f_I \\ f_{II} \end{bmatrix} + \begin{bmatrix} C C^T & O \\ O & D D^T \end{bmatrix} \begin{bmatrix} f_I \\ f_{II} \end{bmatrix} + \begin{bmatrix} O & C D^T \\ D C^T & O \end{bmatrix} \begin{bmatrix} f_I \\ f_{II} \end{bmatrix}$$

This can be rewritten as

$$KK^T \begin{bmatrix} f_I \\ f_{II} \end{bmatrix} = \begin{bmatrix} \big(K_1 K_1^T + C C^T\big) f_I \\ \big(K_2 K_2^T + D D^T\big) f_{II} \end{bmatrix} + \begin{bmatrix} C D^T f_{II} \\ D C^T f_I \end{bmatrix}$$

Now substitute this into the update equation to find

$$\begin{bmatrix} \big(K_1 K_1^T + C C^T\big) f_I \\ \big(K_2 K_2^T + D D^T\big) f_{II} \end{bmatrix} + \begin{bmatrix} C D^T f_{II} \\ D C^T f_I \end{bmatrix} - \alpha \begin{bmatrix} \Delta f_I \\ \Delta f_{II} \end{bmatrix} - \beta \begin{bmatrix} f_I \\ f_{II} \end{bmatrix} = - \begin{bmatrix} I_I \\ I_{II} \end{bmatrix}$$

This can be then further rewritten as follows:

$$K_1 K_1^T f_I - \alpha\, \Delta f_I - \beta f_I + C C^T f_I + C D^T f_{II} = -I_I$$
$$K_2 K_2^T f_{II} - \alpha\, \Delta f_{II} - \beta f_{II} + D D^T f_{II} + D C^T f_I = -I_{II}$$

Hence, the terms $C C^T f_I$, $D D^T f_{II}$, $C D^T f_{II}$ and $D C^T f_I$ represent the mixing of signals between modules. We see that if we have an existing model, we can use these update equations to add a new graph to an existing graph. Thus, we can take a trained brain module and add a new cortical sub module and so forth and to some extent retain the training effort we have already undertaken. Now add time indices and expand the difference terms to get

$$K_1 K_1^T f_I^t - \alpha \big( f_I^t - f_I^{t-1} \big) - \beta f_I^t + C C^T f_I^t + C D^T f_{II}^t = -I_I^t$$
$$K_2 K_2^T f_{II}^t - \alpha \big( f_{II}^t - f_{II}^{t-1} \big) - \beta f_{II}^t + D D^T f_{II}^t + D C^T f_I^t = -I_{II}^t$$

If we let $L f = KK^T f - \alpha f - \beta f$, for the graph with incidence matrix K, we can rewrite the update as

$$L_I f_I^t + C C^T f_I^t + C D^T f_{II}^t = \alpha f_I^{t-1} - I_I^t$$
$$L_{II} f_{II}^t + D D^T f_{II}^t + D C^T f_I^t = \alpha f_{II}^{t-1} - I_{II}^t$$
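The block structure above is straightforward to assemble numerically. A sketch, with random placeholder incidence matrices and illustrative intermodule connection columns:

% Sketch: assemble the combined incidence matrix and its Laplacian blocks.
K1 = sign(randn(9, 12));     % incidence matrix of the first module
K2 = sign(randn(20, 30));    % incidence matrix of the second module
Nint = 5;                    % number of intermodule edges
C = zeros(9,  Nint);  C(1, :) = 1;     % placeholder: pre nodes in module one
D = zeros(20, Nint);  D(2, :) = -1;    % placeholder: post nodes in module two
K = [K1, zeros(9, 30), C; zeros(20, 12), K2, D];   % 29 x (42 + Nint)

L = K*K';                     % full graph Laplacian (before the alpha, beta terms)
block11 = K1*K1' + C*C';      % upper-left block
block22 = K2*K2' + D*D';      % lower-right block
mix12 = C*D';                 % coupling of module two signals into module one
mix21 = D*C';                 % coupling of module one signals into module two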

21.2.3 Submodule Two Training

If we wanted to train a portion of the cortex to match sensory data, we can organize
our brain graph model as two modules. The first graph is all of the model except
the portion of the cortex we want to train. Hence, the two module update equations
derived in the previous section are applicable. Let’s assume the cortical submodule we
are interested in is the second graph and let EII be the orthonormal basis consisting of
the eigenvectors for the matrix LII with λII the corresponding vector of eigenvalues.
Then using the representation of vectors with respect to the basis EII , we have
 tj j  tj j
fIIt = j fII E II and IIIt = j III E II and so

 tj j
 tj j
 t−1,j j
 tj j
fII LII E II + fII DDT E II + DC T fIt = fII α E II − III E II .
j j j j

j j j
However, we also know LII E II = λII E II . Thus, we obtain

 tj j j
 tj j
 t−1,j j
 tj j
fII λII E II + fII DDT E II + DC T fIt = fII α E II − III E II .
j j j j
21.2 Information 505

j j
Define the new vectors FII = DDT E II and gII
t
= DC T fIt . Substituting these into
our expression, we find
 tj j t−1,j tj

j
 tj j
fII λII − αfII + III E II + fII FII + gII
t
= 0.
j j

j j  jk j
We can expand FII and gII t
also in the basis EII giving FII = FII E II and
 tj j
k
gII = j gII E II . From these expansions, we find
t

 tj j t−1,j tj tj

j
 tj jk
fII λII − αfII + III + gII E II + k
fII FII EII = 0.
j j k

Now reorganize the double sum to write


 
 tj j t−1,j tj tj
 kj j
fII λII −α fII + III + gII + fIItk FII E II = 0.
j k

This immediately implies that for all indices j, we must have

tj j t−1,j tj tj
 kj
fII λII − α fII + III + gII + fIItk FII = 0.
k

This can be rewritten as


tj j t−1,j tj tj
fII λII − α fII + III + gII + < fIIt FII
k
> = 0.

We now have a set of equations for the unknowns $f_{II}^{t,j}$. We can rewrite this as a matrix equation. Let $\Lambda_{II}$ denote the diagonal matrix

\[
\Lambda_{II} = \begin{bmatrix}
\lambda_{II}^1 & 0 & 0 & \cdots & 0 \\
0 & \lambda_{II}^2 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & 0 & \lambda_{II}^P
\end{bmatrix}
\]

where $P$ is the number of nodes in the second module. Further, let $F_{II}$ be the matrix with entries $F_{II}^{jk}$, so that

\[
F_{II}^T = \begin{bmatrix}
F_{II}^{11} & F_{II}^{21} & \cdots & F_{II}^{P1} \\
F_{II}^{12} & F_{II}^{22} & \cdots & F_{II}^{P2} \\
\vdots & \vdots & & \vdots \\
F_{II}^{1P} & F_{II}^{2P} & \cdots & F_{II}^{PP}
\end{bmatrix}
\]

Then, we can rewrite the update equation more succinctly as

\[
\left( \Lambda_{II} + F_{II}^T \right) f_{II}^t + I_{II}^t + g_{II}^t - \alpha f_{II}^{t-1} = 0 .
\]

Finally, let $D_{II}^{t,t-1} = -\left( I_{II}^t + g_{II}^t - \alpha f_{II}^{t-1} \right)$. Then we have $\left( \Lambda_{II} + F_{II}^T \right) f_{II}^t = D_{II}^{t,t-1}$. It then follows that we can update via

\[
f_{II}^t = \left( \Lambda_{II} + F_{II}^T \right)^{-1} D_{II}^{t,t-1} = H_{II}^t
\]

where we know the matrix inverse exists because $DD^T$ is invertible. This is the same
form as the calculations we did in Sect. 21.2.1. Let's assume we have a sigmoidal nodal computation $\sigma_i(x) = 0.5\left(1 + \tanh\left(\frac{x - o}{g}\right)\right)$, and so $f_{II}^{ti} = \sigma_i(y_{II}^{ti}) = \xi_{II}^{ti}$, or $y_{II}^{ti} = \sigma_i^{-1}(\xi_{II}^{ti})$, where $o$ and $g$ are the traditional offset and gain parameters for the computation. The graph evaluation equations for $y_{II}^{ti}$ then give

\[
y_{II}^{ti} = \sum_{j \in B(i)} E_{j \to i}(t)\, f_{II}^{tj} = O_i + \frac{g_i}{2} \ln\left( \frac{1 + \xi_{II}^{ti}}{1 - \xi_{II}^{ti}} \right)
\]

Pick the maximum $E_{j \to i}(t)\, f_{II}^{tj}$ component and label its index $j_M$. Then set

\[
E_{j_M \to i}(t)\, f_{II}^{t j_M} = O_i + \frac{g_i}{2} \ln\left( \frac{1 + \xi_{II}^{ti}}{1 - \xi_{II}^{ti}} \right)
\]

and solve for $E_{j_M \to i}(t)$ to find

\[
E_{j_M \to i}(t) = \frac{1}{f_{II}^{t j_M}} \left( O_i + \frac{g_i}{2} \ln\left( \frac{1 + \xi_{II}^{ti}}{1 - \xi_{II}^{ti}} \right) \right)
\]

Thus, we have a method for updating a portion of the parameters that influence each
nodal output. We combine this with the standard Hebbian updates with inhibition to
complete the training algorithm.
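The eigenbasis update and the dominant edge weight solve are straightforward to prototype. The following Python/NumPy sketch is only illustrative: the module operator L_II (taken here to already include the alpha and beta shifts as in the text), the interconnection blocks C and D, the inputs, and the sigmoid offset and gain are all hypothetical placeholders, and the Hebbian portion of the training loop mentioned above is not shown.

```python
import numpy as np

def submodule_two_step(L_II, D, C, fI_t, fII_prev, I_II_t, alpha):
    """One eigenbasis update of module two: solve (Lambda_II + F_II^T) f = D^{t,t-1}."""
    lam, E = np.linalg.eigh(L_II)          # eigenbasis E_II and eigenvalues lambda_II
    F = E.T @ (D @ D.T) @ E                # coefficients F_II^{jk} (symmetric here)
    g_t = E.T @ (D @ C.T @ fI_t)           # g_II^t expressed in the basis
    rhs = -(E.T @ I_II_t + g_t - alpha * (E.T @ fII_prev))   # D_II^{t,t-1}
    f_coeffs = np.linalg.solve(np.diag(lam) + F.T, rhs)
    return E @ f_coeffs                    # nodal values f_II^t

def dominant_edge_weight(xi_target, f_jM, offset, gain):
    """Solve E_{jM->i}(t) f^{t jM} = O_i + (g_i/2) ln((1+xi)/(1-xi)) for the weight."""
    return (offset + 0.5 * gain * np.log((1.0 + xi_target) / (1.0 - xi_target))) / f_jM

# Hypothetical five-node module-two and four-node module-one example.
rng = np.random.default_rng(1)
K2 = rng.standard_normal((5, 8))
L_II = K2 @ K2.T
D, C = rng.standard_normal((5, 3)), rng.standard_normal((4, 3))
fII_t = submodule_two_step(L_II, D, C, rng.standard_normal(4),
                           rng.standard_normal(5), rng.standard_normal(5), alpha=0.1)
w = dominant_edge_weight(xi_target=0.3, f_jM=fII_t[np.argmax(np.abs(fII_t))],
                         offset=0.0, gain=1.0)
print(fII_t, w)
```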
We can also turn this into a recursive equation by replacing the critical components at time $t$ with their time $t-1$ values to give

\[
f_{II}^{tj} \lambda_{II}^j = \alpha f_{II}^{t-1,j} - I_{II}^{tj} - g_{II}^{t-1,j} + \left\langle f_{II}^{t-1} , F_{II}^{\cdot j} \right\rangle .
\]

This gives the update

\[
f_{II}^{tj} = \frac{1}{\lambda_{II}^j} \left( \alpha f_{II}^{t-1,j} - I_{II}^{tj} - g_{II}^{t-1,j} + \left\langle f_{II}^{t-1} , F_{II}^{\cdot j} \right\rangle \right) ,
\]

which allows us to alter the parameters of the $j$th nodal function using its inverse $(f_{II}^j)^{-1}$ as we did earlier.

21.2.4 Submodule One Training

To update the first graph in the pair, we use the update equation

\[
L_I f_I^t + CC^T f_I^t + CD^T f_{II}^t = \alpha f_I^{t-1} - I_I^t
\]

The arguments we present next are quite similar to the ones presented for submodule
two training; hence, we will be briefer in our exposition. The first graph is all of the
model except the portion of the cortex we want to train. Let EI be the orthonormal
basis consisting of the eigenvectors for the matrix LI with λI the corresponding
vector of eigenvalues. Then using the representation of vectors with respect to the
basis $E_I$, we have $f_I^t = \sum_j f_I^{tj} E_I^j$ and $I_I^t = \sum_j I_I^{tj} E_I^j$, and so

\[
\sum_j f_I^{tj} L_I E_I^j + \sum_j f_I^{tj}\, CC^T E_I^j + CD^T f_{II}^t
= \alpha \sum_j f_I^{t-1,j} E_I^j - \sum_j I_I^{tj} E_I^j .
\]

However, we also know $L_I E_I^j = \lambda_I^j E_I^j$. Thus, we obtain

\[
\sum_j f_I^{tj} \lambda_I^j E_I^j + \sum_j f_I^{tj}\, CC^T E_I^j + CD^T f_{II}^t
= \alpha \sum_j f_I^{t-1,j} E_I^j - \sum_j I_I^{tj} E_I^j .
\]

Define the new vectors $F_I^j = CC^T E_I^j$ and $g_I^t = CD^T f_{II}^t$. Substituting these into our expression, we find

\[
\sum_j \left( f_I^{tj} \lambda_I^j - \alpha f_I^{t-1,j} + I_I^{tj} \right) E_I^j + \sum_j f_I^{tj} F_I^j + g_I^t = 0 .
\]

We can expand $F_I^j$ and $g_I^t$ in the basis $E_I$ as well, giving $F_I^j = \sum_k F_I^{jk} E_I^k$ and $g_I^t = \sum_j g_I^{tj} E_I^j$. From these expansions, we find

\[
\sum_j \left( f_I^{tj} \lambda_I^j - \alpha f_I^{t-1,j} + I_I^{tj} + g_I^{tj} \right) E_I^j + \sum_j \sum_k f_I^{tj} F_I^{jk} E_I^k = 0 .
\]

Now reorganize the double sum to write

\[
\sum_j \left( f_I^{tj} \lambda_I^j - \alpha f_I^{t-1,j} + I_I^{tj} + g_I^{tj} + \sum_k f_I^{tk} F_I^{kj} \right) E_I^j = 0 .
\]

This immediately implies that for all indices $j$, we must have

\[
f_I^{tj} \lambda_I^j - \alpha f_I^{t-1,j} + I_I^{tj} + g_I^{tj} + \sum_k f_I^{tk} F_I^{kj} = 0 ,
\]

which can be rewritten as $f_I^{tj} \lambda_I^j - \alpha f_I^{t-1,j} + I_I^{tj} + g_I^{tj} + \langle f_I^t, F_I^{\cdot j} \rangle = 0$. We now have a set of equations for the unknowns $f_I^{t,j}$. We can rewrite this as a matrix equation. Let $\Lambda_I$ denote the diagonal matrix $\mathrm{diag}(\lambda_I)$, similar to what we did before. Further, let $F_I$ be the matrix $(F_I^{jk})$. Then, we can rewrite the update equation more succinctly as

\[
\left( \Lambda_I + F_I^T \right) f_I^t + I_I^t + g_I^t - \alpha f_I^{t-1} = 0 .
\]



Finally, let $D_I^{t,t-1} = -\left( I_I^t + g_I^t - \alpha f_I^{t-1} \right)$. Then we have $\left( \Lambda_I + F_I^T \right) f_I^t = D_I^{t,t-1}$. It then follows that we can update via

\[
f_I^t = \left( \Lambda_I + F_I^T \right)^{-1} D_I^{t,t-1} = H_I^t
\]

where we know the matrix inverse exists because $CC^T$ is invertible. Again, assuming we have a sigmoidal nodal computation, we find $f_I^{ti} = \sigma_i(y_I^{ti}) = \xi_I^{ti}$, or $y_I^{ti} = \sigma_i^{-1}(\xi_I^{ti})$, where $o$ and $g$ are the traditional offset and gain parameters for the computation. The graph evaluation equations for $y_I^{ti}$ then give

\[
y_I^{ti} = \sum_{j \in B(i)} E_{j \to i}(t)\, f_I^{tj} = O_i + \frac{g_i}{2} \ln\left( \frac{1 + \xi_I^{ti}}{1 - \xi_I^{ti}} \right)
\]

Pick the maximum $E_{j \to i}(t)\, f_I^{tj}$ component and label its index $j_M$. Then set

\[
E_{j_M \to i}(t)\, f_I^{t j_M} = O_i + \frac{g_i}{2} \ln\left( \frac{1 + \xi_I^{ti}}{1 - \xi_I^{ti}} \right)
\]

and solve for $E_{j_M \to i}(t)$ to find

\[
E_{j_M \to i}(t) = \frac{1}{f_I^{t j_M}} \left( O_i + \frac{g_i}{2} \ln\left( \frac{1 + \xi_I^{ti}}{1 - \xi_I^{ti}} \right) \right)
\]

Thus, we have a method for updating a portion of the parameters that influence each
nodal output. We combine this with the standard Hebbian updates with inhibition to
complete the training algorithm.
We can also turn this into a recursive equation by replacing the critical components at time $t$ with their time $t-1$ values to give

\[
f_I^{tj} \lambda_I^j = \alpha f_I^{t-1,j} - I_I^{tj} - g_I^{t-1,j} + \left\langle f_I^{t-1} , F_I^{\cdot j} \right\rangle .
\]

This gives the update

\[
f_I^{tj} = \frac{1}{\lambda_I^j} \left( \alpha f_I^{t-1,j} - I_I^{tj} - g_I^{t-1,j} + \left\langle f_I^{t-1} , F_I^{\cdot j} \right\rangle \right) ,
\]

which allows us to alter the parameters of the $j$th nodal function using its inverse $(f_I^j)^{-1}$ as we did earlier. Thus, we have a mechanism for updating whichever submodule of the neural model we are interested in, using the orthonormal basis its subgraph provides.

21.3 A Normal Brain Model

Let’s address the issue of developing a normal brain model for lesion studies. Indeed,
we can also look at the difference between a normal and a dysfunctional brain model
as a way to determine a palliative strategy that might restore normalcy. Let’s focus
on three neurotransmitters here: dopamine, serotonin and norepinephrine. It is well known that disturbances in neurotransmitter signalling and processing are related to

mood disorders. A good review of this is given in Russo and Nestler (2013). In fact, reward-seeking behavior circuitry is probably common across many animals, as described in Barron et al. (2010). Our model here is a first attempt at quantifying these interactions; it is simplified but offers some explanatory insight. Each neurotransmitter has an associated triple $(r_u, r_d, r_r) \equiv T$, which we label with a superscript for the neurotransmitter. Hence, $T^D$, $T^S$ and $T^N$ are the respective triples for dopamine, serotonin and norepinephrine. Consider a simple graph model $G(N, E)$ as shown in Fig. 21.1.
The auditory and visual training data are given in four emotional labellings: neutral, sad, happy and angry. We assume that each emotional state corresponds to neurotransmitter triples $(T_N^D, T_N^S, T_N^N)$ (neutral), $(T_S^D, T_S^S, T_S^N)$ (sad), $(T_H^D, T_H^S, T_H^N)$ (happy) and $(T_A^D, T_A^S, T_A^N)$ (angry). In addition, we assume that these triple states can be different in the visual, auditory and associative cortex, and hence we add an additional subscript label to denote that. We let $(T_{\alpha,\beta}^D, T_{\alpha,\beta}^S, T_{\alpha,\beta}^N)$ be the triple for cortex $\beta$ ($\beta = 0$ is visual cortex, $\beta = 1$ is auditory cortex and $\beta = 2$ is associative cortex) and emotional state $\alpha$ ($\alpha = 0$ is neutral, $\alpha = 1$ is sad, $\alpha = 2$ is happy and $\alpha = 3$ is angry). Hence, we are using our emotionally labeled data as a way of constructing a simplistic normal brain model. Of course, the reality is much more complicated than this. Emotion in general is hard to quantify, and it is even somewhat difficult to find good measures of what state a person is in. A good review of that problem is in Mauss and Robinson (2009). Asymmetric cortical activity probably plays a role here, see Harmon-Jones et al. (2010), and what is nice about our modeling choices is that we have the tools to model such asymmetry using the graph models. However, our first model is much simpler.
The graph encodes the information about emotional states in both the nodal
processing and the edge processing functions. Let’s start with auditory neutral data
and consider the graph shown in Fig. 21.2.
The neurotransmitter triples can exist in four states: neutral, sad, happy and angry, indicated by the toggle $\alpha$ in the triples $T_{\alpha,\beta}^{\gamma}$, where $\gamma$ denotes the neurotransmitter choice ($\gamma = 0$ is dopamine, $\gamma = 1$ is serotonin and $\gamma = 2$ is norepinephrine). For auditory neutral data, there is a choice of triple $T_{0,1}^{\gamma}$ and $T_{0,2}^{\gamma}$ for each neurotransmitter which corresponds to how the brain labels this data as emotionally neutral. Hence, we are identifying a collection of six triples, or eighteen numbers, with neutral auditory data. This is a vector in $\mathbb{R}^{18}$ which we will call $V^{0,1,2}$. We need to do this for the other emotional states, which gives us three additional vectors $V^{1,1,2}$, $V^{2,1,2}$ and $V^{3,1,2}$. Thus, emotional processing for emotionally labeled auditory data requires us to set four vectors in $\mathbb{R}^{18}$. We need to do the same thing for the processing of emotionally labeled visual data, which will give us the four vectors $V^{0,0,2}$, $V^{1,0,2}$, $V^{2,0,2}$ and $V^{3,0,2}$. To process the data for both types of sensory cortex thus requires eight vectors in $\mathbb{R}^{18}$; choose 8 orthonormal vectors in $\mathbb{R}^{18}$ to correspond to these states. To train the full graph $G(N, E)$ to understand neutral auditory data, it is a question of which internal outputs of the graph should be fixed or clamped and which should be allowed to change. For neutral auditory data, we clamp the neurotransmitter triples in the auditory cortex and associative cortex using the vector $V^{0,1,2}$ and we force the output of the associative cortex to be the same as the incoming auditory data.
[Fig. 21.1 A simple cognitive processing map focusing on a few salient modules (Input, Thalamus, Midbrain, Visual, Auditory and Associative cortex). The edge connections are of the form $E_{\alpha\beta}$ where $\alpha$ and $\beta$ can be a selection from v (Visual), a (Auditory), A (Associative), T (Thalamus), M (Midbrain) and I (Input).]

[Fig. 21.2 Neural processing for neutral auditory data.]

[Fig. 21.3 Neural processing for sad auditory data.]

The Midbrain module makes the neurotransmitter connections to each cortex module, but how these connections are used is determined by the triples and the usual nodal processing. In a sense, we are identifying the neutral auditory emotional state with the vector $V^{0,1,2}$. We assume the Thalamus module can shape processing in the midbrain. Hence, we will let $V^{0,1,2}$ be the desired output of the Thalamus module for neutral auditory data. The Midbrain module accepts the input $V^{0,1,2}$ and uses it to set the triple states in auditory and associative cortex. In effect, this is an abstraction of the second messenger systems that affect the neural processing in these cortical modules. We can do this for each emotional state. For example, Fig. 21.3 shows the requisite processing for sad auditory data in the cortical submodules.
Processing the sad auditory data clamps the triples in auditory and associative cortex using $V^{1,1,2}$ and clamps the associative cortex output to the incoming sad auditory data. We then train the graph to have a Thalamus output of $V^{1,1,2}$. We do the same thing for the other emotional states for auditory data and for all the emotional states for the visual cortex data.

[Fig. 21.4 Neural processing for new data.]
Now consider the graph model accepting new music and painting data. We see
the requisite graph in Fig. 21.4.
The music data and painting data will generate an output vector W from the Thala-
mus module in addition to generating associative cortex outputs corresponding to a
music and painting fragment on the basis of the encoded edge and nodal processing
that has been set by the training. Hence, we can classify the new combined audi-
tory and visual data as corresponding to an emotional labeling of the form shown in
Eq. 21.3.


\[
\widehat{W} = \sum_{i=0}^{3} \left\langle W , V^{i,1,2} \right\rangle V^{i,1,2} + \sum_{i=0}^{3} \left\langle W , V^{i,0,2} \right\rangle V^{i,0,2} \qquad (21.3)
\]

This is the projection of $W$ onto the subspace of $\mathbb{R}^{18}$ spanned by our neurotransmitter triples. We can then perform a standard cluster analysis to determine which emotional labeling is closest. The training process we have gone through generates what we will call the normal brain. This is, of course, an abstraction of many things, but we think it has enough value to give us insight into cognitive dysfunction.
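As a quick illustration of Eq. 21.3, the following Python/NumPy sketch builds eight orthonormal vectors in R^18 (via a QR factorization, one hypothetical way to make the choice described above), projects a new Thalamus output W onto their span, and picks the closest emotional label. All of the data here are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)

# Eight orthonormal vectors in R^18: four auditory labels V^{i,1,2} and
# four visual labels V^{i,0,2}, i = 0 (neutral), 1 (sad), 2 (happy), 3 (angry).
Q, _ = np.linalg.qr(rng.standard_normal((18, 8)))
V_aud, V_vis = Q[:, :4], Q[:, 4:]
labels = ["neutral", "sad", "happy", "angry"]

W = rng.standard_normal(18)                      # new Thalamus output vector

# Projection of W onto the span of the eight V vectors (Eq. 21.3).
coeff_aud = V_aud.T @ W                          # <W, V^{i,1,2}>
coeff_vis = V_vis.T @ W                          # <W, V^{i,0,2}>
W_hat = V_aud @ coeff_aud + V_vis @ coeff_vis

# Simple nearest-label assignment: pick the emotional state whose combined
# auditory + visual coefficients are largest in magnitude.
scores = np.abs(coeff_aud) + np.abs(coeff_vis)
print("projected norm:", np.linalg.norm(W_hat))
print("closest emotional label:", labels[int(np.argmax(scores))])
```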

21.3.1 The Cognitive Dysfunction Model

Let's summarize what we have discussed. For a given brain model $G(V, E)$, which consists of auditory sensory cortex, SCI, visual cortex, SCII, Thalamus, Th, Associative Cortex, AC, and Midbrain, MB, we can now build its normal emotional state. We assign neurotransmitter triples $T_{\alpha,\beta}^{\gamma}$ for the four emotional states of neutral, sad, happy and angry for each neurotransmitter. These eighteen values for the eight possible states are chosen as orthonormal vectors in $\mathbb{R}^{18}$; our choice of values for these triples could also be guided by the biopsychological literature, but we will not discuss that here. The auditory and visual input data can be mapped into exclusive regions of skin galvanic response and an fMRI parameter, so part of our modeling is to create a suitable abstraction of this mapping from the inputs into the two-dimensional skin galvanic response and fMRI space, which is $\mathbb{R}^2$. There are papers that discuss how to model the fMRI response we can use for this; see Friston (2005) and Stephan et al. (2004).
To construct the normal model, we do the following:
• For neutral music data, we train the graph to imprint the emotional states as follows: for neutral music inputs to SCI, the clamped AC output is the same neutral music input, and the clamped Th to MB output is the neutral triples $T_{0,1}^{\gamma}$ and $T_{0,2}^{\gamma}$ encoded as $V^{0,1,2}$. When training is finished, the neutral music samples are recognized as neutral music with the neutral neurotransmitter triples engaged.
• For neutral painting data, use the combined evaluation/Hebbian update loop to imprint the emotional states by clamping the AC output to the neutral painting input to SCII and clamping the Th to MB output to the neutral triples $T_{0,0}^{\gamma}$ and $T_{0,2}^{\gamma}$ encoded as $V^{0,0,2}$. When training is finished, the neutral painting samples are recognized as neutral paintings with the neutral neurotransmitter triples engaged.
• Do the same training loop for the sad, happy and angry music and painting data
samples using the sad, happy and angry neurotransmitter triples. The AC output
is clamped to the appropriate data input values and the Th to MB output is the
required neurotransmitter triples.
At this point, the model G(V, E) assigns known emotionally labeled data in two
sensory modalities to their correct outputs in associative cortex. A new music and
painting input would then be processed by the model and the Th to MB connections
would assign the inputs to a set of neurotransmitter triple states W and generate
a set of AC outputs for both the music and painting inputs. These outputs would
be interpreted as a new musical sequence and a new painting. We can think of this
trained model G(V, E) as defining the normal state. Also, as shown in Eq. 21.3, we
can assign an emotional labeling to the new data. We can model dysfunction at this
point by allowing changes in midbrain processing.
• Alter $T^D$, $T^S$ and $T^N$ in the auditory, visual and associative cortex to create a new vector $W$ and therefore a new model $G^{new}(N, E)$. The nodes and edges do not change, but the way computations are processed does change. The change in the model can be labeled $\delta G$. Each triple change $(\delta T^D, \delta T^S, \delta T^N)$ gives a potential mapping to cognitive dysfunction.
• Using the normal model $G(N, E)$, map all neutral music and painting data to sad data. Hence, we clamp the AC outputs to sad states but do not clamp the Th to MB mapping. This will generate new neurotransmitter triple states in MB which we can label $W^{sad}$. The exhibited changes $(\delta T^D, \delta T^S, \delta T^N)$ give us a quantitative model of a cognitive dysfunction having some of the characteristics of depression. Note that all 18 parameters from all 3 neurotransmitters are potentially involved. Note also that standard drugs such as Abilify only make adjustments to 2 or 3 of these parameters, which suggests our current pharmacological treatments are too narrow in scope. If we know the normal state, this model suggests that, to restore normalcy, we apply drugs which alter neurotransmitter activity by $(-\delta T^D, -\delta T^S, -\delta T^N)$; a small sketch of this bookkeeping follows this list.
• Lesions in a brain can be simulated by the removal of nodes and edges from
the graph model G(N, E). This allows us to study when the normal emotional
responses alter and lead to dysfunction.
• If we construct a model $G_L(N_L, E_L)$ for the left half of the brain, a model $G_R(N_R, E_R)$ for the right half, and a model of the corpus callosum, $G_C(N_C, E_C)$, we can combine these modules using the ideas from Sect. 21.2.2 to create a brain model $G(N, E)$ which can be used to model right and left half miscommunication problems such as schizophrenia, which could be very useful. The interconnections between the right and left halves of the brain through the corpus callosum modules and the neurotransmitter connections can be altered, and the responses of the altered model can be compared with a normal brain model for possible insight.
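To make the depression bullet above concrete, here is a small Python sketch of the bookkeeping involved: it stores the normal and sad-shifted triple states as vectors in R^18, computes the deviation (delta T^D, delta T^S, delta T^N) and the corresponding corrective adjustment. The numbers and the ordering of the 18 entries are hypothetical placeholders.

```python
import numpy as np

# Neurotransmitter triple states as 18-vectors: 3 transmitters (D, S, N)
# x 2 cortical modules x 3 triple entries (r_u, r_d, r_r); ordering assumed.
rng = np.random.default_rng(3)
W_normal = rng.standard_normal(18)                  # normal-state triples (placeholder)
W_sad = W_normal + 0.2 * rng.standard_normal(18)    # triples after forcing sad outputs

delta_T = W_sad - W_normal         # (delta T^D, delta T^S, delta T^N) stacked
correction = -delta_T              # palliative adjustment suggested by the model

# Split the deviation by neurotransmitter to see which parameters moved most.
for name, chunk in zip(["dopamine", "serotonin", "norepinephrine"],
                       np.split(delta_T, 3)):
    print(f"{name:>14s}: max |delta| = {np.max(np.abs(chunk)):.3f}")
```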
We think the most important problem in understanding cognition and the working
of a full brain model is the issue of how features are extracted from environmen-
tal input. There are various brain structures that do this and it appears that cortical
structures evolved to handle this task quite some time ago. In Wang et al. (2010),
careful studies of the chicken brain have found a region in the telencephalon that is
similar to the mammalian auditory cortex and it is also comprised of laminated layers
of cells linked by narrow radial columns. Evidence is therefore mounting that the
complex circuitry needed to parse environmental input and do feature extraction and
associative learning probably evolved in an ancestor common to both mammals and
birds at least 300 million years ago. Further, in Sanes and Zipursky (2010), there is a
fascinating discussion about the similarities and differences between the mammalian
visual cortex and the fly visual cortex. For example, both visual systems use a small
number of neuron types divided into many subtypes for specialized computation.
They use multiple synaptic contacts to a single presynaptic terminal which connects
to multiple postsynaptic sites. Both use multiple cellular layers with regular arrange-
ments of neurons in the layers with an ordered way of mapping computation between
layers. Both use lateral interactions through parallel relays to heavily process the raw
environmental input received by the photoreceptors in the eyes. They use repeated
local modules to handle the global environmental input field and different visual
functions are helped by pathways that act in parallel. The main structure for the
parallel processing is the organization of the many synaptic connections into parallel

laminar regions. The results of this processing are then passed to other portions of the
cortex where further feature extraction and associative learning is completed. It may
be that the organization of the fly visual system and the mammalian visual system do
not share a common evolutionary origin though in contrast to what is now suspected
about the common origins of these systems in mammals and birds. Instead, this may
appears to be an example of convergent evolution at work: the problems of handing
the environmental input shaped the evolution of this type of circuity. However, we
also know that the vertebrate transcription factor Pax6 and its fly orthologue eyeless
is crucial for eye development so it is still possible that there is a shared evolutionary
origin.
There is much work to be done to develop our cognitive dysfunction model and this will be the main topic of the next volume. We will use these tools to build both models of human cognitive dysfunction and models of small-brained animals.

References

A. Barron, E. Søvik, J. Cornish, The roles of dopamine and related compounds in reward-seeking
behavior across animal phyla. Front. Behav. Neurosci. 4(163), 1–9 (2010) (Article 163)
K. Friston, Models of brain function in neuroimaging. Annu. Rev. Psychol. 56, 57–87 (2005)
(Annual Reviews)
E. Harmon-Jones, P. Gable, C. Peterson, The role of asymmetric frontal cortical activity in emotion-
related phenomena: a review and update. Biol. Psychol. 84, 451–462 (2010)
P. Lang, M. Bradley, J. Fitzimmons, B. Cuthbert, J. Scott, B. Moulder, V. Nangia, Emotional arousal
and activation of the visual cortex: an fMRI analysis. Psychophysiology 35, 199–210 (1998)
I. Mauss, M. Robinson, Measures of emotion: a review. Cognit. Emot. 23, 209–237 (2009)
J. Peterson, Nodal computation approximations in asynchronous cognitive models. Comput. Cognit.
Sci. (2015) (In Press)
S. Russo, E. Nestler, The brain reward circuitry in mood disorders. Nat. Rev.: Neurosci. 14, 609–625
(2013)
J. Sanes, S. Zipursky, Design principles of insect and vertebrate visual systems. Neuron 66(1),
15–36 (2010)
K. Stephan, L. Harrison, W. Penny, K. Friston, Biophysical models of fMRI responses. Curr. Opin.
Neurobiol. 14, 629–635 (2004)
Y. Wang, A. Brzozowsha-Prechtl, H. Karten, Laminary and columnar auditory cortex in avian brain.
Proc. Natl. Acad. Sci. 107(28), 12676–12681 (2010)
Part VIII
Conclusions
Chapter 22
Conclusions

This text has introduced the salient issues that arise in cognitive modeling. We have
used relatively simple models of neural computation to construct graph models of
brain circuitry at essentially any level of detail we wish. In Chap. 21, we have outlined
a simple model of a normal brain and shown how, in principle, we could develop
a model of cognitive dysfunction from that. To finish out this book, let’s look at
some really hard problems that we have been building various tools to address. Let’s
consider a model of schizophrenia. Using the notations of Chap. 21, a first look could
be this. If we were to model schizophrenia with a left and right brain and a connecting
corpus callosum, the two modules would be the left and right brain subgraph and
the corpus callosum subgraph. Let’s look at this two graph system where the first
graph contains the left and the right brain and the second graph contains the corpus
callosum. Assume we have trained on $T$ samples using the module training equations

\[
\left( \Lambda_I + F_I^T \right) f_I^t + I_I^t + g_I^t - \alpha f_I^{t-1} = 0
\]
\[
\left( \Lambda_{II} + F_{II}^T \right) f_{II}^t + I_{II}^t + g_{II}^t - \alpha f_{II}^{t-1} = 0
\]

At convergence, we obtain stable values for $f_I^t$ and $f_{II}^t$ which we will label as $f_I^{\infty}$ and $f_{II}^{\infty}$. Then we must have

\[
\left( \Lambda_I + F_I^T \right) f_I^{\infty} + I_I^t + g_I^{\infty} - \alpha f_I^{\infty} = 0
\]
\[
\left( \Lambda_{II} + F_{II}^T \right) f_{II}^{\infty} + I_{II}^t + g_{II}^{\infty} - \alpha f_{II}^{\infty} = 0
\]

A change in the connectivity from the corpus callosum to the left and right brains and vice versa implies a change $C \to C'$ and $D \to D'$. This changes terms which are built from the matrices $C$ and $D$, such as $F_{II}^j = DD^T E_{II}^j$, which changes to $D'(D')^T E_{II}^j$. After the connectivity changes are made, the graph can evaluate all $T$ inputs to generate a collection of new outputs $S_I' = \left( (f_I^1)', \ldots, (f_I^T)' \right)$ and $S_{II}' = \left( (f_{II}^1)', \ldots, (f_{II}^T)' \right)$. The original outputs due to the trained graph are $S_I$ and $S_{II}$. Hence, any measure of the entries of these matrices gives us a way to compare the performance of the graph model of the brain with a correct set of corpus callosum

connectivities to the graph model of a brain with dysfunctional connectivity between the corpus callosum and the two halves of the brain. A simple ratio gives us a useful measure (here, the norm symbols indicate any choice of matrix measurement, such as a simple Frobenius norm). Choosing a set of tolerances $\epsilon_I$ and $\epsilon_{II}$, we obtain a tool for deciding if there is dysfunction: for example, $\frac{\|S_I'\|}{\|S_I\|} > \epsilon_I$ and $\frac{\|S_{II}'\|}{\|S_{II}\|} > \epsilon_{II}$. In general, if we let $\Phi$ denote our tool for measuring the brain model's effectiveness as a function of the measured outputs for a sample set, we would check to see if $\frac{\Phi(S_I')}{\Phi(S_I)} > \epsilon_I$ and $\frac{\Phi(S_{II}')}{\Phi(S_{II})} > \epsilon_{II}$. The choice of $\Phi$ is, of course, critical and it is not easy to determine a good choice. Another issue is that we need a way to determine what the
normal output of a brain model should be so we can make a reasonable comparison.
However, despite these obstacles, we see that this procedure of submodule training allows
us to identify theoretical consequences of changes in corpus callosum communication
pathways (i.e. the cross connections to the left and right brain) and perhaps give some
illumination to this difficult problem. The above model is a very high level idea for
schizophrenia in line with the general comment that some feel schizophrenia is a
price we pay for our brains becoming complicated enough to develop language
as is discussed in Crow (2000), but even if that is true, it would be much more
interesting to develop a dynamic model which responds as a schizophrenic would
do. There are many good reasons to build a model, the first of which is that it could give
insight. More importantly, most of our data for what is happening in schizophrenics
comes from very small data sets and hence a theoretical model of schizophrenia is
needed to help shape experiments and give better understanding as to what parts
of our interacting neural modules are malfunctioning. A neural simulation could
potentially help us with that.
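A minimal numerical sketch of this comparison is shown below in Python/NumPy; the output matrices are random placeholders, the Frobenius norm stands in for the generic measurement functional, and the tolerance values are arbitrary.

```python
import numpy as np

def dysfunction_ratio(S_orig, S_altered, phi=np.linalg.norm):
    """Compare outputs of the altered model with the trained model.

    S_orig, S_altered : matrices whose columns are the T sample outputs
    phi               : measurement functional (Frobenius norm by default)
    """
    return phi(S_altered) / phi(S_orig)

rng = np.random.default_rng(4)
T = 20                                   # number of training samples
S_I,  S_II  = rng.standard_normal((5, T)), rng.standard_normal((4, T))
# Hypothetical altered outputs after the C -> C', D -> D' connectivity change.
S_Ip  = S_I  + 0.5 * rng.standard_normal((5, T))
S_IIp = S_II + 0.5 * rng.standard_normal((4, T))

eps_I, eps_II = 1.05, 1.05               # arbitrary tolerance choices
r_I, r_II = dysfunction_ratio(S_I, S_Ip), dysfunction_ratio(S_II, S_IIp)
print(f"ratio I = {r_I:.3f}, ratio II = {r_II:.3f}")
print("flag dysfunction:", r_I > eps_I and r_II > eps_II)
```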
There has been a lot of work which correlates various structural and connectionist
problems in the neural circuitry of an individual with schizophrenia when compared
to the normal population. Most of these studies though are data poor and so we must be
careful in drawing conclusions. Synaptic plasticity issues are discussed in Goto et al.
(2010) and various neurotransmitter problems appear to be linked to schizophrenia
also as you can see the papers Hashimoto (2009) and Domino et al. (2004). The work
in Domino et al. (2004) focuses on the N-methyl-D-aspartate (NMDA) receptor.
Physical brain changes—again from a small data set—are outlined in (Andreasen
et al. 2011) and some of the neuroimaging results about chronic schizophrenia are
discussed in Wood et al. (2011). There is a developmental hypothesis for schizophrenia
outlined in Piper et al. (2012) which is a good read as it helps us with some of the larger
issues surrounding our modeling choices. Finally, the review by Fakhoury (2015) is
particularly useful in building modeling insight for similar issues that arise in major
depressive disorders which often occur with schizophrenia too. In Chap. 7, we present
a careful model of free calcium and bound calcium in a cell and show how we can
think about a sudden influx of calcium current arising from a second messenger
trigger. These ideas are helpful in understanding schizophrenia as there is evidence
that abnormalities in these processes are linked to schizophrenia as seen in Eyles
et al. (2002). Since we are easily able to model different topological organizations

for our brain models, the results in Zhang et al. (2012) help us understand how to set
up the graph for a schizophrenia model.
There have been recent and exciting results in treating the depressive states and suicidal impulses that are often associated with schizophrenia, based on using the N-methyl-D-aspartate (NMDA) receptor antagonist ketamine. A com-
plete review of the use of ketamine for major depressive problems and bipolar depres-
sion is given in Lee et al. (2015). How it is used in chronic stress, attention disorders
and depressive states is discussed in Li et al. 2011, KNott et al. (2011) and Murrough
et al. (2013). Hence, the tools and background given to you in this text will help you
on the next journey to develop interesting neural models of cognitive function and
dysfunction such as we see in depression and schizophrenia.
Finally, since we know hormones play a major role in brain plasticity as evidenced
by Garcia-Segura (2009), we know that our graph models can also be tuned or altered
by external hormone-based second messenger triggers. We know that in parasitic
wasps, the wasp injects a large neurotoxin cocktail into a cockroach’s brain which
then initiates large scale changes in behavior. In Chap. 9, we discuss toxins that
can alter the action potential of a neuron in various ways. The cocktail the wasp
injects is very large scale and not a small scale input to the neural model of the
cockroach at all. A general outline of parasitic manipulation of a host is given in
Weinersmith and Faulkes (2014) and more specific information about how the wasp
does this with the cockroach is discussed in Libersat and Gal (2014). Note this cocktail
effectively reorganizes the interacting modules of the cockroach brain to perform
altered behavior. Thus, there is proof of concept for such a large reorganization of
the neural modules of any brain via an appropriate input package. With the graph
modeling tools we have developed and suitable approximations, we can hope to build
a model of this process as well, something which in the past has been very hard both to conceptualize and to build.
In the next volume, all of these ideas will be used to build real models.

References

N. Andreasen, P. Nopoulos, V. Magnotta, R. Pierson, S. Ziebell, B. Ho, Progressive brain change


in schizophrenia: a prospective longitudinal study of first-episode schizophrenia. Biol. Psychiatr.
70, 672–679 (2011)
T. Crow, Schizophrenia as the price Homo sapiens pays for language: a resolution of the central
paradox in the origin of the species. Brain Res. Rev. 31, 118–129 (2000)
E. Domino, D. Mirzoyan, H. Tsukada, N-methyl-D-aspartate antagonists as drug models of
schizophrenia: a surprise link to tobacco smoking. Prog. Neuro-Psychopharmacol. Biol. Psy-
chiatr. 28, 801–811 (2004)
D. Eyles, J. McGrath, G. Reynolds, Neuronal calcium-binding proteins and schizophrenia.
Schizophr. Res. 57, 27–34 (2002)
M. Fakhoury, New insights into the neurobiological mechanisms of major depressive disorders.
Gen. Hosp. Psychiatr. 37, 172–177 (2015)
L. Garcia-Segura, Hormones and Brain Plasticity (Oxford University Press, Oxford, 2009)

Y. Goto, C. Yang, S. Otani, Functional and dysfunctional synaptic plasticity in prefrontal cortex:
roles in psychiatric disorders. Biol. Psychiatr. 67, 199–207 (2010)
K. Hashimoto, Emerging role of glutamate in the pathophysiology of major depressive disorder.
Brain Res. Rev. 61, 105–123 (2009)
V. Knott, A. Millar, J. McIntosh, D. Shah, D. Fisher, C. Blais, V. Ilivitsky, E. Horn, Separate
and combined effects of low dose ketamine and nicotine on behavioral and neural correlates of
sustained attention. Biol. Psychol. 88, 83–93 (2011)
E. Lee, M. Della-Selva, A. Liu, S. Himelhoch, Ketamine as a novel treatment for major depressive
disorder and bipolar depression: a systematic review and quantitative meta-analysis. Gener. Hosp.
Psychiatr. 37, 178–184 (2015)
N. Li, R. Liu, J. Dwyer, M. Banasr, B. Lee, H. Son, X. Li, G. Aghajanian, R. Duman, Glutamate N-
methyl-D-aspartate receptor antagonists rapidly reverse behavioral and synaptic deficits Caused
by Chronic stress exposure. Biol. Psychiatr. 69, 754–761 (2011)
F. Libersat, R. Gal, Wasp voodoo rituals, venom—cocktails, and the zombification of cockroach
hosts. Integr. Comp. Biol. 54, 129–142 (2014)
J. Murrough, D. Iosifescu, L. Chang, R. Al Jurdi, C. Green, A. Perez, S. Iqbal, S. Pillemer, A. Foulkes,
A. Shah, D. Charney, S. Mathew, Antidepressant efficacy of ketamine in treatment-resistant major
depression: a two-site randomized controlled trial. Am. J. Psychiatr. 170, 1134–1142 (2013)
M. Piper, M. Beneyto, T. Burne, D. Eyles, D. Lewis, J. McGrath, The neurodevelopmental hypothesis
of schizophrenia: convergent clues from epidemiology and neuropathology. Psychiatr. Clin. N.
Am. 35, 571–584 (2012)
K. Weinersmith, Z. Faulkes, Parasitic manipulation of host’s phenotype, or how to make a zombie—
an introduction the symposium. Integr. Comp. Biol. 54, 93–100 (2014)
S. Wood, A. Yung, P. McGorry, C. Pantelis, Neuroimaging and treatment evidence for clinical staging
in psychotic disorders: from the at-risk mental state to chronic schizophrenia. Biol. Psychiatr. 70,
619–625 (2011)
Y. Zhang, L. Lin, C. Lin, Y. Zhou, K. Chou, C. Lo, T. Su, T. Jiang, Abnormal topological organization
of structural brain networks in schizophrenia. Schizophr. Res. 141, 109–118 (2012)
Part IX
Background Reading
Chapter 23
Background Reading

We have been inspired by many attempts by people in disparate fields to find meaning
and order in the vast compilations of knowledge that they must assimilate. Like
them, we have done a fair bit of reading and study to prepare ourselves for the necessary
abstractions we need to make in our journey. We have learned a lot from various
studies of theoretical biology and computation and so forth. You will need to make
this journey too, so to help you, here are some specific comments about the sources
we have used to learn from. In addition to the books we mentioned in Peterson
(2015a, b, c), we have some further recommendations.

23.1 The Central Nervous System

• The functional anatomy of the human brain in Nolte (2002).


• The central nervous system from Brodal (1992).
• Two very useful books which have many diagrams of brain circuitry which you can
use to draw your own sketches and color in yourself are Diamond et al. (1985)
and Pinel and Edwards (2008). Thinking about these circuits at the same time
you draw and color them is a great way to begin to understand them. We have
used these two texts so much over the last ten years that they are dog-eared and
well-thumbed. These are not stand-alone texts as they have wonderful drawings
with extremely terse textual descriptions. But they are nevertheless gold mines of
information especially if your way of understanding stuff is visual.
• An assessment of the tools used to understand the function of the brain (fMRI and so
on) is presented in Donaldson (2004). We must always remember that evidence for
anything is always obtained indirectly. There are usually many technical processing
steps performed on the raw data collected with many concomitant assumptions
before one sees information that is useful. This chain of evidentiary processing
must never be forgotten when we build our models.

23.2 Information Theory, Biological Complexity and Neural Circuits

• Biochemical Messengers in Hardie (1991). This is the first book that we began to
read about information processing in the nervous system on a technical level. It has
useful insights. All in all, we use this one as a technical reference to sometimes dip
into. This book explains a lot of the technical detail behind the messaging systems
we encounter in our studies.
• Information in the brain in Black (1991). While dated now, this has been an impor-
tant source as it tries hard to develop a unified theory of information processing.
It is an absolutely stunning attempt to understand the how and why of informa-
tion processing in the large scale nervous system. As you will see in the body of
this report, we have quoted from this work extensively. There is so much inter-
esting speculation in here. It is also extremely technical and unapologetically
so!! We used Black’s ideas extensively in our previous attempt at developing a
good abstraction for biological detail into software design. This is laid out in
Peterson (2001).
• The Evolution of Vertebrate Design in Radinsky (1987). This is a wonderful book
that attempts to look at the evolution of the vertebrate body plan in very abstract
terms.
• Large scale computational theories of the brain in Koch and Davis (1994). This is
another useful collection of attempts to model biological complexity and see the
big picture.
• Neural organization in Arbib et al. (1998). Here Arbib and his coauthors spend a lot
of time discussing their approach to seeing the hidden structure in brain circuitry.
There are a lot of good ideas in here.
• The paper (Baslow 2011) discusses the modular organization of the vertebrate
brain. Pay attention to how these ideas can influence how we build our models
using the graph tools in MatLab.
• Two layman’s books on how our notions of morality may have evolved and a
general introduction to ideas in evolutionary psychology are Buller (2006) and
Hauser (2006). These are less technical but provide a needed overview.

23.3 Nervous System Evolution and Cognition

We believe there is a lot to learn from studying how organisms have evolved nervous
systems. The primitive systems contain a minimal set of structures to allow control,
movement and environmental response. We can see in the evolutionary record how
as additional structure is added, we not only see a gain in capability but also more
constraints on the organism in terms of energy requirements and so forth. Some of
the very useful papers in this area are listed below:

• Catania (2000) analyzes the organization of cortex in the insectivores. The connection between the development of sensory modules to understand the environment and the fusion required to build higher level percepts for enhanced survival can be
studied in many animals. However, there are advantages to studying these things
in animals with smaller cognitive systems.
• In Deacon (1990), there is a very interesting review of the state of the art in thinking
about brain evolution as of 1990.
• In Miklos (1993), there is even more of an attempt to use the ideas from the
evolution of nervous systems to help guide the development of artificial nervous
systems for use in agents, artificial life and so forth. Miklos has strong opinions
and there is much to think about in this paper.
• In Allman (1999), we can read the most up to date discussion of brain and nervous
system evolution. We have been inspired by this book and there are good pointers
to additional literature in its reference list.
• Potts (2004) discusses how paleoenvironments may have shaped the evolution of
cognition in the great apes while the great ape cognitive systems themselves are
outlined in Russon (2004). Further articles can be found in the collection (Russon
and Begun 2004).
• Johnson-Frey (2004) studies the neural circuitry that may subserve complex tool
use in humans.
• The possible connection between modularity of function and the rise of cognition
is explored in Coltheart (1999) and in evolutionary terms in Redies and Puelles
(2001).
• The appearance of artistic expression in the cave paintings of France ca 25,000
BC (and other clues) has suggested to some that neural reorganization occurred
around that time. Other scientists are not so sure. Nevertheless, it is a clue to the
relationship between brain structure and higher level processing. Brain endocasts
give us clues about the location of sulci on the surface of the brain which allows
us to compare the sulci of fossils to those of primates and modern man and look
for differences. An interesting article about paleolithic art and its connection to
modern humans is presented in Conkey (1999). Holloway (1999) analyzes such
sulci location differences in his review of human brain evolution.

23.4 Comparative Cognition

Trying to learn about how cognition might have arisen is made even more difficult
when the nervous system studied is as complex as the human brain. There is growing
evidence that simpler nervous systems appear to be capable of tasks that would easily
be considered as cognitive if seen in a higher vertebrate or primate. These simpler
systems occur in several insects and spiders. There appears to be some evidence
that parsing and fusing disparate sensory signals into higher level percepts begins
to build a cognitive system that at some point crosses a threshold to become aware.
This advanced behavior appears to develop primarily in predators that must assess

contradictory and ambiguous information in order to plan an attack. Once these


animals include enough varied prey for programmed attacks to be of limited utility,
they appear to undergo neural system evolution that allows pre-planning of routes
that go out of sight and an almost infinite variety of attack strategies. Since these
animals have very small nervous systems—say 400,000 neurons—this is forcing
many scientists to reconsider almost everything previously said about how cognition
evolves. It also means that the software modeling of cognition processes is not as
far-fetched as once believed, as these nervous systems are within our computational
grasp.
These general ideas are discussed in the survey (Greenspan and van Swinderen
2004). An introduction to the study of these simpler nervous systems begins with
Prete (2004). This book has chapters devoted to several interesting small animals.
The spider Portia fimbriata is studied in Harland and Jackson (2004). Portia is quite amazing as she appears to be self-aware and reacts to her own image in a mirror. The motion perception of amphibians is studied in Ewert (2004). The praying mantis also possesses impressive cognitive capabilities, as detailed in Kral and Prete (2004). Also, the emergence of cognitive function within the small brain of the honeybee is explored in Zhang and Srinivasan (2004). General principles of the evolution of animal communication are studied in Hauser (1996). The evolution of cortical diversity during development is covered in Kornack (2000) and Krubitzer and Huffman (2000).
The neural language for words is explored in Pulvermüller (1999).
In addition to these, there are texts that deal explicitly with the problem of comparing the cognitive capacities and potentials of other animals.
• In Butler and Hodos (2005), there is an excellent treatment of how to do
these sorts of comparisons. The texts by Wasserman and Zentall (2006) and
Shettleworth (2010) are complementary and are very helpful: if we are interested in understanding how small a neural model can be and still exhibit cognitive behavior, there are clues as to how this can be implemented in the study of alternative neural strategies for solving the problems of integrating environmental signals into useful choices for survival. Also, the papers Edelman et al. (2005) and Seth (2005) try to identify how to find evidence of consciousness in non-mammals, which again helps widen our perceptions. As all of us who have had pets know, subjective experience is not limited to us as humans; an attempt to find evidence for that in neurobiology is found in Baars (2005). This is another example of high level ideas which are quite difficult to quantify into a model. There is also the question of whether non-humans organize information like we do; i.e., do they have cognitive maps? The paper Bennet (1996), while somewhat old, addresses this question nicely. The general problem of trying to understand another being's neural circuitry is the focus of Saxe et al. (2004).
• There are also sources of information for specific cognitive systems in various
animals which really help us see cognition as a general solution to the common
problems all animals face. Specifically, Reznikova (2007) is a complete discus-
sion of animal intelligence in general and Panksepp (2005) covers core emotional

feelings in animals. Also, the whole idea of learning in animals is laid out in Zentall
et al. (2008). For specific animals, we can turn to the following references:
– A complete treatment of arthropod brains is given in Strausfeld (2012). Navi-
gational issues are explored in Webb et al. (2004).
– The octopus brain is analyzed in Young (1971), which is out of print but a wonder-
ful reference from which we can pull out neural circuitry for cephalopod models.
The squid brain’s development is outlined in Shigeno et al. (2001) and follow-
ing this developmental chain helps us contrast it with other neural development
pathways.
– Herrold et al. (2011) examines the nidopallium caudolaterale in birds which
might be an analogue of a type of cortex in humans. Hence, associations built
from environmental signals can be done in multiple ways which gives insight into
general cognitive architectures. A review of tool use in birds given in Lefebvre
et al. (2002) is also very helpful. In Petkov and Jarvis (2012), a very helpful
comparison on bird and primate language origins is presented which is also a way
to study how different systems are used to solve similar problems. This paper
should be read in conjunction with another survey on human language evolution
given in Berwick et al. (2012). Finally, the evolution of avian intelligence is
studied in Emery (2006).
– Honeybee neural architectures are studied in Menzel (2012) as a model of cogni-
tion. The information processing the honeybee mushroom body does is probed
in the papers (Haehnel and Menzel 2010b), (Szyszka et al. 2008) and (Sjöholm
et al. 2005). By studying these papers, we can see different ways to implement
cortical structure. A more general treatment of insect mushroom bodies is given
in Farris and Sinakevitch (2003). Honeybee memory formation is discussed in
the two papers (Hadar and Menzel 2010a) and (Roussel et al. 2010).
– The generation of behavior in a tadpole is worked out in detail in Roberts et al.
(2010) which helps us understand how low level neuronal information can give
rise to high level behavior.
– The fruit fly is capable of very complex cognitive functions which we are just
beginning to understand better. An overview of what the fruit fly is capable of is
found in Greenspan and van Swinderen (2004) and it makes for good reading.
Again, think about generalizations! A nice survey of social learning in insects
in general and what that might mean about their neural organizations is given
in Giurfa (2012) which is also very interesting.
– The ctenophore Mnemiopsis leidyi has recently had its genome sequenced and
we now know its neural architecture was developed in a different way from
other animals. Hence, it also gives us new design principles to build functional
small brain architectures. Read Ryan et al. (2013) and Moroz et al. (2014)
carefully and take notes.
– Commonalities in the development of the nerve cord in crustaceans are discussed
in Harzsch (2003). Comparisons to what happens in arthropods are made which
allows generalization.

Some general ideas about brain evolution can also be illuminated by looking at
other animals. We note that these issues are explained carefully in Williams and
Holland (1998), Marakami et al. (2005) and Sprecher and Reichert (2003).

23.5 Neural Signaling

• A basic treatment of calcium signaling in the cell from Berridge (1997).


• A theoretical discussion of signaling complexes in Bray (1998) which influenced
much of the work in Chap. 6.
• Incoming environmental signals must be processed and transduced into other
forms. A good treatment of the many transduction cascades employed by a bio-
logical system to make sense of raw data is presented in Gomperts et al. (2002a).
Of particular interest to the work presented in Chap. 6 is the chapter on GTP bind-
ing proteins and their role in second messenger triggers (Gomperts et al. 2002b,
Chap. 4).

23.6 Gene Regulatory Circuits

A clear theoretical approach to gene regulatory circuits is very helpful in under-


standing how the complications of cell signaling both evolved and function. In
particular, gene doubling is a powerful control metaphor for neural system devel-
opment. The primary sources we have used in this area are given below. The study
of the cis-regulatory module as a multi-input/output control device which simulta-
neously alters structure (hardwire) and signaling (software) helps in the design of
useful asynchronous software architectures for cognitive modeling.

• The development and regulation of gene regulatory systems in Davidson (2001a).


Of particular interest is the chapter on the cis-regulatory module given in Davidson
(2001b), Chap. 2.
• The evolution of developmental pathways is discussed in clear terms in Wilkins
(2002a). Particularly interesting is the treatment of genetic pathways and networks
in developments (Wilkins 2002b, Chap. 4).
• The path from embryonic modules to function in the adult nervous system is
explored in Redies and Puelles (2004).
• A collection of articles on the theme of modularity in development is presented in
Schlosser and Wagner (2004).

23.7 Software

Out of the many books that are available for self-study in all of the areas above,
some have proved to be invaluable, while others have been much less helpful. The
following annotated list consists of the real gems. To learn to program effectively in
an object oriented way in Python, it is helpful to know how to program in a procedural
language such as C. Then, learning how to program objects within the constraints of
the class syntax of C++ is very useful. This is the route we took in learning how to
program in an object oriented way. The final step is to learn how to use a scripting
glue language such as Python to build application software. Finally, don’t be put off
by the publication date of these resources! Many resources are timeless.

C++: The following books need to be on your shelf. Lippman will get you started,
but you’ll also need Deitel and Deitel and Olshevsky and Ponomarev for nuance.
1. C++ Primer, Lippman (1991). This book is the most basic resource for this
area. While very complete, it has shortcomings; for example, its discussion
of call by reference is very unclear and its treatment of dynamic binding in its
chapters on OOD is also murky. Nevertheless, it is a good basic introduction.
It’s biggest problem for us is that all of its examples are so simple (yes, even
the zoo class is just too simple to give us much insight).
2. C++: How to Program, Deitel and Deitel (1994). We have found this book
to be of great value. It intermingles excellent C++ coverage with ongoing
object oriented design (OOD) material. It is full of practical advice on software
engineering aspects of OOD design.
3. The Revolutionary Guide to OOP Using C++, Olshevsky and Ponomarev
(1994). This book has a wonderful discussion of call by reference and equally
good material on dynamic binding.
4. Compiler Design, Wilhem and Maurer (1995). This book has already been
mentioned in the text as the source of technical information on how an object-
oriented compiler is built. This is an essential resource.
Python: There are two wonderful resources for using Python in computation which
are Langtangen (2010, 2012). You can start learning about this now as the tran-
sition from using MatLab to using Python is fairly easy. The comments below on
object oriented programming are also relevant here, so even though many of the
object oriented texts are focusing on C++, don’t let that put you off.
Erlang: Erlang is a great choice for our eventual neural simulations as it will allow
us to do lesion studies. Of course, nodal computation in Erlang requires we find
good approximations to realistic neural computation, but that can be done as we
have discussed in this book. Two good references to this language are Armstrong
(2013) and Logan et al. (2011). However, the best way to get started is to read
(Hébert 2013). We encourage you to start thinking about this way of coding as
well, since we will start using it in later volumes.
Haskell: Haskell is another language like Erlang which has great potential for writ-
ing lesion simulations. You should look at Lipovača (2011) to get started here.

Object Oriented Programming and Design


1. Object-Oriented Analysis and Design with Applications (Booch 1994).
This is a classic reference to one method of handling large scale OOD. As the
number of objects in your design grows there is a combinatorial explosion in
the number of interaction pathways. The Booch method gives a popular software
engineering tool. This is best to read on a surface level, for impressions and ideas.
2. Designing Object-Oriented C++ Applications Using the Booch Method
(Martin 1995).
If you decide to use the Booch method, this book is full of practical advice. It has
many code examples, but it has a very heavy reliance on templates. This is a C++
language feature we have been avoiding because it complicates the architectural
details. Hence, the translation of Martin’s code fragments into useful insight is
sometimes difficult, but nonetheless, there is much meat here.
3. Design Patterns for Object-Oriented Software Development (Pree 1995).
The design of classes and objects is very much an art form. To some extent, like
all crafts, you learn by doing. As you get more skilled, you realize how little of the
real knowledge of how to write good classes is written down! This book is full of
hard won real-world wisdom that comes out of actually being in the programming
trenches. It is best to surface read and sample.
4. Taming C++: Pattern Classes and Persistence for Large Projects (Soukup 1994).
We have similar comments for this book. Since our proposed neural objects OOD
project will be a rather massive undertaking, useful insight into the large scale
OOD is most welcome!
5. Design Patterns: Elements of Reusable Object-Oriented Software (Gamma et al.
1995).
As you program, you realize that many classes are essential building blocks of
many disparate applications. This wonderful book brings together a large num-
ber of already worked out OOD solutions to common problems. It is extremely
important to look at this book carefully. All of the different classes are presented
in code sketches (not easy to follow, but well worth the effort!).
Neural Simulation Software: To model these things, we can use
1. the Genesis modeling language as discussed in The Book of Genesis: Exploring
Realistic Neural Models with the GEneral NEural SImulation System, by
Bower and Beeman (Bower and Beeman 1998).
2. home grown code written in C or C++.
3. home grown code written in MatLab.
4. home grown code written in Python.
5. home grown code written in Erlang.

23.8 Theoretical Robotics

• In the Ph.D. thesis of Vogt (2000), we see studies and experiments on how to
get collections of robots to come up with a language all on their own and
then to communicate using this language. This is extremely fascinating and the
mechanisms that Vogt and his colleagues use to implement both software and
hardware so as to gain behavioral plasticity have great promise.
• The view of situated cognition in Clancey (1997). In this book, Clancey discusses
at great length what he considers to be appropriate ways to develop robots that will
have the plasticity we need to perform interesting things. There is a lot of food for
thought here.

References

J. Allman, Evolving Brains (Scientific American Library, New York, 1999)


M. Arbib, P. Erdi, P. Szentagothal, Neural Organization: Structure, Function and Dynamics (A
Bradford Book, MIT Press, Cambridge, 1998)
J. Armstrong, Programming Erlang Second Edition: Software for a Concurrent World (The Prag-
matic Bookshelf, Dallas, 2013)
B. Baars, Subjective experience is probably not limited to humans: the evidence from neurobiology
and behavior. Conscious. Cogn. 14, 7–21 (2005)
M. Baslow, The vertebrate brain, evidence of its modular organization and operating system: insights
into the brain’s basic units of structure, function, and operation and how they influence neuronal
signaling and behavior. Front. Behav. Neurosci. 5, 1–7 (2011). (Article 5)
A. Bennet, Do animals have cognitive maps? J. Exp. Biol. 199, 219–224 (1996)
M. Berridge, Elementary and global aspects of calcium signalling. J. Physiol. 499, 291–306 (1997)
R. Berwick, G. Beckers, K. Okanoya, J. Bolhuis, A bird’s eye view of human language evolution.
Front. Evol. Neurosci. 4, 1–25 (2012). (Article 5)
I. Black, Information in the Brain: A Molecular Perspective (A Bradford Book, MIT Press, Cam-
bridge, 1991)
G. Booch, Object-Oriented Analysis and Design with Applications, 2nd edn. (Benjamin/Cummings
Publishing Company, Redwood City, 1994)
J. Bower, D. Beeman, The Book of Genesis: Exploring Realistic Neural Models with the GEneral
NEural SImulation System, 2nd edn. (Springer TELOS, New York, 1998)
D. Bray, Signalling complexes: biophysical constraints on intracellular communication. Annu. Rev.
Biophys. Biomol. Struct. 27, 59–75 (1998)
P. Brodal, The Central Nervous System: Structure and Function (Oxford University Press, New
York, 1992)
D. Buller, Adapting Minds: Evolutionary Psychology and the Persistent Quest for Human Nature
(A Bradford Book, The MIT Press, Cambridge, 2006)
A. Butler, W. Hodos, Comparative Vertebrate Neuroanatomy: Evolution and Adaptation, 2nd edn.
(Wiley-Interscience, New York, 2005)
K. Catania, Cortical organization in Insectivora: the parallel evolution of the sensory periphery
and the brain. Brain, Behav. Evol. 55(6), 311–321 (2000)
W. Clancey, Situated Cognition: On Human Knowledge and Computer Representations (Cambridge
University Press, Cambridge, 1997)
M. Coltheart, Modularity and cognition. Trends Cogn. Sci. 3(3), 115–120 (1999)

M. Conkey, A history of the interpretation of European ‘paleolithic art’: magic, mythogram, and
metaphors for modernity, in Handbook of Human Symbolic Evolution, ed. by A. Lock, C. Peters
(Blackwell Publishers, Massachusetts, 1999)
E. Davidson, Genomic Regulatory Systems: Development and Evolution (Academic, San Diego,
2001a)
E. Davidson, Inside the cis-regulatory module: control logic, and how regulatory environment is
transduced into spatial patterns of gene expression, Genomic Regulatory Systems: Development
and Evolution (Academic, San Diego, 2001b), pp. 26–63
T. Deacon, Rethinking mammalian brain evolution. Am. Zool. 30, 629–705 (1990)
H. Deitel, P. Deitel, C++: How to Program (Prentice Hall, Upper Saddle River, 1994)
M. Diamond, A. Scheibel, L. Elson, The Human Brain Coloring Book (Barnes and Noble Books,
New York, 1985)
D. Donaldson, Parsing brain activity with fMRI and mixed designs: what kind of a state is neu-
roimaging in? Trends Neurosci. 27, 442–444 (2004)
D. Edelman, B. Baars, A. Seth, Identifying hallmarks of consciousness in non-mammalian species.
Conscious. Cogn. 14, 169–187 (2005)
N. Emery, Cognitive ornithology: the evolution of avian intelligence. Philos. Trans. R. Soc. B 361,
23–43 (2006)
J. Ewert, Motion perception shapes the visual world of the amphibians, in Complex Worlds from
Simpler Nervous Systems, ed. by F. Prete (A Bradford Book, MIT Press, Cambridge, 2004), pp.
117–160
S. Farris, I. Sinakevitch, Development and evolution of the insect mushroom bodies: towards the
understanding of conserved developmental mechanisms in a higher brain center. Arthropod Struct.
Dev. 32, 79–101 (2003)
E. Gamma, R. Helm, R. Johnson, J. Vlissides, Design Patterns: Elements of Reusable Object-
Oriented Software (Addison-Wesley, Reading, 1995)
M. Giurfa, Social learning in insects: a higher-order capacity? Front. Behav. Neurosci. 6, 1–3 (2012).
(Article 57)
B. Gomperts, P. Tatham, I. Kramer, Signal Transduction (Academic, San Diego, 2002a)
B. Gomperts, P. Tatham, I. Kramer, GTP-binding proteins and signal transduction, Signal Trans-
duction (Academic, San Diego, 2002b), pp. 71–106
R. Greenspan, B. van Swinderen, Cognitive consonance: complex brain functions in the fruit fly
and its relatives. Trends Neurosci. 27(12), 707–711 (2004)
R. Hadar, R. Menzel, Memory formation in reversal learning of the honeybee. Front. Behav. Neu-
rosci. 4, 1–11 (2010a). (Article 4)
M. Haehnel, R. Menzel, Sensory representation and learning-related plasticity in mushroom body
extrinsic feedback neurons of the protocerebral tract. Front. Syst. Neurosci. 4, 1–13 (2010b).
(Article 161)
D. Hardie, Biochemical Messengers: Hormones, Neurotransmitters and Growth Factors (Chapman
& Hall, London, 1991)
D. Harland, R. Jackson, Portia perceptions: the umwelt of an araneophagic jumping spider, in
Complex Worlds from Simpler Nervous Systems, ed. by F. Prete (A Bradford Book, MIT Press,
Cambridge, 2004), pp. 5–40
S. Harzsch, Ontogeny of the ventral nerve chord in malacostracan crustaceans: a common plan for
neuronal development in Crustacea, Hexapoda and other Arthropoda? Arthropod Struct. Dev. 32,
17–37 (2003)
M. Hauser, The Evolution of Communication (A Bradford Book, MIT Press, Cambridge, 1996)
M. Hauser, Moral Minds: How Nature Designed Our Universal Sense of Right and Wrong (Harper-
Collins, New York, 2006)
F. Hébert, Learn You Some Erlang for Great Good (No Starch Press, San Francisco, 2013)
C. Herrold, N. Palomero-Gallagher, B. Hellman, S. Kröner, C. Theiss, O. Güntürkün, K. Zilles,
The receptor architecture of the pigeons’ nidopallium caudolaterale: an avian analogue to the
mammalian prefrontal cortex. Brain Struct. Funct. 216, 239–254 (2011)

R. Holloway, Evolution of the human brain, in Handbook of Human Symbolic Evolution, ed. by A.
Lock, C. Peters (Blackwell Publishers, Massachusetts, 1999), pp. 74–116
S. Johnson-Frey, The neural basis of complex tool use in humans. Trends Cogn. Sci. 8(2), 71–78
(2004)
K. Kral, F. Prete, In the mind of a hunter: the visual world of the praying mantis, in Complex Worlds
from Simpler Nervous Systems, ed. by F. Prete (A Bradford Book, MIT Press, Cambridge, 2004),
pp. 75–116
K. Koch, J. Davis (eds.), Large Scale Neuronal Theories of the Brain (A Bradford Book, MIT Press,
Cambridge, 1994)
D. Kornack, Neurogenesis and the evolution of cortical diversity: mode, tempo, and partitioning
during development and persistence in adulthood. Brain, Behav. Evol. 55(6), 336–344 (2000)
L. Krubitzer, K. Huffman, A realization of the neocortex in mammals: genetic and epigenetic
contributions to the phenotype. Brain, Behav. Evol. 55(6), 322–335 (2000)
H. Langtangen, Python Scripting for Computational Science (Springer, New York, 2010)
H. Langtangen, A Primer of Scientific Programming with Python (Springer, New York, 2012)
L. Lefebvre, N. Nicolakakis, D. Boire, Tools and brains in birds. Behaviour 139, 939–973 (2002)
M. Lipovača, Learn You a Haskell for Great Good (No Starch Press, San Francisco, 2011)
S. Lippman, C++ Primer, 2nd edn. (Addison-Wesley, Reading, 1991)
M. Logan, E. Merritt, R. Carlsson, Erlang and OTP in Actions (Manning, Stamford, 2011)
Y. Murakami, K. Uchida, F. Rijli, S. Kuratani, Evolution of the brain developmental plan: insights
from agnathans. Dev. Biol. 280, 249–259 (2005)
R. Martin, Designing Object-Oriented C++ Applications Using the Booch Method (Prentice Hall,
Englewood Cliffs, 1995)
R. Menzel, The honeybee as a model for understanding the basis of cognition. Nat. Rev.: Neurosci.
13, 758–768 (2012)
G. Miklos, Molecules and cognition: the latterday lessons of levels, language and lac: evolutionary
overview of brain structure and function in some vertebrates and invertebrates. J. Neurobiol.
24(6), 842–890 (1993)
L. Moroz, K. Kocot, M. Citrarella, S. Dosung, T. Norekian, I. Povolotskaya, A. Grigorenko, C.
Dailey, E. Berezikov, K. Buckely, A. Ptitsyn, D. Reshetov, K. Mukherjee, T. Moroz, Y. Bobkova,
F. Yu, V. Kapitonov, J. Jurka, Y. Bobkov, J. Swore, D. Girado, A. Fodor, F. Gusev, R. Sanford,
R. Bruders, E. Kittler, C. Mills, J. Rast, R. Derelle, V. Solovyev, F. Kondrashov, B. Swalla, J.
Sweedler, E. Rogaev, K. Halancych, A. Kohn, The ctenophore genome and the evolutionary
origins of neural systems. Nature 510, 109–120 (2014)
J. Nolte, The Human Brain: An Introduction to Its Functional Anatomy (Mosby, A Division of
Elsevier Science, St. Louis, 2002)
V. Olshevsky, A. Ponomarev, The Revolutionary Guide to OOP Using C++ (WROX Publishers,
Birmingham, 1994)
J. Panksepp, Affective consciousness: core emotional feelings in animals and humans. Conscious.
Cogn. 14, 30–80 (2005)
J. Peterson, A White Paper on Neural Object Design: Preliminaries. Department of Mathemati-
cal Sciences (1995), Revised 1998, Revised 1999, Revised 2001. http://www.ces.clemson.edu/
~petersj/NeuralCodes/NeuralObjects3
J. Peterson, Calculus for Cognitive Scientists: Derivatives, Integration and Modeling, Springer
Series on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte
Ltd., Singapore, 2015a in press)
J. Peterson, Calculus for Cognitive Scientists: Higher Order Models and Their Analysis, Springer
Series on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte
Ltd., Singapore, 2015b in press)
J. Peterson, Calculus for Cognitive Scientists: Partial Differential Equation Models, Springer Series
on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte Ltd.,
Singapore, 2015c in press)

C. Petkov, E. Jarvis, Birds, primates, and spoken language origins: behavioral phenotypes and
neurobiological substrates. Front. Behav. Neurosci. 4, 1–24 (2012). (Article 12)
J. Pinel, M. Edwards, A Colorful Introduction to the Anatomy of the Human Brain: A Brain and
Psychology Coloring Book (Pearson, New York, 2008)
R. Potts, Paleoenvironments and the evolution of adaptability in great apes, in The Evolution of
Thought: Evolutionary Origins of Great Ape Intelligence, ed. by A. Russon, D. Begun (Cambridge
University Press, Cambridge, 2004), pp. 237–259
W. Pree, Design Patterns for Object-Oriented Software Development (ACM Press Books, Addison-
Wesley, New York, 1995)
F. Prete (ed.), Complex Worlds from Simpler Nervous Systems (A Bradford Book, MIT Press,
Cambridge, 2004)
F. Pulvermüller, Words in the brain’s language. Behav. Brain Sci. 22, 253–336 (1999)
L. Radinsky, The Evolution of Vertebrate Design (The University of Chicago Press, Chicago, 1987)
C. Redies, L. Puelles, Modularity in vertebrate brain development and evolution. Bioessays 23,
1100–1111 (2001)
C. Redies, L. Puelles, Central nervous system development: from embryonic modules to functional
modules, in Modularity in Development and Evolution, ed. by G. Schlosser, G. Wagner (University
of Chicago Press, Chicago, 2004), pp. 154–186
Z. Reznikova, Animal Intelligence: From Individual to Social Cognition (Cambridge University
Press, Cambridge, 2007)
A. Roberts, W. Li, S. Soffe, How neurons generate behavior in a hatchling amphibian tadpole: an
outline. Front. Behav. Neurosci. 4, 1–11 (2010). Article 16
E. Roussel, J. Sandoz, M. Giurfa, Searching for learning-dependent changes in the antennal lobe:
simultaneous recording of neural activity and aversive olfactory learning in honeybees. Front.
Behav. Neurosci. 4, 1–11 (2010). (Article 155)
A. Russon, Great ape cognitive systems, in The Evolution of Thought: Evolutionary Origins of
Great Ape Intelligence, ed. by A. Russon, D. Begun (Cambridge University Press, Cambridge,
2004), pp. 76–100
A. Russon, D. Begun (eds.), The Evolution of Thought: Evolutionary Origins of Great Ape Intelli-
gence (Cambridge University Press, Cambridge, 2004)
J. Ryan, K. Pang, C. Schnitzler, A. Nguyen, R. Moreland, D. Simmons, B. Koch, W. Francis, P.
Havlak, S. Smith, N. Putnam, S. Haddock, C. Dunn, T. Wolfsberg, J. Mullikin, M. Martindale, A.
Baxevanis, The genome of the ctenophore Mnemiopsis leidyi and its implications for cell type
evolution. Science 342(6164), 1242592-1–12425920-8 (2013)
R. Saxe, S. Carey, N. Kanwisher, Understanding other minds: linking developmental psychology
and functional neuroimaging. Annu. Rev. Psychol. 55, 87–124 (2004). (Annual Reviews)
G. Schlosser, G. Wagner (eds.), Modularity in Development and Evolution (University of Chicago
Press, Chicago, 2004)
A.K. Seth, Criteria for consciousness in humans and other mammals. Conscious. Cogn. 14, 119–139
(2005)
S. Shettleworth, Cognition, Evolution, and Behavior (Oxford University Press, Oxford, 2010)
S. Shigeno, H. Kidokoro, K. Tsuchiya, S. Segawa, Development of the brain in the oegopsid squid,
Todarodes pacificus: an atlas from hatchling to juvenile. Zool. Sci. 18, 1081–1096 (2001)
M. Sjöholm, I. Sinakevitch, R. Ignell, N. Strausfeld, B. Hansson, Organization of Kenyon cells in
subdivisions of the mushroom bodies of a lepidopteran insect. J. Comp. Neurol. 491, 290–304
(2005)
J. Soukup, Taming C++: Pattern Classes and Persistence for Large Projects (Addison-Wesley,
Reading, 1994)
S. Sprecher, H. Reichert, The urbilaterian brain: developmental insights into the evolutionary origin
of the brain in insects and vertebrates. Arthropod Struct. Dev. 32, 141–156 (2003)
N. Strausfeld, Arthropod Brains: Evolution, Functional Elegance, and Historical Significance (The
Belknap Press of Harvard University Press, Massachusetts, 2012)

P. Szyszka, A. Galkin, R. Menzel, Associative and non-associative plasticity in Kenyon cells of the
honeybee mushroom body. Front. Syst. Neurosci. 2, 1–10 (2008). (Article 3)
P. Vogt, Lexicon Grounding on Mobile Robots. Ph.D. thesis, Vrije Universiteit, Brussel, Laborato-
rium voor Artificiele Intelligie (2000)
E. Wasserman, T. Zentall (eds.), Comparative Cognition: Experimental Explorations of Animal
Intelligence (Oxford University Press, Oxford, 2006)
B. Webb, R. Harrison, M. Willis, Sensorimotor control of navigation in arthropod and artificial
systems. Arthropod Struct. Dev. 33, 301–329 (2004)
R. Wilhelm, D. Maurer, Compiler Design (Addison-Wesley, Reading, 1995)
A. Wilkins, The Evolution of Developmental Pathways (Sinauer Associates, Sunderland, 2002a)
A. Wilkins, Genetic pathways and networks in development, The Evolution of Developmental
Pathways (Sinauer Associates, Sunderland, 2002b), pp. 99–126
N. Williams, P. Holland, Molecular evolution of the brain of chordates. Brain, Behav. Evol. 52,
177–185 (1998)
J. Young, The Anatomy of the Nervous System of Octopus Vulgaris (Oxford at the Clarendon Press,
Oxford, 1971)
T. Zentall, E. Wasserman, O. Lazareva, R. Thompson, M. Rattermann, Concept learning in animals.
Comp. Cogn. Behav. Rev. 3, 13–45 (2008)
S. Zhang, M. Srinivasan, Exploration of cognitive capacity in honeybees: higher functions emerge
from small brain, in Complex Worlds from Simpler Nervous Systems, ed. by F. Prete (A Bradford
Book, MIT Press, Cambridge, 2004), pp. 41–74
Glossary

Address based graph models The basic idea here is to take the usual graphs,
edges and vertices classes and add additional information to them. Each node
now has a global node number and a six dimensional address. The usual methods
for manipulating these classes are then available. Using these tools, it is easier to
build complicated graphs one subgraph at a time. For example, it is straightforward
to build cortical cans out of OCOS, FFP and Two/Three subcircuits. The OCOS
will have addresses [0; 0; 0; 0; 0; 1−7] as there are 7 neurons. The FFP is simpler
with addresses [0; 0; 0; 0; 0; 1−2] as there are just 2 neurons. The six neuron
Two/Three circuit will have 6 addresses, [0; 0; 0; 0; 0; 1−6]. To distinguish
these neurons in different circuits from one another, we add a unique integer in
the fifth component. The addresses are now OCOS ([0; 0; 0; 0; 1; 1−7]), FFP
([0; 0; 0; 0; 2; 1−2]) and Two/Three ([0; 0; 0; 0; 3; 1−6]). So a single can would
look like what is shown below

can L1 . . . . y . . FFP, y’s Address [0; 0; 0; 0; 2; 1 − 2]


L2 z . . z . . z Two/Three, z’s Address [0; 0; 0; 0; 3; 1 − 6]
L3 . . z . . . .
L4 z x . x . x z OCOS x’s Address [0; 0; 0; 0; 1; 1 − 7]
. x . x . x .
L5 . . . . y . .
L6 . . . x . . .

We are not showing the edges here, for convenience. We can then assemble cans
into columns by stacking them vertically. There are many details and you should
read Chap. 19 carefully. A typical small brain model could then consist of modules
with the following addresses, p. 417:


Input [1; ·; ·; ·; ·; ·]
Left Brain [2; ·; ·; ·; ·; ·]
Corpus Callosum [3; ·; ·; ·; ·; ·]
Right Brain [4; ·; ·; ·; ·; ·]
Thalamus [5; ·; ·; ·; ·; ·]
MidBrain [6; ·; ·; ·; ·; ·]
Cerebellum [7; ·; ·; ·; ·; ·]
Output [8; ·; ·; ·; ·; ·]
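The address bookkeeping described above is easy to prototype outside of the book's MatLab classes. The short Python sketch below assigns six component addresses to the nodes of a single can built from the OCOS, FFP and Two/Three subcircuits; the function name, the dictionary layout and the use of tuples as keys are illustrative choices of this sketch, not the text's code.

# A minimal sketch of address-based node bookkeeping for one cortical can.
# The subcircuit sizes follow the glossary: OCOS has 7 neurons, FFP has 2,
# and the Two/Three circuit has 6.  The layout below is illustrative only.

def can_addresses(can_prefix=(0, 0, 0, 0)):
    """Return a dict mapping each node's 6-dim address to a global number."""
    subcircuits = [("OCOS", 1, 7), ("FFP", 2, 2), ("TwoThree", 3, 6)]
    addresses = {}
    global_id = 0
    for name, slot, n_neurons in subcircuits:
        for local in range(1, n_neurons + 1):
            # fifth component distinguishes the subcircuit, sixth the neuron
            address = can_prefix + (slot, local)
            addresses[address] = {"global": global_id, "circuit": name}
            global_id += 1
    return addresses

if __name__ == "__main__":
    table = can_addresses()
    for address, info in sorted(table.items()):
        print(address, info)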

Biological Feature Vector or BFV This is a finite dimensional approximation $\zeta$
to an action potential given by

$$\zeta = \left\{ \begin{array}{ll} (t_0, V_0) & \text{start point} \\ (t_1, V_1) & \text{maximum point} \\ (t_2, V_2) & \text{return to reference voltage} \\ (t_3, V_3) & \text{minimum point} \\ (g, t_4, V_4) & \text{sigmoid model of tail: } V_3 + (V_4 - V_3)\tanh(g(t - t_3)) \end{array} \right\}$$

where the model of the tail of the action potential is of the form $V_m(t) =
V_3 + (V_4 - V_3)\tanh(g(t - t_3))$. Note that $V_m'(t_3) = (V_4 - V_3)\, g$ and so if we
were using real voltage data, we would approximate $V_m'(t_3)$ by a standard finite
difference. In Sect. 9.3, we derive equations telling us how the various attributes
of an action potential are altered when there are changes in these 11 parameters.
This information can then be used to link second messenger effects directly to
BFV changes, p. 141.
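As a concrete illustration, the sketch below evaluates the tail model $V_m(t) = V_3 + (V_4 - V_3)\tanh(g(t - t_3))$ for a BFV stored as a small dictionary and approximates $V_m'(t_3)$ with a finite difference, as suggested above. The sample parameter values are made up for illustration and are not data from the text.

import math

# A toy BFV: (t0,V0) start, (t1,V1) max, (t2,V2) return to reference,
# (t3,V3) min, and (g,t4,V4) for the sigmoidal tail.  Values are invented.
bfv = {"t0": 0.0, "V0": -70.0, "t1": 1.0, "V1": 30.0,
       "t2": 2.5, "V2": -70.0, "t3": 4.0, "V3": -80.0,
       "g": 0.8, "t4": 12.0, "V4": -70.0}

def tail(t, bfv):
    """Sigmoid model of the action potential tail for t >= t3."""
    return bfv["V3"] + (bfv["V4"] - bfv["V3"]) * math.tanh(bfv["g"] * (t - bfv["t3"]))

# Finite difference estimate of the tail slope at t3; analytically it is (V4 - V3) g.
h = 1.0e-4
slope_fd = (tail(bfv["t3"] + h, bfv) - tail(bfv["t3"], bfv)) / h
slope_exact = (bfv["V4"] - bfv["V3"]) * bfv["g"]
print(slope_fd, slope_exact)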
Biological Information Processing Biological systems process information that
uses a computational node paradigm. Inputs are collected in a network of incoming
edges called the dendritic arbor, combined in nonlinear ways in the computational
unit called the node and then an output is generated. This processing can be
modeled in many ways and in the text, we focus on some of them such as matrix
feed forward networks and graphs of nodes which we sometimes call chained
networks, p. 4.
Brain model For our purposes, we will consider a brain model to consist of the
cerebrum, the cerebellum and the brain stem. Finer subdivisions are then shown in
the table below where some structures are labeled with a corresponding number
for later reference in figures shown in Chap. 5. The numbering scheme shown
below helps us locate brain structures deep in the brain that can only be seen
by taking slices. These numbers thus correspond to the brain structures shown
in Fig. 5.4 (modules that can be seen on the surface) and the brain slices of
Figs. 5.5a, b, and 5.6a, b. A useful model of the processing necessary to combine
disparate sensory information into higher level concepts is clearly built on models
of cortical processing.

Brain → Cerebrum, Cerebellum (1), Brain Stem
Cerebrum → Cerebral Hemisphere, Diencephalon
Brain Stem → Medulla (4), Pons (3), Midbrain (2)
Cerebral Hemisphere → Amygdala (6), Hippocampus (5), Cerebral Cortex, Basal Ganglia
Diencephalon → Hypothalamus (8), Thalamus (7)
Cerebral Cortex → Limbic (13), Temporal (12), Occipital (11), Parietal (10), Frontal (9)
Basal Ganglia → Lenticular Nucleus (15), Caudate Nucleus (14)
Lenticular Nucleus → Globus Pallidus (16), Putamen (15)

There are many more details you should look over in Chap. 5 including more infor-
mation about cortical models, p. 67.

Calcium trigger event Some triggers initiate a spike in calcium current which
enters the cell and alters the equilibrium between free calcium in the cell and
calcium stored in a variety of buffering complexes. This provides useful ways of
regulating many biological processes. A second messenger system often involves
Ca++ ion movement in and out of the cell. The amount of free Ca++ ion in
the cell is controlled by complicated mechanisms, but some is stored in buffer
complexes. The release of calcium ion from these buffers plays a big role in
cellular regulatory processes and which protein P(T1 ) is actually created from a
trigger T0 . Calcium is bound to many proteins and other molecules in the cytosol.
This binding is fast compared to calcium release and diffusion. Let’s assume
there are M different calcium binding sites; in effect, there are different calcium
species. We label these sites with the index j. Each has a binding rate constant
kj+ and a disassociation rate constant kj− . We assume each of these binding sites
is homogeneously distributed throughout the cytosol with concentration Bj . We
let the concentration of free binding sites for species j be denoted by bj (t, x) and
the concentration of occupied binding sites is cj (t, x); where t is our time variable
and x is the spatial variable. We let u(t, x) be the concentration of free calcium
ion in the cytosol at (t, x). The amount of release/uptake could be a nonlinear
function of the concentration of calcium ion. Hence, we model this effect with
the nonlinear mapping f0 (u). The diffusion dynamics are then

$$\frac{\partial u}{\partial t} = f_0(u) + \sum_{j=1}^{M}\Big( k_j^{-}\, c_j - k_j^{+}\, (B_j - c_j)\, u \Big) + D_0\, \frac{\partial^2 u}{\partial x^2}$$

The dynamics for $c_j$ are given by

$$\frac{\partial c_j}{\partial t} = -k_j^{-}\, c_j + k_j^{+}\, (B_j - c_j)\, u + D_j\, \frac{\partial^2 c_j}{\partial x^2}$$
where $D_0$ is the diffusion coefficient for free calcium ion. There are also boundary
conditions which we will let you read about in the discussions in Chap. 7. The
concentration of total calcium is the sum of the free and the bound. We denote
this by $w(t, x)$ and we can show that the total calcium equation is

$$\frac{\partial w}{\partial t} = f_0\!\left( w - \sum_{j=1}^{M} c_j \right) + D_0\, \frac{\partial^2 w}{\partial x^2} + \sum_{j=1}^{M} (D_j - D_0)\, \frac{\partial^2 c_j}{\partial x^2}$$

Using several assumptions about how the different rates of calcium binding compare,
we can then reduce this to a new dynamic model for $u$ given by

$$\frac{\partial u}{\partial t} = \frac{f_0(u)}{\Theta} + \frac{D_0 + \sum_{j=1}^{M} D_j\, \gamma_j}{\Theta}\, \frac{\partial^2 u}{\partial x^2}$$

where $\Theta$ is a new constant that comes out of the approximations we make.
The details are in Chap. 7. Then, by redefining the diffusion constant as
$\hat{D} = \big( D_0 + \sum_{j=1}^{M} D_j\, \gamma_j \big)/\Theta$, we arrive at a good approximation for $u$'s dynamics

$$\frac{\partial u}{\partial t} = \frac{f_0(u)}{\Theta} + \hat{D}\, \frac{\partial^2 u}{\partial x^2}$$
This is the model that allows us to estimate the effect of a calcium current spike
initiated by a second messenger event which is important in understanding how
to approximate nodal computations in simulations, p. 107.
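A simple way to see what the reduced model predicts is to step it forward with an explicit finite difference scheme. The sketch below does that for the approximate dynamics $\partial u/\partial t = f_0(u)/\Theta + \hat{D}\,\partial^2 u/\partial x^2$; the choice of $f_0$, the constants, the boundary handling and the pulse initial condition are all placeholders for illustration, not values from Chap. 7.

import numpy as np

# Explicit Euler / centered-difference step of u_t = f0(u)/Theta + Dhat u_xx.
# All constants and the form of f0 below are illustrative placeholders.
L, nx, nt = 1.0, 101, 2000
dx = L / (nx - 1)
dt = 1.0e-5
Theta, Dhat = 2.0, 1.0           # made-up effective constants
f0 = lambda u: -0.5 * u          # made-up release/uptake nonlinearity

x = np.linspace(0.0, L, nx)
u = np.exp(-200.0 * (x - 0.5) ** 2)   # a localized calcium pulse

for _ in range(nt):
    uxx = np.zeros_like(u)
    uxx[1:-1] = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx ** 2
    u = u + dt * (f0(u) / Theta + Dhat * uxx)
    u[0], u[-1] = u[1], u[-2]    # crude no-flux boundary conditions

print("peak free calcium after the run:", u.max())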
Cellular trigger The full event chain for a cellular trigger (you will have to look
up the meaning of these terms in Chap. 6) is

T0 → Cell Surface Receptor → PK/U


PK/U + T1 /T1∼ → T1 /T1∼ P
T1 /T1∼ P + tagging system → T1 /T1∼ PVn
T1 /T1∼ PVn + fSQVn → T1
T1 → nucleus → tagged protein transcription P(T1 )

where P(T1 ) indicates the protein whose construction is initiated by the trigger
T0 . Without the trigger, we see there are a variety of ways transcription can be
stopped:

• T1 does not exist in a free state; instead, it is always bound into the complex
T1 /T1∼ and hence can’t be activated until the T1∼ is removed.
• Any of the steps required to remove T1∼ can be blocked effectively killing
transcription:
– phosphorylation of T1∼ into T1∼ P is needed so that tagging can occur. So
anything that blocks the phosphorylation step will also block transcription.
– Anything that blocks the tagging of the phosphorylated T1∼ P will thus block
transcription.
– Anything that stops the removal mechanism fSQVn will also block transcrip-
tion.

The steps above can be used therefore to further regulate the transcription of T1
into the protein P(T1). Let T0′, T0′′ and T0′′′ be inhibitors of the steps above. These
inhibitory proteins can themselves be regulated via triggers through mechanisms
just like the ones we are discussing. In fact, P(T1 ) could itself serve as an
inhibitory trigger—i.e. as any one of the inhibitors T0′, T0′′ and T0′′′. Our theoretical
pathway is then

T0 → Cell Surface Receptor → PK/U
PK/U + T1/T1∼ → (step i) T1/T1∼P
T1/T1∼P + tagging system → (step ii) T1/T1∼PVn
T1/T1∼PVn + fSQVn → (step iii) T1
T1 → nucleus → tagged protein transcription P(T1)

where the step i, step ii and step iii can be inhibited as shown below:

T0 → Cell Surface Receptor → PK/U
PK/U + T1/T1∼ → (step i) T1/T1∼P            [step i killed by T0′]
T1/T1∼P + tagging system → (step ii) T1/T1∼PVn   [step ii killed by T0′′]
T1/T1∼PVn + fSQVn → (step iii) T1            [step iii killed by T0′′′]
T1 → nucleus → tagged protein transcription P(T1)

These event chains can then be analyzed dynamically and using equilibrium
analysis, approximations to the effects of a trigger can be derived. The full
details of that discussion are complex and are given carefully in Chap. 6. The
protein P(T1 ) can be an important one for the generation of action potentials

such as sodium gates. Hence, this is a model of second messenger activity that
can alter the hardware of the cell, p. 85.
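The logic of the chain, ignoring all kinetics, can be captured in a few lines of code: transcription of P(T1) happens only if none of the intermediate steps is blocked. The step names and the three inhibitor flags in the sketch below are just mnemonic stand-ins for the primed inhibitors discussed above; no rate constants or equilibrium analysis are modeled.

# A purely logical sketch of the trigger chain: transcription of P(T1) occurs
# only if none of the three intermediate steps is inhibited.  The flags stand
# in for the inhibitors of steps i, ii and iii; no kinetics are modeled.

def trigger_chain(inhibit_step_i=False, inhibit_step_ii=False, inhibit_step_iii=False):
    steps = [("phosphorylate T1~ (step i)", inhibit_step_i),
             ("tag T1/T1~P (step ii)", inhibit_step_ii),
             ("remove tagged T1~ (step iii)", inhibit_step_iii)]
    for name, inhibited in steps:
        if inhibited:
            return "transcription blocked at: " + name
    return "P(T1) transcribed"

print(trigger_chain())                       # full chain runs
print(trigger_chain(inhibit_step_ii=True))   # tagging blocked, no P(T1)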
CFFN back propagation The CFFN energy function is minimized using gradient
descent just as we do in the MFFN case, but the derivation is different as the
structure of the problem is in a graph form rather than matrices and vectors. The
discussion is technical and we refer you to Sect. 17.2 for the details of how to find
the required partial derivatives recursively, p. 319.
CFFN training problem The output of the CFFN is therefore a vector in $\mathbb{R}^{n_O}$
defined by

$$H(x) = \left\{\, Y^{i} \;\big|\; i \in V \,\right\}$$

and we see that $H : \mathbb{R}^{n_I} \rightarrow \mathbb{R}^{n_O}$ is a highly nonlinear function that is built out
of chains of feedforward nonlinearities. The parameters that control the value
of H(x) are the forward links, the offsets and the gains for each neuron. The
cardinalities of these parameter sets are given by

$$n_s = \sum_{i=0}^{N-1} |F(i)|, \qquad n_o = N, \qquad n_g = N$$

where |F(i)| denotes the size of the forward link set for the ith neuron; ns denotes
the number of synaptic links; no , the number of offsets; and ng , the number of
gains. Now let I = {xα ∈ RnI : 0 ≤ α ≤ S − 1} and D = {Dα ∈ RnO : 0 ≤ α ≤
S − 1} be two given sets of data of size S > 0. The set I is the set of input
exemplars and the set D is the set of outputs that are associated with exemplars.
Together, the sets I and D comprise what is known as the training set. Also,
from now on, as before in the MFFN case, the subscript notation α indicates the
dependence of various variables on the αth exemplar in the sets I and D. The
training problem is then to choose the CFFN parameters to minimize an energy
function, E, given by

$$E = 0.5 \sum_{\alpha=0}^{S-1} \sum_{i=0}^{n_O - 1} f\big( Y_\alpha^{v_i} - D_\alpha^{i} \big) = 0.5 \sum_{\alpha=0}^{S-1} \sum_{i \in V} f\big( Y_\alpha^{i} - D_\alpha^{d_i} \big)$$

where $f$ is a nonnegative function of the error term $Y_\alpha^{i} - D_\alpha^{d_i}$; e.g. using the function
$f(x) = x^2$ gives the standard $L^2$ or least squares energy function, p. 318.

Chained feedforward network or CFFN We consider a function $H : \mathbb{R}^{n_I} \rightarrow \mathbb{R}^{n_O}$
that has a very special nonlinear structure. This structure consists of a string
or chain of computational elements, generally referred to as neurons in deference
to a somewhat tenuous link to a lumped sum model of post-synaptic potential.
Each neuron processes a summed collection of weighted inputs via a saturating
transfer function with bounded output range. The neurons whose outputs connect
to a given target or postsynaptic neuron are called presynaptic neurons. Each
presynaptic neuron has an output Y which is modified by the synaptic weight
Tpre,post connecting the presynaptic neuron to the postsynaptic neuron. This gives
a contribution Tpre,post Y to the input of the postsynaptic neuron. The postsynaptic
neuron then processes the inputs using some sort of nodal function which is often
a sigmoid as discussed in the MFFN case. The CFFN model consists of a string
of N neurons, labeled from 0 to N − 1. Some of these neurons can accept external
input and some have their outputs compared to external targets. We let

$$U = \{\, i \in \{0, \ldots, N-1\} \mid \text{neuron } i \text{ is an input neuron} \,\} = \{ u_0, \ldots, u_{n_I - 1} \}$$
$$V = \{\, i \in \{0, \ldots, N-1\} \mid \text{neuron } i \text{ is an output neuron} \,\} = \{ v_0, \ldots, v_{n_O - 1} \}$$

We will let nI and nO denote the cardinality of U and V respectively. The remaining
neurons in the chain which have no external role will be called hidden neurons
with dimension nH . Note that nH + | U | = N. Note that it is possible for an input
neuron to be an output neuron; hence U and V need not be disjoint sets. The chain
is thus divided by function into three possibly overlapping types of processing
elements: nI input neurons, nO output neurons and nH internal or hidden neurons.
We will let the set of postsynaptic neurons for neuron i be denoted by F(i), the
set of forward links for neuron i. Note also that each neuron can be viewed as a
postsynaptic neuron with a set of presynaptic neurons feeding into it: thus, each
neuron i has associated with it a set of backward links which will be denoted by
B(i). The input of a typical postsynaptic neuron therefore requires summing over
the backward link set of the postsynaptic neuron in the following way:

$$y^{post} = x + \sum_{pre \,\in\, B(post)} T^{pre \rightarrow post}\; Y^{pre}$$

where the term x is the external input term which is only used if the post neuron
is an input neuron. Note the CFFN notation is much cleaner than the MFFN and
is more general as we can have links in the CFFN model we just cannot handle
in the MFFN, p. 315.
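A minimal evaluation sweep over a chain, using only backward link sets, sigmoidal node functions with offsets and gains, and external inputs on the input neurons, might look like the Python sketch below. The tiny three-neuron chain and its weights are invented for illustration; the book's own implementation is the MatLab global graph code of Chap. 18.

import math

def sigma(x, o, g):
    """Sigmoidal node function with offset o and gain g."""
    return 0.5 * (1.0 + math.tanh((x - o) / g))

# Toy chain: neuron 0 is an input neuron, neuron 2 is an output neuron.
# B[i] lists presynaptic neurons of i; T[(pre, post)] are edge weights.
N = 3
B = {0: [], 1: [0], 2: [0, 1]}
T = {(0, 1): 0.8, (0, 2): -0.4, (1, 2): 1.2}
offset = [0.0, 0.1, -0.2]
gain = [1.0, 0.5, 0.7]
external = {0: 0.9}          # external input x on the input neuron

Y = [0.0] * N
for i in range(N):           # evaluate in chain order 0, 1, ..., N-1
    y = external.get(i, 0.0) + sum(T[(j, i)] * Y[j] for j in B[i])
    Y[i] = sigma(y, offset[i], gain[i])

print("output neuron value:", Y[2])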

Evolvable Hardware Certain types of hardware can be changed due to environ-


mental input. The best example is that of an FPGA—the Field Programmable Gate
Array. This consists of a large number of cells which can be thought of as fixed
hardware primitives whose interconnections can be shuffled according to real time
demand to assemble into large scale hardware modules, p. 4.

Fourier Transform Given a function g defined on the y axis, we define the Fourier
Transform of g to be
$$\mathcal{F}(g) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(y)\, e^{-j\xi y}\, dy$$

where j denotes the square root of minus 1 and the exponential term is as usual

e−jξy = cos(ξy) − j sin(ξy)

This integral is well defined if g is what is called square integrable, which roughly
means we can get a finite value for the integral of $g^2$ over the y axis, p. 41.

Graph model A model consisting of nodes called vertices and edges. In MatLab,
there are three classes we define: the vertices class, the edges class and the graphs
class using the particular syntax MatLab requires. This is fairly technical, so you
need to look carefully at the details in Sect. 18.1. Roughly speaking, we first create
edges and vertices objects and use them to build the graphs objects. To build an
OCOS graph, the commands
V = [1;2;3;4;5;6;7;8]; v = vertices(V);
define the OCOS nodes; the edges are defined by
E = [[1;2],[2;3],[2;4],[2;5],[3;6],[4;7],[5;8],[2;7],[1;7]];
and
e = edges(E); and OCOS=graphs(v,e); define the OCOS graph. There
are many other methods associated with these classes such as adding vertices and
edges, adding two graphs together and so forth. The details are in Sect. 18.1. The
big thing here is that when two graphs are merged, their individual node and edge
numbers are redone and all information about where they originally came from
is lost. Retaining that sort of information requires we add address information, p. 340.
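The same OCOS graph can be prototyped quickly in Python with the networkx package, if only to check connectivity before moving to the book's MatLab classes; this is just a convenience sketch, not the vertices/edges/graphs code of Sect. 18.1.

import networkx as nx

# The OCOS circuit as a directed graph: same node numbers and edge list
# as the MatLab commands above.
ocos = nx.DiGraph()
ocos.add_nodes_from(range(1, 9))
ocos.add_edges_from([(1, 2), (2, 3), (2, 4), (2, 5),
                     (3, 6), (4, 7), (5, 8), (2, 7), (1, 7)])

print("nodes:", ocos.number_of_nodes(), "edges:", ocos.number_of_edges())
print("backward set of node 7:", sorted(ocos.predecessors(7)))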

Hebbian update We can also adjust parameters in a graph model using the classic
idea of a Hebbian update, which would be implemented as shown below: choose
a tolerance ε and, at each node j in the backward set of the post neuron i, update
the edge value by the factor ζ if the product of the post neuron node value fi and
the edge value Ej→i exceeds ε. The update algorithm then consists of the paired
operations: sweep through the DG to do an evaluation and then use both Hebbian
updates and the graph flow equations to adjust the edge values, p. 499.

for (i = 0; i < N; i++) {
    for (j ∈ B(i)) {
        yp = fi * Ej→i
        if (yp > ε)
            Ej→i = ζ * Ej→i
    }
}
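A runnable Python version of the same sweep is shown below; the node values, backward sets, edge weights and the particular ε and ζ are made-up illustrations, not values from the text.

# Runnable version of the Hebbian sweep: strengthen edge j -> i by the
# factor zeta whenever f_i * E[j, i] exceeds the tolerance eps.
# Node values f, the backward sets B and the edge weights E are invented.

eps, zeta = 0.25, 1.1
f = [0.9, 0.2, 0.7]                       # current node values
B = {0: [], 1: [0], 2: [0, 1]}            # backward sets
E = {(0, 1): 0.5, (0, 2): 0.4, (1, 2): 0.8}

for i in range(len(f)):
    for j in B[i]:
        if f[i] * E[(j, i)] > eps:
            E[(j, i)] *= zeta

print(E)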

Laplace Transform Given the function x defined on s ≥ 0, we define the Laplace


Transform of x to be
$$\mathcal{L}(x) = \int_0^{\infty} x(s)\, e^{-\beta s}\, ds$$

The new function L (x) is defined for some domain of the new variable β. The
variable β’s domain is called the frequency domain and in general, the values of
β where the transform is defined depend on the function x we are transforming.
Also, in order for the Laplace transform of the function x to work, x must not
grow too fast—roughly, x must decay like an exponential function with a negative
coefficient, p. 39.
Laplacian graph based information flow The standard cable equation for the
voltage v across a membrane is

$$\lambda^2\, \nabla^2 v - \tau\, \frac{\partial v}{\partial t} - v = -r\, \lambda^2\, k$$

where the constant λ is the space constant, τ is the time constant and r is a
geometry independent constant. The variable k is an input source. We assume the
flow of information through a graph model G is given by the graph based partial
differential equation

$$\nabla_G^2 f - \frac{\tau_G}{\lambda_G^2}\, \frac{\partial f}{\partial t} - \frac{1}{\lambda_G^2}\, f = -r\, k$$

where $f$ is the vector of node values for the graph and $\nabla_G^2 f = K K^T f$ where $K$
is the incidence matrix for the graph. Relabel the fraction $\tau_G/\lambda_G^2$ by $\mu_1$ and the
constant $1/\lambda_G^2$ by $\mu_2$ where we drop the $G$ label as it is understood from context.
Each computational graph model will have constants μ1 and μ2 associated with it
and if it is important to distinguish them, we can always add the labellings at that
time. This equation gives us the Laplacian graph based information flow model,
p. 279.
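The pieces of this model are easy to assemble numerically: build the incidence matrix K from an edge list, form the graph Laplacian K Kᵀ, and take explicit Euler steps of the graph cable equation. The sketch below uses the OCOS edge list from the Graph model entry; the constants, the input k and the time step are illustrative only, and the sketch uses −K Kᵀ f for the diffusion term so that the explicit iteration is dissipative and settles down (sign conventions for the graph Laplacian vary).

import numpy as np

# Graph cable equation, stepped explicitly (illustrative constants):
#   mu1 * df/dt = -K K^T f - mu2 * f + r * k
edges = [(1, 2), (2, 3), (2, 4), (2, 5), (3, 6), (4, 7), (5, 8), (2, 7), (1, 7)]
n = 8
K = np.zeros((n, len(edges)))
for col, (a, b) in enumerate(edges):
    K[a - 1, col] = -1.0      # edge leaves node a
    K[b - 1, col] = 1.0       # edge enters node b

Lap = K @ K.T                 # graph Laplacian K K^T

mu1, mu2, r = 1.0, 0.5, 1.0
k = np.zeros(n); k[0] = 1.0   # drive the first node
f = np.zeros(n)
dt = 0.01
for _ in range(500):
    f = f + (dt / mu1) * (-Lap @ f - mu2 * f + r * k)

print(np.round(f, 3))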

Matrix Feedforward Networks or MFFNs MFFNs are a specialized network


architecture which links computational nodes into a layered structure connected
by links or edges. The MFFN is then a function

$$F : \mathbb{R}^{n_0} \rightarrow \mathbb{R}^{n_{M+1}}$$

that has a special nonlinear structure. The nonlinearities in the FFN are contained
in the neurons which are typically modeled by the sigmoid functions,

$$\sigma(y) = 0.5\,\big( 1 + \tanh(y) \big),$$

typically evaluated at

$$y = \frac{x - o}{g},$$

where $x$ is the input, $o$ is the offset and $g$ is the gain, respectively, of each neuron.
We can also write the transfer functions in a more general fashion as follows:

$$\sigma(x, o, g) = 0.5\,\left( 1.0 + \tanh\!\left( \frac{x - o}{g} \right) \right)$$

The feed forward network consists of $M + 1$ layers of neurons connected together
with connection coefficients. For each $i$, $0 \le i \le M$, the $i$th layer of $n_i$
neurons is connected to the $(i+1)$st layer of $n_{i+1}$ neurons by a connection matrix,
$T^i$. The feed forward network processes an arbitrary $I \in \mathbb{R}^{n_0}$ in the following way.
First, for $0 \le \ell \le M+1$, $1 \le i \le n_\ell$ and layer $\ell$, to help us organize the
notation, we let $\sigma_i^\ell$ denote the transfer function, $Y_i^\ell$ the output, $y_i^\ell$ the input,
$O_i^\ell$ the offset and $g_i^\ell$ the gain of the $i$th neuron, and we let $X_i^\ell$ denote
$(y_i^\ell - O_i^\ell)/g_i^\ell$.

The inputs are processed by the zeroth layer as follows.

$$y_i^0 = x_i, \qquad X_i^0 = (y_i^0 - O_i^0)/g_i^0, \qquad Y_i^0 = \sigma_i^0(X_i^0)$$

At layer $\ell + 1$, we have

$$y_i^{\ell+1} = \sum_{j} T_{ji}^{\ell}\, Y_j^{\ell}, \qquad X_i^{\ell+1} = (y_i^{\ell+1} - O_i^{\ell+1})/g_i^{\ell+1}, \qquad Y_i^{\ell+1} = \sigma_i^{\ell+1}(X_i^{\ell+1})$$

So the output from the MFFN at the output layer $M + 1$ is given by

$$y_i^{M+1} = \sum_{j} T_{ji}^{M}\, Y_j^{M}, \qquad X_i^{M+1} = (y_i^{M+1} - O_i^{M+1})/g_i^{M+1}, \qquad Y_i^{M+1} = \sigma_i^{M+1}(X_i^{M+1}),$$

p. 287.
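The forward pass is only a few lines of numpy. The sketch below evaluates a small 2-3-1 MFFN with random connection matrices; the sizes, the seed and the offset and gain values are arbitrary illustrations, not a network from the text.

import numpy as np

def sigma(X):
    """Node transfer function 0.5 * (1 + tanh(.))."""
    return 0.5 * (1.0 + np.tanh(X))

rng = np.random.default_rng(0)
sizes = [2, 3, 1]                       # a tiny 2-3-1 MFFN
T = [rng.normal(size=(sizes[l], sizes[l + 1])) for l in range(len(sizes) - 1)]
O = [np.zeros(n) for n in sizes]        # offsets per layer
g = [np.ones(n) for n in sizes]         # gains per layer

def mffn_forward(x):
    Y = sigma((x - O[0]) / g[0])        # layer 0 processes the raw input
    for l in range(len(T)):
        y = Y @ T[l]                    # y_i^{l+1} = sum_j T^l_{ji} Y_j^l
        Y = sigma((y - O[l + 1]) / g[l + 1])
    return Y

print(mffn_forward(np.array([0.3, -0.8])))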

MFFN back propagation The backpropagation equations for a MFFN are simply
a recursive algorithm for computing the partial derivatives of the MFFN error or
energy function with respect to each of the tunable parameters. The indicial details
for a given training set are fairly intense, so we just refer you to the lengthy and
detailed discussion in Sect. 16.3, p. 290.
MFFN training problem With the usual MFFN notation, for any I ∈ Rn0 , we
know how to calculate the output from the MFFN. It is clearly a process that
flows forward from the input layer and it is very nonlinear in general even though
it is built out of relatively simple layers of nonlinearities. The parameters that
control the value of $F(I)$ are the $n_i\, n_{i+1}$ coefficients of $T^i$, $0 \le i \le M$; the offsets
$O^i$, $0 \le i \le M + 1$; and the gains $g^i$, $0 \le i \le M + 1$. This gives the number of
parameters,

$$N = \sum_{i=0}^{M} n_i\, n_{i+1} + 2 \sum_{i=0}^{M+1} n_i.$$

Now let

$$I = \left\{\, I_\alpha \in \mathbb{R}^{n_0} : 0 \le \alpha \le S - 1 \,\right\}$$

and

$$D = \left\{\, D_\alpha \in \mathbb{R}^{n_{M+1}} : 0 \le \alpha \le S - 1 \,\right\}$$

be two given sets of data of size $S > 0$. The set $I$ is referred to as the set of
exemplars and the set $D$ is the set of outputs that are associated with exemplars.
Together, the sets $I$ and $D$ comprise what is known as the training set. The training
problem is to choose the $N$ network parameters, $T^0, \ldots, T^M$, $O^0, \ldots, O^{M+1}$,
$g^0, \ldots, g^{M+1}$, such that we minimize

$$E = 0.5 \sum_{\alpha=0}^{S-1} \big\| F(I_\alpha) - D_\alpha \big\|_2^2 = 0.5 \sum_{\alpha=0}^{S-1} \sum_{i=0}^{n_{M+1}-1} \big( Y_{\alpha i}^{M+1} - D_{\alpha i} \big)^2$$

where the subscript notation α indicates that the terms correspond to the αth
exemplar in the sets I and D, p. 290.
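Given any forward map, the training energy is a one-line computation over the exemplar set. The sketch below is self contained; the two-sample training set and the stand-in forward map are made up for illustration.

import numpy as np

def training_energy(forward, inputs, targets):
    """E = 0.5 * sum_alpha || forward(I_alpha) - D_alpha ||_2^2."""
    return 0.5 * sum(np.sum((forward(I) - D) ** 2) for I, D in zip(inputs, targets))

# Tiny made-up training set and a stand-in forward map (mean of two inputs).
inputs = [np.array([0.0, 1.0]), np.array([1.0, 1.0])]
targets = [np.array([0.4]), np.array([0.9])]
forward = lambda I: np.array([I.mean()])

print(training_energy(forward, inputs, targets))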

Sigma Three Transition Second messenger events have consequences that can
be written in terms of concatenated sigmoid transitions. In many cases of interest,
there are three such transformations and this is then called a σ3 transition. We use
the notation
$$\sigma_3\Big( [T_0], [T_0]_b, g_p;\;\; r;\;\; [T_1]_n;\;\; \frac{r[T_1]_n}{2},\, g_e;\;\; s;\;\; [T_1]_n;\;\; 0,\, g_{Na} \Big)$$

where, reading the arguments in order: $[T_0]$, $[T_0]_b$, $g_p$ give the input, offset and gain of
the innermost sigmoid; $r$ scales the innermost calculation; $[T_1]_n$ scales again, and this is
the input to the next sigmoid; $\frac{r[T_1]_n}{2}$, $g_e$ are the offset and gain of the next sigmoid;
$s$ scales the results; $[T_1]_n$ scales again, giving $[P(T_1)]$, which is the input into the last
sigmoid; and $0$, $g_{Na}$ are the offset and gain of the last sigmoid.

The meanings of the various terms, if not self explanatory, can be found in
Chap. 8. Thus, the typical sodium maximum conductance $g_{Na}$ computation can
be written as

$$g_{Na}(t, V) = g_{Na}^{max}\left( 1 + \delta_{Na}\, \sigma_3\!\Big( [T_0], [T_0]_b, g_p;\; r;\; [T_1]_n;\; \frac{r[T_1]_n}{2}, g_e;\; s;\; [T_1]_n;\; 0, g_{Na} \Big) \right) \mathcal{M}_{Na}^{\,p}(t, V)\; \mathcal{H}_{Na}^{\,q}(t, V),$$

p. 121.
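Whatever the exact bookkeeping, the computational core is three nested, scaled sigmoids. The sketch below composes three logistic-style transitions with placeholder offsets, gains and scale factors to show the shape of a σ₃-style calculation; it is not the book's exact formula.

import math

def sigmoid(x, offset, gain):
    """A generic saturating transition 0.5 * (1 + tanh((x - offset)/gain))."""
    return 0.5 * (1.0 + math.tanh((x - offset) / gain))

def sigma3(x, params):
    """Three concatenated sigmoid transitions, each scaled before the next.
    params is a list of (offset, gain, scale) triples; values are placeholders."""
    out = x
    for offset, gain, scale in params:
        out = scale * sigmoid(out, offset, gain)
    return out

# Made-up offsets, gains and scales for the three stages.
params = [(0.2, 0.1, 1.5), (0.5, 0.3, 2.0), (0.0, 0.4, 1.0)]
print(sigma3(0.35, params))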

Time dependent cable solution The full cable equation is

$$\lambda_c^2\, \frac{\partial^2 v_m}{\partial z^2} = v_m + \tau_m\, \frac{\partial v_m}{\partial t} - r_o\, \lambda_c^2\, k_e$$

Recall that $k_e$ is current per unit length. Next convert this equation into a diffusion
model by making the change of variables $y = \frac{z}{\lambda_c}$ and $s = \frac{t}{\tau_m}$. With these changes,
space will be measured in units of space constants and time in units of time
constants. We then define a new voltage variable $w$ by

$$w(s, y) = v_m(\tau_m s, \lambda_c y)$$

This gives the scaled cable equation

$$\frac{\partial^2 w}{\partial y^2} = w + \frac{\partial w}{\partial s} - r_o\, \lambda_c^2\, k_e(\tau_m s, \lambda_c y)$$

Make the additional change of variables

$$\Phi(s, y) = w(s, y)\, e^s$$

and we find

$$\frac{\partial^2 \Phi}{\partial y^2} = \frac{\partial \Phi}{\partial s} - r_0\, \lambda_c^2\, \tau_m\, k_e(\tau_m s, \lambda_c y)\, e^s$$

Then apply the Laplace Transform to both sides to obtain

$$\mathcal{L}\!\left(\frac{\partial^2 \Phi}{\partial y^2}\right) = \mathcal{L}\!\left(\frac{\partial \Phi}{\partial s}\right) - r_0\, \lambda_c^2\, \tau_m\, \mathcal{L}\big(k_e(\tau_m s, \lambda_c y)\, e^s\big)$$

$$\frac{\partial^2 \mathcal{L}(\Phi)}{\partial y^2} = \beta\, \mathcal{L}(\Phi) - \Phi(0, y) - r_0\, \lambda_c^2\, \tau_m\, \mathcal{L}\big(k_e(\tau_m s, \lambda_c y)\, e^s\big)$$

Now further assume

$$\Phi(0, y) = 0, \quad y \neq 0$$

which is the same as assuming

$$v_m(0, z) = 0, \quad z \neq 0;$$

a reasonable physical initial condition. This gives us

$$\frac{\partial^2 \mathcal{L}(\Phi)}{\partial y^2} = \beta\, \mathcal{L}(\Phi) - r_0\, \lambda_c^2\, \tau_m\, \mathcal{L}\big(k_e(\tau_m s, \lambda_c y)\, e^s\big)$$

Now apply the Fourier Transform in space to get

$$\mathcal{F}\!\left(\frac{\partial^2 \mathcal{L}(\Phi)}{\partial y^2}\right) = \beta\, \mathcal{F}\big(\mathcal{L}(\Phi)\big) - r_0\, \lambda_c^2\, \tau_m\, \mathcal{F}\big(\mathcal{L}(k_e(\tau_m s, \lambda_c y)\, e^s)\big)$$

or

$$-\xi^2\, \mathcal{F}\big(\mathcal{L}(\Phi)\big) = \beta\, \mathcal{F}\big(\mathcal{L}(\Phi)\big) - r_0\, \lambda_c^2\, \tau_m\, \mathcal{F}\big(\mathcal{L}(k_e(\tau_m s, \lambda_c y)\, e^s)\big)$$

For convenience, let

$$\mathcal{T}(\Phi) = \mathcal{F}\big(\mathcal{L}(\Phi)\big)$$

and we see we have

$$-\xi^2\, \mathcal{T}(\Phi) = \beta\, \mathcal{T}(\Phi) - r_0\, \lambda_c^2\, \tau_m\, \mathcal{T}\big(k_e(\tau_m s, \lambda_c y)\, e^s\big)$$

The rest of the argument is messy but straightforward. The current $k_e$ is replaced
by a sequence of current pulses $P_{nm}$ and limit arguments are used to find the value
of $\mathcal{T}(k_e(\tau_m s, \lambda_c y)\, e^s)$. The limiting solution for an impulse current $I_0$ then satisfies

$$-\xi^2\, \mathcal{T}(\Phi) = \beta\, \mathcal{T}(\Phi) - \frac{r_0\, \lambda_c\, I_0}{\sqrt{2\pi}} \;\Longrightarrow\; (\xi^2 + \beta)\, \mathcal{T}(\Phi) = \frac{r_0\, \lambda_c\, I_0}{\sqrt{2\pi}}$$

We then apply the Laplace and Fourier Transform inverses in the right order to
find the solution

$$\Phi(s, y) = r_0\, \lambda_c\, I_0\, P_0(s, y)$$

where $P_0$ is the limiting pulse solution. We can then find the full solution $w$ since

$$w(s, y) = \Phi(s, y)\, e^{-s} = r_0\, \lambda_c\, I_0\, \frac{1}{\sqrt{4\pi s}}\, e^{-\frac{y^2}{4s}}\, e^{-s}$$

We can write this in the unscaled form at pulse center $(t_0, z_0)$ as

$$v_m(t, z) = r_0\, \lambda_c\, I_0\, \frac{1}{\sqrt{4\pi\, (t - t_0)/\tau_m}}\, e^{-\frac{((z - z_0)/\lambda_c)^2}{4\, (t - t_0)/\tau_m}}\, e^{-(t - t_0)/\tau_m}$$

Note this solution does not have time and space separated!, p. 45.
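The closed form is easy to evaluate directly. The helper below implements the unscaled impulse response v_m(t, z) above; the constants r_0, λ_c, τ_m and I_0 are placeholders, not values from the text.

import numpy as np

def vm_impulse(t, z, t0=0.0, z0=0.0, r0=1.0, lam=1.0, tau=1.0, I0=1.0):
    """Unscaled impulse response of the cable equation at pulse center (t0, z0)."""
    s = (t - t0) / tau                  # time in units of the time constant
    y = (z - z0) / lam                  # space in units of the space constant
    return r0 * lam * I0 / np.sqrt(4.0 * np.pi * s) * np.exp(-y ** 2 / (4.0 * s)) * np.exp(-s)

# Evaluate the response a little after the pulse, on a short stretch of cable.
z = np.linspace(-2.0, 2.0, 5)
print(np.round(vm_impulse(t=0.5, z=z), 4))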

Würfelspiel data matrices There is an 18th century historical idea called The
Musicalisches Würfelspiel. In the 1700’s, fragments of music could be rapidly
prototyped by using a matrix A of possibilities. It consists of P rows and three
columns. In the first column are placed the opening phrases or nouns; in the third
column, are placed the closing phrases or objects; and in the second column, are
placed the transitional phrases or verbs. Each phrase consisted of L notes and the
composer’s duty was to make sure that any opening, transitional and closing (or
noun, verb and object) was both viable and pleasing for the musical style that
the composer was attempting to achieve. The Würfelspiel matrix used in musical
composition has the general form below.
$$A = \begin{bmatrix} \text{Opening 0} & \text{Transition 0} & \text{Closing 0} \\ \text{Opening 1} & \text{Transition 1} & \text{Closing 1} \\ \vdots & \vdots & \vdots \\ \text{Opening P-1} & \text{Transition P-1} & \text{Closing P-1} \end{bmatrix}$$

and the Würfelspiel matrix used in painting composition has the form

$$A = \begin{bmatrix} \text{Background 0} & \text{Midground 0} & \text{Foreground 0} \\ \text{Background 1} & \text{Midground 1} & \text{Foreground 1} \\ \vdots & \vdots & \vdots \\ \text{Background P-1} & \text{Midground P-1} & \text{Foreground P-1} \end{bmatrix}$$

Thus, a musical or painting composition could be formed by concatenating these


fragments together: picking the ith Opening, the jth Transition and the kth Closing
phrases would form a musical sentence. Since we would get a different musical
sentence for each choice of the indices i, j and k (where each index can take on
the values 0 to P − 1), we can label the sentences that are constructed by using
the subscript i, j, k as follows:

Si,j,k = Opening i + Transition j + Closing k

Note that there are $P^3$ possible musical sentences that can be formed in this manner.
If each opening, transition and closing fragment is four beats long, we can build
$P^3$ different twelve beat sentences. In a similar way, we can assemble $P^3$ painting
compositions, p. 209.
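Enumerating the P³ sentences is a short loop. The fragments below are nonsense placeholders standing in for real musical phrases; only the combinatorics is being illustrated.

from itertools import product

# A P = 2 Würfelspiel matrix with placeholder fragments.
openings = ["Opening 0", "Opening 1"]
transitions = ["Transition 0", "Transition 1"]
closings = ["Closing 0", "Closing 1"]

# All P^3 sentences S_{i,j,k} = Opening i + Transition j + Closing k.
sentences = {(i, j, k): " + ".join([openings[i], transitions[j], closings[k]])
             for i, j, k in product(range(2), repeat=3)}

print(len(sentences))          # 8 = P^3 sentences for P = 2
print(sentences[(1, 0, 1)])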
Index

A The process of abstraction, 5


A First Look At Neural Model Training theoretical modeling issues, 11
a first count of the number of nodes and What do we do with these tools?, 9
edges needed for an interesting neural Abstract Computation
model, 281 Second messenger input approxima-
an OCOS and FFP circuit, 277 tions, 83
applying the OCOS Laplacian and gradi- Ca++ control, 96
ent to specific node function data, 280 G protein mediated signal amplifica-
assume information flow through the tion, 100
OCOS follows a cable equation, 279 PK/U phosphorylates T1∼ to
implementing an OCOS circuit, 277 T1 /T1∼ P, 84
numbering the OCOS edges, 278 T1 /T1∼ P is tagged to create
some node values are forced to match T1 /T1∼ PVn , 84
given data: these are called clamped a few of the P(T1 ) creation kill
values, 279 switches, 85
the gradient of the node value function, a signal increases [CAMP] which
278 alters the activity level of the cell via
the graph cable equation applied to the the PKA pathway implying a small
OCOS circuit, 279 number of occupied receptors has
the Laplacian of the node value function, major influence on the cell, 100
278 a substance can be free or stored: this
the OCOS cable equation with clamped provides a regulatory mechanism,
values can be solved iteratively using 92
Hebbian ideas, 280 adding protein construction factory
the OCOS incidence matrix, 278 controls, 94
the OCOS Laplacian, 280 ADP/ ATP reactions, 99
the standard cable equation, 279 Calmodulin based control, 100
A Primer on Cognitive Science chemotaxic receptors and a general
Basic Goals, 3 abstract strategy, 103
Biological Information Processing, 4 contrasts between diffusion based
Chapter guide, 7 triggers and external signals, 104
evolvable hardware and synthetic biol- controlling using creation and
ogy, 4 destruction cycles, 94
Some specific models to answer high details of the dynamics in terms of
level questions, 11 forward and backward rate con-
SWH (Software, Wetware and Hard- stants, 86
ware) Triangle, 4 futile cycles, 96

general organization of extracellular neurotransmitter input streams into a


agonists, 102 neuron using a global clock, 137
implications for biological computa- Reticular formation and monoamine
tion, 91 neurotransmitters, 137
in the PGDF system, many receptors Outputs
catch as many PGDF’s as possible, a discussion of BFVs parameters, 151
103 BFV distances to toxin classes
monoamine neurotransmitters and cleanly separate, 149
hardware/ software alteration of the BFV modulation via the Hodgkin
cell, 101 Huxley model, 156
multiplicative cascades, 90 connecting BFV function modulation
negative feedback allowing for deac- to neurotransmitter signals, 155
tivation of a pathway, 96 Defining the BFV as a vector, 141
rapid increases in cystolic Ca++ are dendrite - soma interactions could be
regulatory signals, 99 altered thereby changing the eigen-
tagging is removed to release T1 , 85 value spectrum of the ball stick
the T1 equilibrium conditions for Step model, 140
i as relative changes, 89 finding the hyperpolarization curve
the T1 equilibrium conditions for Step for the combined two BFV inputs,
iii, 90 163
the complex PK/U is formed by the finding the new g for the combined
trigger T0 , 84 two BFV inputs, 168
The differences in computational finding the new t3 for the combined
strategies between the TAR and two BFV inputs, 167
PDGF cases: different strategies as finding the new cap for the combined
there are different needs, 104 two BFV inputs, 163
the equilibrium conditions for Step i, Finding the partials of g with respect
87 to Hodgkin Huxley parameters, 161
the equilibrium conditions for Step ii, Finding the partials of V1 and V3 with
88 respect to Hodgkin Huxley parame-
the full creation pathway and the ters, 158
pathway with inhibitory steps, 85 Finding the partials of V4 with respect
the full effect of a signal is the rewrit- to Hodgkin Huxley parameters, 159
ing of the hardware and software of Handling two BFV inputs, 162
the cell, 101 modulation of the BFV parameter set,
the full event chain to create the pro- 155
tein P(T1 ) whose construction is modulatory inputs to the BFV can
triggered by the signal T0 , 85 alter the cap and cup shape and the
the general trigger T0 , 84 hyperpolarization curve, 155
The TAR pathway control actions, Review of the inputs that effect the
103 BFV, 142
there is a complimentary system to the α and β parameters in a Hodgkin
remove tagging, 84 Huxley model, 145
threshold computations using sig- the α and β perturbing toxin families
moidal functions, 92 generate cleanly separated action
Abstract neurons potentials, 150
Inputs the BFV components, 141
monoamine neurotransmitter influ- the BFV function formula, 154
ence is dependent on many shaping the Biological Feature Vector or BFV
parameters, 138 is introduced, 140
monoamine neurotransmitter modu- The BVF parameters, 147
lation of maximum ion conduc- the effects of toxins on the action
tances, 139 potential and BFV, 141

the five toxin families that perturb α An evaluation loop, 445


and β parameters in the Hodgkin nodes use offsets and gains, 446
Huxley models, 150 Building a brain model graph
the functional form of the BFV: First evaluation, 448
parabolas and a sigmoidal piece, getting backward and forward infor-
152 mation, 447
the general shape of an action poten- getting the incidence matrix, 447
tial, 141 how the first 5 node values changed
the neurotransmitter parabola width after the Hebbian update, 449
modulation equation, 156 initial start, 448
the nominal Hodgkin Huxley para- Initializing parameters, 448
meter set which will be altered by a No details yet, 447
toxin, 146 node values after first evaluation, 448
the shaping of the action potential, Simple Hebbian update, 448
139 Building an OCOS Graph
the shaping of the action potential accessing the backward global infor-
could have global effects, 140 mation, 444
the voltage data determines the values Generating the associated dot file for
of the BFV’s vector, 148 graphing, 443
toxin families generate samples of Generating the OCOS graphic, 443
action potentials altered by the tox- Getting the backward sets, 443
ins, 147 the backward address information,
Toxin families which alter the sodium 444
and potassium conductances, 146 the backward global information, 443
toxins influence the Hodgkin Huxley Building an OCOS graph with inhibition,
parameters, 145 449
toxins that shape the α and β parame- a new type of hebbian learning using
ters of the Hodgkin Huxley model, an error signal, 452
149 a new type of Hebbian learning using
using a generic action potential to find the target updates, 454
the BFV parameter partials, 156 adding Hebbian learning tolerances,
the full abstract model, 169 452
Address Based Graphs explicitly setting some weights to be
inhibitory, 450
Asymmetric left/ right brain model
Hebbian update with inhibition, 452
Address details, 419
initialization, backward and forward
Building a cortical column, 417
sets, 450
Cortex Model, 418
inputs, 449
Cortical Column with 3 cans and
setup graph, 450
reversed OCOS thalamus, 420
Simple example training loop for
Incidence Matrix Discussion
inhibition, 459
address hashing, 432 Simple training example, 458
Global nodes and edge versus address targets, 449
needs, 432 using inputs to initialize node values,
The need for a dictionary or associa- 451
tive array, 432 The Hebbian update, 445
Input and Output added to brain model, Advice on MatLab and Octave, 14
419
Left and Right brain modules linked by
the corpus callosum, 419 B
Linking Cans into a column, 418 Basic Neurobiology
OCOS, FFP and Two/Three Address hemispheres and the corpus callosum, 61
blocks Giving a Can, 417 simplified information processing in the
Address Graphs brain

a model of isocortex, 76 the OCOS cortical processing subcir-


associative cortical increases cuit, 77
occurred early in evolution, 64 the reticular formation, 72
brain stem structure shown through the rostral pons and caudal and rostral
seven slices, 70 midbrain, 72
connections from the limbic areas to three main neurotransmitter path-
cortex are not visible and must be ways: dopamine, serotonin and
seen through slices, 69 norepinephrine, 64
cortex cell types, 77
cortex is uniform prior to external
input, 67 C
cortical layers form columns, 77 CFFN
cortical structure: occipital, frontal, General discussion
parietal, temporary and limbic Backward and Forward sets, 316
lobes, 74 Cardinality of parameter sets, 318
dopamine and serotonin sites, 74 Gradient descent, 319
dopamine innervation pathways, 73 handling Output - Target Data, 318
evolutionary changes that have Input and Output Data, 318
encouraged symbolic information Input and output index sets, 316
processing in the human brain, 62 Pre and Post synaptic inputs, 315
input fusion in the associative cortex, The Energy minimization problem,
62 319
limbic processing details, 64 The evaluation process, 317
main cortical areas of the brain, 62 The training problem, 318
major limbic systems, 62 Variable setup, 317
many modulatory connections from Gradient Calculations
reticular formation to cortex, 73 Tj→i partials, 327
∂Y j
nerve bundle connections for infor- ∂Y i
Calculation, 326
mation processing, 62 dE
Calculation, 326
dY i
norepinephrine innervation path- Energy calculation, 325
ways, 81 Error calculations, 325
raw inputs, 61 Full partial algorithm for general
serotonin innervation pathways, 74 input, output set, 327
simplified thalamic inputs to the cor- Full partial algorithm for one sample,
tex, 79 327
symbolic processing closely linked to Gain and offset partials, 327
development of associative cortex, General dependence on parameters,
63 326
the basal ganglia, 64 Identity connecting forward and
the basic neural building blocks for backward link sets, 325
the brain, 65 node evaluation as a function of para-
the brain stem, 69 meters, 324
the caudal and mid pons, 70 Node functions, 324
the caudal medulla and rostral Chain Derivatives
medulla, 70 General discussion
the FFP and OCOS combine into a Adding parameters to our E and Y 2
multicolumn model, 79 example, 323
the FFP cortical processing subcir- Difference between a total and direct
cuit, 77 derivative, 321
the Layer Six–four to Layer Two– Direct dependence of E on Y 0 , 321
Three cortical processing subcir- Example Setup, 320
cuit: i.e. the Two/Three subcircuit, General Recursive Total dependence
79 calculation, 322

General recursive total dependence Connecting paths and binomial coef-


calculations when parameters are ficients, 24
included, 323 Connecting random walks and the
Total dependence calculation, 320 binomial distribution, 22
Total Dependence of E on Y 0 , 321 Counting right and left hand moves,
Total Dependence of E on Y 2 handled 23
recursively, 321 Diffusion constant in Ficke’s Law
Total derivative of Y 2 , 320 of Diffusion connected to brown-
Cognitive Dysfunction Models ian motion in terms of the time and
Generating dysfunction space constant of the random walk,
depressive states, 513 21
lesion studies, 514 For large number of steps we apply
modeling corpus callosum miscom- Stirling’s approximation, 29
munication issues, 514 Interpreting the probability density
How do we answer the big questions? function as the solution to the
Some examples of good questions., 495 diffusion model with an unbound
The basic normal brain design impulse injection, 35
emotional assignment equation, 512 obtaining the limiting probability
Laplacian module updates, 501 density distribution, 30
Laplacian nodal processing, 500 specializing to the probability of right
Laplacian update for multiple mod- and left particle movement being
ules, 504 the same, 28
Laplacian updates using a cable equa- The expectation of the particle distri-
tion paradigm, 499 bution in space and time, 26
Lesion studies, 519 The probability a particle will be at
Normal brain models, 508 a discrete time and position in the
our model captures connectivity and walk, 23
topology, 498 The probability density function
Simplistic edge processing, 498 solves the diffusion model, 34
summary of build process, 513 The probability distribution as the
The auditory data path, 509 number of steps grows without
The auditory data path: neurotrans- bound, 28
mitter choices., 509 The probability of the particle mov-
The base graph and additions to it as ing to the left and right is not the
subgraphs, 497 same, 24
The basic training algorithms, 499 the standard deviation of the particles
The final Laplacian module update distribution in space and time, 28
equations, 504 understanding the probability density
the final Laplacian update equations distribution, 33
for a second module, 508 Using a standard ln approximation,
The final Laplacian updates, 506 30
Brownian motion
the Laplacian update equation, 500
particle flux, 20
The visual data path, 512
random walks, 19
the Würfelspiel data abstraction, 497
using the music and painting data
sets, 496 E
The design of a minimal brain model, 496 Emotional Models
The need for a normal brain model, 495 a broad plan for an emotionally enabled
avatar, 175
classification
D Primary emotions are outputs of the
Diffusion Models reactive layer and the global alarm
Brownian Motion process, 179

emotionally labeled data Node evaluation algorithm needs back-


emotionally charged compositional ward set information, 356
data design, 181 Second OCOS Hebbian update, 359
physiological responses to emotion- Simple OCOS Hebbian update loop, 359
ally labeled data can be organized testing the energy code, 364
into a two dimensional space: one Testing the first evaluation implementa-
axis is skin galvanic response and tion code, 362
the other is the fMRI parameter, 180 the energy calculation, 363
skin galvanic response and a fMRI the first gradient update code implemen-
parameter change when peo- tation, 366
ple look at emotionally charged the first implementation of the evaluation
images, 180 code, 361
enabling architectures the second energy calculation code
Secondary emotions are outputs of implementation, 373
the deliberative layer, 179 the second evaluation code implementa-
tertiary emotions are outputs of the tion, 373
meta management layer, 179 the second gradient update code, 374
the hybrid approach which adds feed- the second training loop code implemen-
back pathways and finally, adding tation, 376
meta management, 178 the training for the approximation of cos
the three layer approach: adding rea- on 0, 6.28], 377
soning, 177 Updating the BFsets code to return edge
the three tower approach: perception, information, 365
central processing and action, 177 Graph Models
implementing these architectures General Discussion, 333
requires asynchronous approaches, Computations are asynchronous and
175 use BFV data, 333
two basic assumptions, 175 Computations need immutable
objects which suggests Erlang, 337
General estimates of cognitive model
G components, 337
Global Graph Class Generic software architecture design,
20 step Hebbian loop for the OCOS, 359 334
A Hebbian update scheme, 358 No memory model right now, 336
a sample training session to model cos on node size estimates for a general
[0, 3] using a 1-5-1 CFFN, 369 graph model, 336
A simple evaluation code, 357 Handling Feedback
a training loop using the first gradient cal- Adding column to column connec-
culation code, 369 tions in the OCOS/FFP lagged
An OCOS Hebbian update, 358 architecture, 384
another variant of the node function ini- The Feedback evaluation and Heb-
tialization which allow us to set the low bian Update Algorithms, 383
and high values of node functions, 372 The OCOS/FFP as a lagged architec-
code to initialize the node functions, 371 ture, 384
Defining the partials of the node func- The OCOS/FFP example, 381
tions, 361 Using Lags to handle feedback, 383
Finding backward and forward sets from
the incidence matrix, 356
finding which component of the gradient L
is the largest, 366 Laplace Transform
First OCOS Hebbian update, 359 Applied to a derivative, 39
initializing a training session for cos on Definition, 39
[0, 6.28] using the new code, 377 example of its use, 40

  the inverse Laplace transform definition, 40

M
MatLab
  2-3-1 CFFN
    Evaluation, 330
    Initialization, 329
  2-3-1 MFFN
    Evaluation, 299
    Initializing and finding the error, 299
    Initializing the nodefunctions, 298
    The error loop, 298
  2-3-1 MFFN Example
    initializing the nodal functions, 298
    The energy or error loop, 298
  Address Based Graphs
    A Evaluation Loop With Offsets and Gains, 446
    A new incidence matrix calculation, 436
    A Simple Evaluation Loop, 445
    Accessing .l information, 428
    Accessing .n information, 427
    Accessing .v information, 426
    add a node list address based vertices addv, 422
    add a single edge address based edge add, 424
    add a single node with address based vertices add, 422
    add an edge list address based edges addv, 424
    Adding a list of nodes to an existing graph, 431
    Adding a mask or location to graph edges, 430
    Adding a mask or location to graph nodes, 430
    Adding a single edge to an existing graph, 431
    Adding a single node to an existing graph, 431
    address based addedge, 431
    address based addedgev, 431
    address based addlocationtoedges, 430
    address based addlocationtonodes, 430
    address based addnode, 431
    address based addnodev, 431
    Address based edges, 423
    Address based edges subsref, 423
    address based graph constructor graph, 429
    address based graph constructor graph subsref, 429
    address based incidence, 438
    address based laplacian, 441
    Address based vertices, 420
    Address based vertices subsref, 421
    Address to global utility function: code Address2String, 434
    Address to Node lookup, 432
    Address2Global, 434
    Address2Global New, 437
    Address2String, 435
    Another way to convert an address to a global node number: code Address2Global: A form of hashing, 434
    Backward Sets need both global and address information, 441
    BFsets, 441
    Building the incidence matrix with Straight linear searches which are expensive, 433
    calling Address2Global, 437
    converting addresses to unique hash values, 436
    Evaluation With Input Data, 451
    Find OCOS global node numbers, 427
    Find the maximum number of nodes on each level: code GetAddressMaximums, 435
    FindGlobalAddresses, 435
    Finding the maximum number of nodes per level, 435
    General discussion, 417
    GetAddressMaximums, 435
    getting address slot maximums, 436
    Hebbian Training With an Error Signal, 452
    HebbianUpdateErrorSignal, 455
    Incidence Matrix, Backward and Forward Data and Initialization, 450
    Initialized the Neuron Data, 452
    Inside .n information, 427
    Inside the .l information; node data, 428
    Inside the .l information; the edge data for a node, 428
    Inside the .l information; the in and out edge data for a node, 428
    Iteration count for the link2global lookup, 433
    link2global, 432
    Multistep Training, 458
    OCOS addresses after updated location, 426
    OCOS incidence matrix, 440
    OCOS incidence matrix calculation, 439
    OCOS incidence matrix calculation: A look at the address to global details, 439
    OCOS incidence matrix calculation: Associative array details, 440
    OCOS Laplacian, 441
    Overview of new Graph class, 425
    Preliminary Evaluation with Input, 451
    Setting global locations, 425
    setting up the incidence matrix is costly, 438
    Setting Up the Input Data, 449
    Setting Up Training Data, 449
    Setup OCOS Graph, 450
    Straightforward incidenceOld, 433
    Target Updates, 454
    the actual B matrix, 440
    Training for 151 Steps, 459
    Update loop in skeleton form, 452
    updated node 3 address, 426
  Brain Models
    Connections Thalamus to cerebellum, 485
  General MFFN
    A Vector Conditional Test Function, 309
    Code fragment to initialize nodal functions and their derivatives, 304
    Evaluation code: mffneval2.m, 302
    General sigmoid nodal computation function, 303
    General sigmoid nodal computation function partial derivatives, 303
    Initialization code mffninit.m, 301
    Plotting the results of the square wave test, 310
    Testing the code on a 1-5-1 MFFN, 309
    Testing the code on a step function runtime results, 309
    Testing the code on example sin^2, 311
    Testing the code on example sin^2 using linear output sigmoids, 312
    Training: mffntrain.m, 308
    Updating: mffnupdate.m, 304
  Global Graph Classes
    A simple simulation: DoSim, 359
    adding a graph to an existing graph, 351
    adding a node in a global graph, 346
    adding a node to a vertices object, 346
    adding a node to existing vertices object, 346
    adding a single edge to an edge object, 345
    adding a single node to a vertices object, 346
    adding an edge in a global graph, 346
    adding an edge to a graphs object, 346
    Adding an edge to an existing edge, 345
    Adding another graph to an existing graph, 351
    adding the FFP information to the OCOS graph, 349
    Adding upper and lower bounds to the node function initialization, 372
    Building a graph from two OCOS modules, 353
    Building the OCOS graph, 348
    Chain FFN Training Code, 369
    constructing the incidence matrix of a graph, 347
    constructing the Laplacian of a graph, 347
    CoolObject, 338
    CoolObject directory, 338
    CoolObject: a typical p() element, 339
    CoolObject: a typical p.c element, 339
    CoolObject: Outline of a typical subsref function, 339
    Doing a descent step, 407
    dot command to generate a graph’s image, 354
    edges, 342
    edges constructor, 342
    edges example, 343
    edges subsref overloading, 342
    Example for updated lagged training code: setup, 411
    Example for updated lagged training code: testing, 414
    Example for updated lagged training code: the training, 413
    Find the location of the absolute maximum of a vector, 366
    Finding backward and forward sets: BFsets0, 356
    finding eigenvalues and eigenvectors of the OCOS and FFP Laplacians, 350
    finding the incidence matrix for the OCOS and FFP graph, 349
    finding the Laplacians for the OCOS and FFP graphs, 350
    Finding the scaled gradient, 406
    generating OCOS.pdf via dot, 355
    generating the dot file from the incidence matrix, 354
    Gradient Update Code, 366
    Graph evaluation code: evaluation2.m, 361
    graphs constructor, 344
    graphs subsref overloading, 344
    graphs: building the picture using dot, 344
    Laplacian for global graph, 347
    Node function Initialization, 371
    Returning forward edge information: Changed BFsets code, 365
    Set up gradient vectors, 406
    sigmoid, 357
    Simple incidence matrix for global graph, 347
    The altered training loop code: chainffntrain2.m, 376
    the edge subsref overloading, 342
    The energy calculation: energy.m, 363
    the evaluation, 357
    the generated dot file for the OCOS graph, 355
    the graphs class, 344
    The Hebbian code: HebbianUpdateSynapticValue, 358
    the incidence to dot code: incToDot, 354
    The new evaluation function: evaluation2.m, 373
    the new training code, 411
    The new update code, 409
    The new update code: GradientUpdate2.m, 374
    The regular gradient descent step, 408
    The scaled gradient descent step, 407
    The updated energy code: energy2.m, 373
    vertices, 340
    vertices constructor, 340
    vertices example, 341
    vertices subsref overloading, 341
  Lagged CFFN
    Classification code: GetRecognition, 400
    Converting the incidence matrix for a CFFN with feedback into a dot file, 385
    Example: 10 training steps, 399
    Example: 10 training steps gradient information, 399
    Example: A simple evaluation and energy calculation, 398
    Example: Further setup details and input and target sets, 397
    Example: Setting up the lagged graph, 397
    Example: Setting up the original graph with feedback, 396
    Example: Setup parameters and get backward and forward information, 398
    Finding the feedback edges, 387
    General comments on training the lagged architecture, 389
    Incidence matrix for CFFN with feedback, 385
    New Gradient Update Code for Lagged Networks: Activating Line Search and Gradient Scaling, 402
    New GradientUpdateLag skeleton code, 402
    New Lagged Gradient Update Code: GradientUpdateLag.m, 393
    sample classification results, 401
    Setting up graph for the OCOS/FFP with column to column feedback, 387
    Subtract the feedback edge and construct the double copy, 389
    testing the incidence matrix for a CFFN with feedback and its dot file, 386
    The edges object subtract edges function, 388
    The graphs object subtract edges function, 388
    The lagged training code, 396
    Thresholding code: Recognition, 400
  Testing the CFFN and MFFN
    Initializing the MFFN node functions, 379
    Setting up the test, 379
  Testing the CFFN and MFFN on the same problem, 379
  Training the lagged network
    Extracting the feedback edge indices from the incidence matrix, 391
    Extracting the feedback edge pairs and relabeling them, 391
    Removing a list of edges from a graph, 391
    Removing a list of edges from an edges object, 390
MatLab Brain Models
  Build A Cortical Can, 461
  buildbrain
    adding all the intermodule links, 486
    adding intermodule connections, 482
    building nodes and edges for the neurotransmitter objects, 475
    building the associative cortex module, 478
    building the cerebellum module, 480
    building the first sensory cortex module, 478
    building the midbrain module: dopamine, serotonin and norepinephrine neurotransmitters, 479
    building the model, 489
    building the motor cortex module, 479
    building the second sensory cortex module, 478
    building the thalamus module, 479
    general structure of code, 477
    generating the dot file, 489
    glueing the individual edges of the modules together, 481
    glueing the individual nodes of the modules together, 480
    setting up 10 links from thalamus to sensory cortex one, 484
    setting up 9 links from associative cortex to motor cortex, 484
    setting up links from cerebellum to motor cortex, 485
    setting up links from sensory cortex one to associative cortex, 483
    setting up links from sensory cortex two to associative cortex, 484
    setting up links from thalamus to associative and motor cortex, 485
    setting up links from thalamus to cerebellum, 485
    setting up the dopamine links, 486
    setting up the norepinephrine links, 488
    setting up the serotonin links, 487
  buildcan
    Adding the OCOS, FFP and Two/Three component connections, 463
    Combining the OCOS, FFP and Two/Three edges into CanOne, 463
    Combining the OCOS, FFP and Two/Three nodes into CanOne, 463
    Generating the dot files for the CanOne components and CanOne itself, 464
    the OCOS, FFP and Two/Three circuit blocks, 462
  buildcolumn
    Add the CanOne, CanTwo and CanThree nodes into Column, 465
    build CanOne, CanTwo and CanThree, 464
    Add the CanOne, CanTwo and CanThree edges into Column, 465
    Add the connections between CanOne, CanTwo and CanThree, 466
    Adding the Nodes, 465
    generating the column dot file, 466
  buildcolumn: Building a Column out of Cans, 464
  buildcortex
    adding the edges of the columns into the cortex, 469
    adding the intercolumn connections, 469
    adding the nodes of the columns into the cortex, 469
    generating the dot file, 470
    gluing columns into a sheet, 467
    the code for a matrix of columns, 468
  buildmidbrain
    building neurotransmitter node and edge lists, 474
    generating the dot file, 476
    glueing the neurotransmitter blocks together, 476
  buildthalamus
    add the ROCOS edges into the thalamus, 473
    add the ROCOS nodes into the thalamus, 472
    building the reverse OCOS module, 471
    building with more than one ROCOS block, 472
    Construct the ROCOS blocks, 472
    generating the dot file, 473
Matrix FFN
  A Simple sigmoid nodal processing function, 287
  A Simple sigmoid nodal processing function with offsets and gains, 288
  Definition, 288
  General Discussion, 287
  Inner layer processing, 288
  Input layer processing, 288
  Minimizing the MFFN energy uses gradient descent, 290
  Output layer processing, 289
  partial computations in the layers before the input layer, 292
  The ξ^{M+1} computations and the M + 1 layer offset and gain partials, 292
  The full partial computations for all layers, 295
  The gain and offset partials for layer M, 294
  the offset and gain partials for the input layer, 295
  The output layer partial computations: weighting terms T^M_{pq}, 291
  The recursive ξ^M computations, 294
  The sum of squared errors energy function, 290
  The training problem, 290
  Three Layer Example
    Hidden layer partial computations, 297
    Input layer partial computations, 297
    Output layer partial computations, 296
  Variables needed to describe MFFN computation, 288
Modeling Compositional Design
  basic ground rules, 227
    a music or painting initiation generates fMRI and skin conductance information and a musical or painting composition, 229
    an abstract model of limbic system in biological information processing to follow, 228
    music and painting data provide data for validation of a brain model, 227
    musical and painting data constrain the associative cortex and limbic system model, 229
    musical and painting data constrain the associative cortex model, 228
    musical and painting data contains information about good compositional design, 229
    the cortex model, 228
  Connectionist based Design
    alphabet discussions for the musical and painting data, 231
    generating streams of sentences, 237
    mapping process noun N feature vectors into verb V feature vectors: the map g_{NV}, 233
    mapping process verb V feature vectors into object O feature vectors: the map g_{VO}, 233
    Noun to Verb processing, 234
    Noun to Verb processing: acceptable letter choices in a noun, 234
    Noun to Verb processing: building a map for acceptable noun letter choices using Hebbian training, 235
    Noun to Verb processing: random assignment of next noun letter choice from the possibilities, 236
    Noun to Verb processing: the idea of a noun generating map, NGM, 234
    Noun to Verb processing: the procedure to create a valid noun sequence, 235
    Noun to Verb processing: the procedure to create a valid noun sequence generates the NGM, 236
    Noun to Verb processing: the procedure to create a valid object sequence generates the OGM, 236
    Noun to Verb processing: the procedure to create a valid verb sequence generates the VGM, 236
    preprocessing a noun vector n into a noun feature vector N, 232
    preprocessing a verb vector v into an verb feature vector V, 233
    preprocessing an object vector o into an object feature vector O, 233
    raw sentence data into feature vectors, 233
    sentence construction, 236
    Verb to Object processing is handled similar to the Noun to Verb processing. This gives the map g_{VO}, 236
    Würfelspiel matrices uses for input data, 231
  Neurobiology based Design
    a model of sentence construction, 240
    a model of sentence construction: the letter to next letter mappings are produced by OCOS and FFP interactions, 240
    a three cortical column model, 239
    adding more cortical columns to generate the Noun to Verb and Verb to Object maps, 241
    adding some timing information to the cortical training procedure, 242
    cortical training sequences for the sensory data maps, 241
    expressing the letter to next letter for noun map in terms of cortical columns, 240
    expressing the letter to next letter for object map in terms of cortical columns, 241
    expressing the letter to next letter for verb map in terms of cortical columns, 241
    integrating the models into a 3D virtual world, 245
    lesion studies, 247
    Reviewing the Würfelspiel data matrices approach, 238
    the antagonist hypothesis for modeling depression, 244
    the cognitive model, 248
    the cortical maps for emotional labellings for music and painting data, 242
    the monoamine hypothesis for modeling depression, 244
    the second messenger defect hypothesis for modeling depression, 244
    training the isocortex model, 239
    virtual world construction, 249

N
Neural System Modulation Issues
  a catecholamine abstraction plan, 261
  a neurotransmitter can play simultaneous roles that are different in terms of behavior and cognitive function, 254
  a neurotransmitter has an associated locality set, 262
  a typical soma with dendritic and axonal processing, 261
  activity level of a neurotransmitter, p, 262
  an abstract version of neurotransmitter pathways, 259
  dendrite and axon processing via the PSD, 261
  Neural Architectures
    a careful look at neuron 0 in an example, 272
    a specific chain architecture example, 269
    a step by step computation for multiple time steps, 273
    adding feedback to the chains, 264
    chain computation is sigmoidal, 264
    chained architectures can be complicated, 267
    chains have specific input and output nodes, 266
    chains of computational nodes, 264
    details of dendrite - axon interactions showing PSDs, 271
    different parameter choices for the ball stick model give rise to different neuron classes, 272
    each node in a chain has a set of backward element and forward element sets, 267
    outputting a BFV if the depolarization is above a threshold, 275
    standard chained architecture notation, 268
    the chained architecture evaluation loop, 268
    the computational scheme for a given node in the chain, 269
    the parameters that influence the output from the ball stick model, 272
    the typical chained input calculation, 267
  nonlocal interaction is important, 260
  PSD computations, the • operation, 263
  reabsorption rate of a neurotransmitter, q, 262
  the value of a dendrite - axon interaction, 263
  the value of a dendrite - axon interaction with locality information included, 263
  the value of a dendrite - axon interaction with multiple neurotransmitters, 264
  the value of a dendrite - axon interaction with pools of local axons contributing, 263
Neurotransmitters
  catecholamine class
    dopamine, 252
    epinephrine, 252
    norepinephrine, 252
    share common core biochemical structure, 251
    the catechol group, 251
    the catechol molecule with a ethyl side chain, 252
    the catecholamine synthesis pathway, 253
Non separable Time dependent cable equation solutions
  Using the idea of diffusion to give insight, 19

S
Second Messenger Diffusion Pathways
  Ca++ ion movement, 107
    a trigger event initiates an increase in a calcium buffer, 114
    calcium diffusion in the cytosol, 107
    calcium dynamics boundary conditions when calcium binding is fast, 111
    calcium dynamics boundary conditions when the binding rate is smaller than the dissociation rate, 113
    calcium dynamics has new diffusion constant when the binding rate is smaller than the dissociation rate, 113
    calcium dynamics when calcium binding is fast, 111
    calcium dynamics when the binding rate is smaller than the dissociation rate, 112
    concentration of occupied binding sites depends on calcium concentration, 111
    first order approximations for a trigger event initiates an increase in a calcium buffer, 115
    free calcium ion dynamics, 107
    new calcium dynamics equation when the binding rate is smaller than the dissociation rate, 113
    redefined diffusion constant for calcium dynamics when calcium binding is fast, 111
    storage pool binding details, 108
    total calcium equation boundary conditions, 110
    total calcium, bound and unbound, dynamics, 109
    total the calcium binding is fast assumption, 110
Second Messengers
  Design Principles, 117
    conversion of [P(T1)] into a change in g^{Na}_{max} as a sigmoidal process, 120
    general pharmacological inputs, 130
    general pharmacological inputs: allosteric modulation, 133
    general pharmacological inputs: four transmembrane regions, 132
    general pharmacological inputs: general receptor families, 131
    general pharmacological inputs: seven transmembrane regions, 131
    general pharmacological inputs: seven transmembrane regions and first messenger reshaping of biological structure of post cell, 131
    general pharmacological inputs: the agonist spectrum, 132
    modeling neurotransmitter modulators, 133
    modeling neurotransmitter multiplier equations, 134
    modeling neurotransmitters could increase calcium currents, 135
    modeling port activity with sigmoidal transitions, 119
    second messenger Ca++ activity alters maximum ion conductances, 129
    second messenger Ca++ activity with feedback, 129
    the Ca++ triggers, 125
    the Ca++ triggers and their feedback term estimates, 128
    the Ca++ triggers as second messengers, 128
    the Ca++ triggers effect on the sodium conductance pathway, 125
    the Ca++ triggers with spatial dependence, 126
    the dendrite - soma model and its inputs, 135
    the full Ca++ triggers with spatial dependence equations, 126
    the pathway to control changes in maximum conductance, 118
    the second and third level sigmoidal transformations as computational graphs, 123
    the sodium conductance sigma3 transitions, 121
    the sodium conductance modification equation in compact form and sigmoid processing concatenation, 121
    the sodium conductance modification equation in compact form with feedback, 123
    the third order sigmoidal transformations as computational graphs, 122
    the trigger initiates a change in maximum potassium conductance, 120
    The trigger initiates production of sodium gates, 117
Software Implementation Ideas
  a core object could be a neurotransmitter, 254
  a possible software architecture blueprint, 258
  coding synaptic interaction, 259
  dendrite - axon interactions via an intermediary agent, 254
  discussions of self modifying objects: communication alters structure, 257
  how is CA plasticity managed?, 254
  neurotransmitters have both local and global scope issues, 255
  processing by the architecture involves combinatorial strategies, 258
  some molecules have multiple functions and hence, some software objects must play that role also, 256
  the architecture should be event based; i.e. asynchronous in nature, 257

T
The design of emotionally labeled musical data
  cadences are punctuation marks, 184
  connections to vocal melody and poetry, 185
  emotional Würfelspiel matrix design
    adding octaves to the emotional musical alphabet, 214
    attributes attached to music for various emotional states, 191
    choosing the emotional musical alphabet, 201
    comments of neutral data, 195
    comments on angry data, 196
    comments on happy data, 196
    comments on sad data, 196
    deviations from pure tones, perfect harmony convey emotion, 192
    emotional communication in performance, 193
    generating sad, angry and happy data, 191
    how guitarists express emotional attributes, 194
    is the emotional content of music universal? Can someone from another culture identify emotion in an unfamiliar culture’s music?, 192
    phrase encodings using the alphabet, 202
    some angry phrases, 200
    some happy phrases, 196
    some sad phrases, 198
    the angry data, 199
    the design of the data for opening, transition and closing phrases, 195
    the effects of emotion on pitch and rhythm, 191
    the happy data, 196
    the punctuation marks in the musical grammar, 201
    the sad data, 197
    timing patterns in the encoding of emotion in music, 194
  grammatical clauses and phrases with adjectives, 183
  music in terms of smaller blocks and their role in overall function, 183
  musical nouns, verbs and objects, 184
  neutral Würfelspiel matrix design
    correct neutral nouns are not random choices but require a composer’s skill, 187
    generated neutral musical fragments of 12 notes each, 189
    neutral nouns, 187
    the closing phrases, 188
    the complete Musicalisches Würfelspiel matrix
    the construction of the neutral grammar gives clues about the order in which notes appear, 189
    the middle phrases, 188
  the idea of a musical grammar, 183
  the Würfelspiel matrix allows different noun, verb and object combinations creating many different musical sentences, 186
  the Würfelspiel matrix uses opening, transitions and closings are the nouns, verbs and objects in a musical sentence, 186
The design of emotionally labeled painting data
  data sets to train the visual cortex, 205
  developing a painting model: background first, then middle layer and finally the foreground details, 205
  painting design Würfelspiel matrices
    simplified paintings are assembled using background, then midground and finally foreground, 208
  the emotional Würfelspiel matrix
    artists have written about their use of emotion, 218
    capturing emotion in a painting, 216
    curve quality and emotional attributes, 217
    distinguishing beauty and ugliness, 217
    emotion is evoked by differing qualities of the force perceived in a pen or brush stroke, 217
    portraying face and body language, 218
    the use of expression by Matisse, 219
    the use of light and color, 217
  the happy Würfelspiel matrix
    a typical happy painting assembled from a given background, midground and foreground, 220
    the assembled happy paintings, 220
    the happy data matrix, 220
  The mello painting
    background and foreground discussion, 208
  the neutral Würfelspiel matrix
    a painting encoding, 216
    an assembled neutral painting, 210
    connections between the musical alphabet and the painting edge encoding, 215
    encoding the neutral data, 212
    painting edges, 213
    The assembled neutral paintings, 211
    the neutral data matrix, 210
  the Qait painting
    abstract design details, 207
  the Qait painting from background to foreground, 207
  the sad Würfelspiel matrix
    a typical sad painting assembled from background, midground and foreground images, 222
    the assembled sad paintings, 222
  the seadragon and Qait paintings
    the three element abstract design, 207
  the seadragon painting
    abstract design details, 206
  the seadragon painting from background to foreground, 206
The music data matrix, 209
The painting data matrix, 209
The Fourier Transform
  Applied to a derivative, 41
  Definition, 41
  the inverse Fourier Transform definition, 42
  using the inverse Fourier transform, 42
The graph subsref overloading, 344
The Time Dependent Cable Solution
  Solving directly using transforms, 45
    applying the T transform to the pulse family, 50
    applying the Fourier transform in space, 50
    applying the Laplace transform in time, 49
    choosing the current family normalization constant, 47
    Converting the cable model into a diffusion model with a change of variables, 48
    Defining the T transform, 50
    induced voltage attenuates proportional to the square root of the cable radius, 58
    inverting the T transform, 51
    limiting value of the T transform of the pulse family, 51
    modeling an idealized current pulse, 46
    reinterpreting the cable model results in terms of families of charges, 54
    the family of current impulses, 46
    the idealized solution to the T transform model, 51
    The initial condition assumption, 49
    the input is a constant current, 55
    the new model in terms of the T transform, 50
    the new scaled model to solve, 49
    the solution for the constant applied current, 56
    the solution to the cable equation that was scaled using a variable transformation, 53
    the solution to the original unscaled cable equation, 53