BioInformation Processing
A Primer on Computational Cognitive Science
James K. Peterson
Cognitive Science and Technology
Series editor
David M.W. Powers, Adelaide, Australia
James K. Peterson
Department of Mathematical Sciences
Clemson University
Clemson, SC
USA
Contents
3 Integral Transforms
3.1 The Laplace Transform
3.1.1 Homework
3.2 The Fourier Transform
3.2.1 Homework
4 The Time Dependent Cable Solution
4.1 The Solution for a Current Impulse
4.1.1 Modeling the Current Pulses
4.1.2 Scaling the Cable Equation
4.1.3 Applying the Laplace Transform in Time
4.1.4 Applying the Fourier Transform in Space
4.1.5 The T Transform of the Pulse
4.1.6 The Idealized Impulse T Transform Solution
4.1.7 Inverting the T Transform Solution
4.1.8 A Few Computed Results
4.1.9 Reinterpretation in Terms of Charge
4.2 The Solution to a Constant Current
4.3 Time Dependent Solutions
Reference
Glossary
Index
List of Figures
Figure 11.12 Sad sequences using all the openings, the first middle phrases and the second closing
Figure 11.13 The angry music matrix
Figure 11.14 Angry sequences generated using the first opening, all the middle phrases and the first closing
Figure 11.15 Angry sequences generated using opening two, the fourth middle phrase and all the closings
Figure 11.16 Some angry phrases. a Middle phrase. b Opening phrase
Figure 12.1 Two simple paintings. a This painting uses many separate layers for the kelp. The human figure is in an intermediate layer between kelp layers and the seadragons essentially occupy foreground layers. The background varies from a dark blue at the bottom to very light blue, almost white, at the top. b This painting uses a complicated background layer with many trees. The dragon and the large tree are midground images, while the butterflies and moths are primarily foreground in nature. The girl is midground
Figure 12.2 The background and foreground of a simple painting. a Background of a mellow image. Note this image plane is quite complicated. Clearly, it could be broken up further into midground and background images. b Foreground of a mellow image which is also very complicated. A simpler painting using only one of the foreground elements would work nicely also. c A mellow painting
Figure 12.3 A neutral painting. a Background. b Midground. c Foreground. d Assembled painting
Figure 12.4 The neutral painting matrix
Figure 12.5 16 Neutral compositions: background 1, midground 1, foregrounds 1–4, background 1, midground 2, foregrounds 1–4, background 1, midground 3, foregrounds 1–4, background 1, midground 4, foregrounds 1–4
Figure 12.6 16 Neutral compositions: background 2, midground 1, foregrounds 1–4, background 2, midground 2, foregrounds 1–4, background 2, midground 3, foregrounds 1–4, background 2, midground 4, foregrounds 1–4
Figure 12.7 16 Neutral compositions: background 3, midground 1, foregrounds 1–4, background 3, midground 2,
List of Code Examples
Listing 18.51 Initializing the Node Functions for the Chain FFN
Listing 18.52 Graph evaluation code: evaluation2.m
Listing 18.53 Testing the new evaluation code
Listing 18.54 The energy calculation: energy.m
Listing 18.55 Testing the Energy Code
Listing 18.56 Returning forward edge information: Changed BFsets code
Listing 18.57 Find the location of the absolute maximum of a vector
Listing 18.58 Gradient Update Code
Listing 18.59 Chain FFN Training Code
Listing 18.60 Sample Training Session
Listing 18.61 Node function Initialization
Listing 18.62 Adding upper and lower bounds to the node function initialization
Listing 18.63 The new evaluation function: evaluation3.m
Listing 18.64 The updated energy code: energy2.m
Listing 18.65 The new update code: GradientUpdate2.m
Listing 18.66 The altered training loop code: chainffntrain2.m
Listing 18.67 Setup the training session
Listing 18.68 Do the training
Listing 18.69 A Test File
Listing 18.70 Initializing the MFFN node functions
Listing 18.71 Setting up the CFFN versus MFFN test
Listing 18.72 The initial energy for the CFFN and MFFN code
Listing 18.73 One gradient step for both CFFN and MFFN
Listing 18.74 The incidence matrix calculation for a CFFN with feedback
Listing 18.75 Converting the incidence matrix for a CFFN with feedback into a dot file
Listing 18.76 Testing the CFFN feedback incidence matrix
Listing 18.77 Constructing the OCOS/FFP with column to column feedback
Listing 18.78 Finding the feedback edges
Listing 18.79 The edges object subtract function
Listing 18.80 The graphs object subtract function
Listing 18.81 Subtract the feedback edge and construct the double copy
Listing 18.82 Removing a list of edges from an edges object
Listing 18.83 Removing a list of edges from a graph
Listing 18.84 Extracting the feedback edge indices from the incidence matrix
Listing 18.85 Extracting the new feedforward edges constructed from the old feedback ones
Listing 18.86 New Lagged Gradient Update Code
This book tries to show how mathematics, computer science, and science can be
usefully and pleasurably intertwined. Here we begin to build a general model of
cognitive processes in a network of computational nodes such as neurons using a
variety of tools from mathematics, computational science, and neurobiology. We
begin with a derivation of the general solution of a diffusion model from a low-level
random walk point of view. We then show how we can use this idea in solving the
cable equation in a different way. This will enable us to better understand neural
computation approximations. Then we discuss neural systems in general and
introduce a fair amount of neuroscience. We can then develop many approximations to the first and second messenger systems that occur in excitable neuron
modeling. Next, we introduce specialized data for emotional content which will
enable us to build a first attempt at a normal brain model. Then, we introduce tools
that enable us to build graphical models of neural systems in MATLAB. We finish
with a simple model of cognitive dysfunction. We also stress our underlying motto:
always take the modeling results and go back to the scientists to make sure they
retain relevance. We agree with the original Financial Modelers' Manifesto by Emanuel Derman and Paul Wilmott (January 7, 2009, available on the Social Science Research Network), which takes the shape of the following Hippocratic oath. We have changed one small thing in the list of oaths: in the second item, we have replaced the original word value with the more generic term variables of interest, which is a better fit for our modeling interests.
• I will remember that I did not make the world, and it does not satisfy my
equations.
• Though I will use models boldly to estimate variables of interest, I will not be
overly impressed by mathematics.
• I will never sacrifice reality for elegance without explaining why I have done so.
• Nor will I give the people who use my model false comfort about its accuracy.
Instead, I will make explicit its assumptions and oversights.
• I understand that my work may have enormous effects on society and the
economy, many of them beyond my comprehension.
There is much food for thought in the above lines and all of us who strive to
develop models should remember them. In my own work, many times managers
and others who oversee my work have wanted the false comfort the oaths warn
against. We should always be mindful not to give in to these demands.
History
Based On:
Research Notes: 1992–1998
Class Notes: MTHSC 982 Spring 1997
Research Notes: 1998–2000
Class Notes: MTHSC 860 Summer Session I 2000
Class Notes: MTHSC 450, MTHSC 827 Fall 2001
Research Notes: Fall 2007 and Spring 2008
Class Notes: MTHSC 450 Fall 2008
Class Notes: MTHSC 450 Fall 2009
Class Notes: MTHSC 827 Summer Session I 2009
Class Notes: MTHSC 399 Creative Inquiry Spring 2010 and Fall 2010
Research Notes: Summer 2010 and Fall 2010
Class Notes: MTHSC 450 Spring 2010 and Fall 2010
Class Notes: MTHSC 827 Summer Session I 2010
Class Notes: MTHSC 434 Summer Session II 2010
Class Notes: MTHSC 399 Creative Inquiry Spring 2011 and Fall 2011
Class Notes: MTHSC 450 Spring 2011 and Fall 2011
Research Notes: Spring 2011 and Fall 2011
Class Notes: MTHSC 827 Summer Session I 2011
Class Notes: MTHSC 434 Summer Session II 2011
Class Notes: MTHSC 450 Spring 2012
Class Notes: BIOSC 491 Fall 2011 and Spring 2012
Research Notes: Spring 2012
Class Notes: MTHSC 827 Summer Session I 2013
Research Notes: Spring and Fall 2013
Research Notes: Spring and Fall 2014
Research Notes: Spring 2015
Class Notes: MTHSC 827 Summer Session I 2015
Part I
Introductory Matter
Chapter 1
BioInformation Processing
In this book, we will first discuss some of the principles behind models of biological
information processing and then begin the process of using those ideas to build models
of cognition and cognitive dysfunction. Most of the mathematical tools needed for this journey are covered in the first, second and third books on calculus tools for cognitive scientists, Peterson (2015a, b, c, d), and so will be either assumed or covered
very lightly here. Other mathematical ideas will be new and their coverage will be
more complete as we try to bring everyone up a notch in mathematical sophistication.
We will discuss the principles of first and second messenger systems in terms of
abstract triggering events and develop models of networks of computational nodes
such as neurons that use various approximations to the action potential generation
process. As usual, we therefore use tools at the interface between science, mathematics and computer science.
Our ultimate goal is to develop infrastructure components allowing us to build
models of cognitive function that will be useful in several distinct arenas such as the
modeling of depression and other forms of mental dysfunction that require a useful
model of overall brain function. From such a model, which at the minimum involves modules for cortex, midbrain and thalamus, motor responses and sensory subsystems, we could obtain insight into how dopamine, serotonin and norepinephrine neurotransmitters interact. It is sobering to realize that at the time of this writing, June 2014, many drugs used for the treatment of cognitive dysfunction are prescribed without a real understanding of how they work. Hence, we want to develop models of neurotransmitter interaction that have predictive capability for mental disease. We are
also unapologetically interested in small brain models. How much function can we
obtain in a small package of neuronal elements? We want our models to be amenable
to lesion analysis and so we want easily accessible scripting that will let us build and
then deconstruct models as we see fit. These ideas are also needed in the development
of autonomous robotics with minimal hardware.
It is, of course, very difficult to achieve these goals and we are indeed far from
making it happen. Further, to make progress, our investigations must use ideas from
many fields. Above all, we must find the proper level of abstraction from the messiness
of biological detail that will give us insight. Hence, in this first volume on cognitive
modeling, we focus on the challenging task of how to go about creating a model of
information processing in the brain that is sufficiently simple to be implemented and yet sufficiently complicated to help us investigate cognition in general. We will even
build a simple model of cognitive dysfunction using multiple neurotransmitters. In
other volumes, we will use these abstractions of biological reality to build more
interesting cognitive models.
In the field of Biological Information Processing, the usual tools used to build models follow from techniques that are loosely based on rather simplistic models from animal neurophysiology called artificial neural architectures. We believe we can do
much better than this if we can find the right abstraction of the wealth of biological detail that is available; the right design to guide us in the development of our
modeling environment. Indeed, there are three distinct and equally important areas
that must be jointly investigated, understood and synthesized for us to make major
progress. We refer to this as the Software, Wetware and Hardware SWH triangle,
Fig. 1.1. We use double-edged arrows to indicate that ideas from these disciplines
can both enhance and modify ideas from the others. The labels on the edges indicate
possible intellectual pathways we can travel upon in our quest for unification and
synthesis. Almost twenty years ago, an example of the HW leg of the triangle was the set of algorithms mapped into the VHSIC Hardware Description Language (VHDL) used to build hardware versions of a variety of low-level biological computational units. In addition, there were the new concepts of Evolvable Hardware, where new hardware primitives referred to as Field Programmable Gate Arrays (FPGAs) offered the ability to program a device's Input/Output response via a bit string which could be chosen as a consequence of environmental input (see Higuchi et al. 1997; Sanchez 1996
and Sipper 1997). There were several ways to do this: online and offline. In online
strategies, groups of FPGAs are allowed to interact and “evolve” toward an appropriate bit string input to solve a given problem, while in offline strategies, the evolution
is handled via software techniques similar to those used in genetic programming
and the solution is then implemented in a chosen FPGA. In a related approach, it
was even possible to perform software evolution within a pool of carefully chosen
hardware primitives and generate output directly to a standard hardware language
such as VHDL so that the evolved hardware can then be fabricated once an appropriate fitness level is reached. Thus, even 20 years ago, various areas of research
used new relatively plastic hardware elements (their I/O capabilities are determined
at run-time) or carefully reverse-engineered analog VLSI chipsets to provide us with
a means to take abstractions of neurobiological information and implement them on
silicon substrates.
Much more is possible now with the advent of synthetic biology and the construction of almost entirely artificial cells. In effect, there was, and still is, a blurring between the traditional responsibilities of hardware and software for the kinds of
typically event driven modeling tasks we see here. The term Wetware in the figure is
then used to denote things that are of biological and/or neurobiological scope. And,
of course, we attempt to build software to tie these threads together.
These issues from the hardware side are highly relevant to any search for useful
ideas and tools for building biological models. Algorithms that can run in hardware
need to be very efficient and to really build useful models of complicated biology will
require large amounts of memory and computational resources unless we are very
careful. So there are lessons we can learn by looking at how these sorts of problems
are being solved in these satellite disciplines. Even though our example discussion
here is from the arena of information processing in neurobiology, our point is that
Abstraction is a principal tool that we use to move back and forth in the fertile
grounds of software, biology and hardware.
In general, we will try hard to illuminate your journey as a reader through this
material. Let’s begin with a little philosophy; there are some general principles to
pay attention to.
As we have mentioned, there are many threads to pull together in our research. We
should not be dismayed by this as interesting things are complex enough to warrant
it. You can take heart from what the historian Barbara Tuchman said in speaking
about her learning the art of being a historian. She was working at one of her first jobs and her editor was unhappy with her; as she says (Tuchman 1982, p. 17) in her essay
In Search of History
The desk editor, a newspaperman by training, grew very impatient with my work. ‘Don’t
look up so much material,’ he said. ‘You can turn out the job much faster if you don’t know
too much.’ While this was no doubt true for a journalist working against a deadline, it was
not advice that suited my temperament.
There is a specific lesson for us in this anecdote. We also are in the almost unenviable
position of realizing that we can never know too much. The problem is that in addition
to mixing disciplines, languages and points of view, we must also find the right level
of abstraction for our blended point of view to be useful for our synthesis. However,
this process of finding the right level of abstraction requires reflection and much
reading and thinking.
In fact, in any endeavor in which we are trying to create a road map through a
body of material, to create a general theory that explains what we have found, we,
in this process of creating the right level of abstraction, like the historian Tuchman,
have a number of duties (Tuchman 1982, p. 17):
The first is to distill. [We] must do the preliminary work for the reader, assemble the
information, make sense of it, select the essential, discard the irrelevant—above all discard
the irrelevant—and put the rest together so that it forms a developing ... narrative. ...To
discard the unnecessary requires courage and also extra work. ...[We] are constantly being
beguiled down fascinating byways and sidetracks. But the art of writing—the test of [it]—is
to resist the beguilement and cleave to the subject.
Although our intellectual journey will be through the dense technical material of
computer science, mathematics, biology and even neuroscience, Tuchman’s advice
is very relevant. It is our job to find this proper level of abstraction and this interesting
road map that will illuminate this incredible tangle of facts that are at our disposal. We
have found that this need to explain is common in most human endeavors. In music,
textbooks on the art of composing teach the beginner to think of writing music in
terms of musical primitives like the nouns, verbs, sentences, phrases, paragraphs of a
given language where in the musical context each phoneme is a short collection of five
to seven notes. Another example from history is the very abstract way that Toynbee
attempted to explain the rise and fall of civilizations by defining the individual entities
city, state and civilization (Toynbee and Caplan 1995).
From these general ideas about appropriate abstraction, we can develop an under-
standing of which abstractions might be of good use in an attempt to develop a
working model of autonomous or cognitive behavior. We will need sophisticated
models of connections between computational elements (synaptic links between
classical lumped sum neuronal models are one such type of connection) and new
ideas for the asynchronous learning between objects. Our challenges are great as
biological systems are much more complicated than those we see in engineering or
mathematics.
These are sobering thoughts, aren't they? Still, we are confident we can shed some light on how to handle these modeling problems. In this volume, we will discuss
relevant abstractions that can open the doorway for you to get involved in useful and
interesting biological modeling.
Unfortunately, as we have tried to show, all of the aforementioned items are very
nicely (or very horribly!) intertwined (it depends on your point of view!). We hope
that you the reader will persevere. As you have noticed, our style in this report is to
use the “royal” we rather than any sort of third person narrative. We have always felt
that this should be interpreted as the author asking the reader to explore the material
with the author. To this end, the text is liberally sprinkled with you to encourage this
point of view. Please take an active role!
This text covers the material in bioinformation processing in the following way.
Part I: Introductory Matter This is the material you are reading now. Here we
take the time to talk about modeling in general and use examples taken from the
first three courses on calculus for Cognitive Scientists (Peterson 2015a, b, c, d).
Part II: Diffusion Models In this portion of the text, we show you how to solve
the time dependent cable equation in a different way. This gives us additional
insight into what these solutions look like which helps us when we are trying to
find effective ways to approximate these computations.
Part III: Neural Systems In this part, we begin the real process of abstracting
biological information processing.
• In Chap. 17, we recast the matrix based networks of simple neuron processing
functions as a chain, or graph, of computational nodes. At this point, it is clear that the neural processing can be much more interesting. We also introduce
MatLab coding of these graphs.
Part VI: Graph Based Modeling in MatLab We now show you how to build reasonable graph models of neural computation using MatLab. This is not a great choice of programming language but it is readily available and easy to learn compared to C, C++ and Python.
• In Chap. 18, we build object oriented class models in MatLab (be warned, this
is clumsy and it makes us hunger for a better computing language—but we
digress....).
• In Chap. 19, we move to graph based models that are based on vector addressing:
each neuron has a global address and a vector address which we can use to find
it in a model.
• In Chap. 20, we actually build some simple brain models.
Part VII: Models of Cognitive Dysfunction In Chap. 21, we finish this text with
some preliminary conversations on how to build a model of normal cognitive
function so that we can properly discuss cognitive dysfunction. We introduce
the details of subgraph training based on the Laplacian based training briefly
introduced in Chap. 15 and show how we could use those ideas and the music and
painting data from Chaps. 11 and 12 to first build a model of a normal brain (yes,
we know that is hard to define!) and from that a model of dysfunction such as
depression. We only sketch the ideas as the implementation will take quite a long
time and deserves a long treatment. This will be done in the next volume.
Part VIII: Conclusions In Chap. 22 we talk a bit about where this volume has led
us and what we can do with the material we have learned.
Part IX: Further Reading In Chap. 23 we mention other papers and books and
software tools we, and you, might find useful.
Finally, this course is just a first step in the development of the tools needed to
build the graphs of computational nodes which subserve information processing in
the brain. The goal is to build models of cognitive dysfunction using reasonable
models of neurotransmitter action and we pursue this further in the next volume. Our
style in these lectures is to assume you the reader are willing to go on this journey
with us. So we have worked hard at explaining in detail all of our steps. We don’t
assume you have background in biophysics and the biology of neurons and so we
include chapters on this material. We develop the mathematical tools you need to a
large extent in house, so to speak. We also try to present the algorithm design and
the code behind our software experiments in a lot of detail. The interactive nature
of the MatLab/Octave integrated development environment makes it ideal for doing
the exploratory exercises we have given you in these pages. The basic background
we assume is contained in our companion books (Peterson 2015a, b, c, d).
The details of how proteins interact, how genes interact and how neural modules
interact to generate the high level outputs we find both interesting and useful are
known to some degree but it is quite difficult to use this detailed information to
build models of high level function. Let’s explore several models before we look at
a cognitive model. In all of these models, we are asking high level questions and
wondering how we might create a model that gives us some insight. We all build
models of the world around us, whether we use mathematical, psychological, political, biological or other tools to do this. All such models have built-in assumptions and we must train ourselves to think about these carefully. We must question the
abstractions of the messiness of reality that led to the model and be prepared to
adjust the modeling process if the world as experienced is different from what the model leads us to expect. There are three primary sources of error when we build
models:
• The error we make when we abstract from reality; we make choices about which
things we are measuring are important. We make further choices about how these
things relate to one another. Perhaps we model this with mathematics, diagrams,
words etc.; whatever we choose to do, there is error we make. This is called Model
error.
• The error we make when we use computational tools to solve our abstract models.
This error arises because we typically must replace the model we came up with in
the first step with an approximate model that can be implemented on a computer.
This is called Truncation error.
• The last error is the one we make because we cannot store numbers exactly in any computer system. Hence, there is always a loss of accuracy because of this. This is called Round Off error (a tiny Octave illustration of this follows the list).
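Round off error is easy to see directly in an Octave/MatLab session. The snippet below is ours and is not part of the book's code base; it simply shows that decimal fractions such as 0.1 are not stored exactly in binary floating point.

% Round off error: 0.1 and 0.2 are not stored exactly in binary floating point
a = 0.1 + 0.2;
b = 0.3;
disp(a - b);               % about 5.55e-17, not 0
disp(a == b);              % 0 (false): never compare floats with ==
disp(abs(a - b) < 1e-12);  % 1 (true): compare with a tolerance instead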
All three of these errors are always present and so the question is: how do we know the solutions our models suggest relate to the real world? We must take the modeling results and go back to the original data to make sure the model has relevance. The models below are not too hard, but they capture the tension between the need to understand a very high level question and what we know at the first principles level.
The spread of a gene throughout a population: We can ask how a given phenotype becomes dominant in a population. To do this, we try to
formulate the problem in a very abstract way for a very simple population of two
phenotypes with simple assumptions about fertility etc. This simple model, nev-
ertheless, has good explanatory power and we can use it to understand better how
domesticated wheat became dominant and wild wheat became rare. Answering that high level question is difficult without some sort of abstraction from which to develop an equation based model we can use to see how the split between the two phenotypes
alters over time. The really interesting thing about this model is that all of it is
developed using only algebra.
The Spread of Altruism in a Population: To develop a model for the spread of
altruistic behavior in a population, we assume this behavior can be explained as
a phenotype due to some combination of genes. We can build on the Viability
model to some extent, but to do this properly, we have to find a way to model
actions made by individuals when there is a cost to them even though there is a
benefit to the good of the population. This is a very high level idea and all the
detail about protein transcription, regulatory gene networks and so forth is really
irrelevant. We need a better way to find insight. Here, we carefully go over the
insight into the spread of altruism called Hamilton’s Rule which was formulated
by Hamilton in the early 1960’s. Hamilton’s Rule was based on a parameter called
the kinship coefficient and it is quite difficult to understand what it might mean. We give several different ways of looking at it, all of which shed insight in their
own way.
A Simple Insulin Model: In diabetes there is too much sugar in the blood and the
urine. This is a metabolic disease and if a person has it, they are not able to use
up all the sugars, starches and various carbohydrates because they don’t have
enough insulin. Diabetes can be diagnosed by a glucose tolerance test (GTT). If
you are given this test, you do an overnight fast and then you are given a large dose
of sugar in a form that appears in the bloodstream. This sugar is called glucose.
Measurements of the concentration of glucose in the blood are made over about five hours or so. These measurements are then used in the diagnosis of diabetes.
It has always been difficult to interpret these results as a means of diagnosing
whether a person has diabetes or not. Hence, different physicians interpreting the
same data can come up with a different diagnosis, which is a pretty unacceptable
state of affairs! The idea of this model, which we introduce in Peterson (2015b),
for diagnosing diabetes from the GTT is to find a simple dynamical model of
the complicated blood glucose regulatory system in which the values of two
parameters would give a nice criterion for distinguishing normal individuals from
those with mild diabetes or those who are prediabetic. Of course, we have to
choose what these parameters are and that is the art of the modeling process!
A Cancer Model: A simple model of cancer is based on a tumor suppressor gene
(TSG) that occurs in two alleles. There are two pathways to cancer: one is due to
point mutations that knock out first one allele and then another of the TSG. The
other pathway starts with cells having both alleles of the TSG but a chromosome
that is damaged in some way (copy errors in cell division etc.). The question is
which pathway to cancer is dominant? This is hard to answer, of course. With
our model, we can generate a prediction that there is an algebraic relationship
between the chromosomal damage rate and the population of cells in the cancer
model. It is quite messy to do this, but imagine how hard it would be to find this
relationship using computer simulations. Knowing the relationship equation is
very helpful and gives us a lot of insight.
The Classical Predator Prey Model: In the classical Predator—Prey model, we use
abstraction to develop the nonlinear system of differential equations that repre-
sents the interactions between food fish and predator fish. All the food fish are
lumped together and all the different types of predators are also thrown into one
bin. Nevertheless, this model predicts periodic solutions whose average food fish
and predator values explain very nicely puzzling data from World War I. However, the model does not take into account self interaction between food fish and
predator fish. When that is added, we can still explain the data, but the trajectories
for the food and predator fish converge over time to a fixed value. This hardly
represents the behavior we see in the sea, so we have lost some of our insight
and the model is much less explanatory even though it is more accurate because
self interaction has been added. A general principle seems to be peeking out here: adding self interaction, which is essentially adding damping, rules out periodicity.
So in a biological model which has periodicity in it, there should be a push–pull
mechanism where the damping can become excitation. Another way to look at it
is that we need positive and negative feedback in the model. This is not a proof,
of course, just a reasonable hypothesis.
A West Nile Virus Infection Model: A West Nile Virus infection does something
very puzzling. Normally, as the amount of virus in an animal host increases, the
animal's probability of survival drops. So if you took 10 mice and exposed them to a level of virus, you could easily measure how many mice survived. You just
look at the mice to see if they are still alive. If you plot host survival versus
virus infection level, for most viral infections, you see a nice reversed S-curve.
At low levels, you have 10 surviving and as the viral load increases, that number
smoothly decays to 0 with no upticks in between. West Nile Virus infection is
quite different: there is some sort of immunopathology going on that allows the
number of surviving mice to oscillate up and down. Hence, even at high viral load,
you can have more mice surviving than you expect. The question here is simple:
explain the survival curve data. This is very hard. We build a model but to do
it, we have to come up with abstract approximations of T-Cell interactions, the
spread of various chemical messengers throughout the system and so forth. But
most importantly, what constitutes host death in our simulation? Host death is a
very high level idea and to compare our simulation results to the real survival data
requires that our measure of host death is reasonable. So this model is quite messy
and filled with abstractions and at the end of the day, it is still hard to decide what
a simulated host death should be. This model is a lot like a brain model where our
question is “What is depression?”
1.5 Code
All of the code we use in this book is available for download from the site Biological
Information Processing (http://www.ces.clemson.edu/~petersj/CognitiveModels.
html). The code samples can then be downloaded as the zipped tar ball CognitiveCode.tar.gz and unpacked where you wish. If you have access to MatLab, just
add this folder with its sub folders to your MatLab path. If you don’t have such access,
download and install Octave on your laptop. Now Octave is more of a command
line tool, so the process of adding paths is a bit more tedious. When we start up an
Octave session, we use the following trick. We write up our paths in a file we call
MyPath.m. For us, this code looks like this
Listing 1.1: How to add paths to Octave
function MyPath()
  %
  s1  = '/home/petersj/MatLabFiles/BioInfo/:';
  s2  = '/home/petersj/MatLabFiles/BioInfo/GSO:';
  s3  = '/home/petersj/MatLabFiles/BioInfo/HH:';
  s4  = '/home/petersj/MatLabFiles/BioInfo/Integration:';
  s5  = '/home/petersj/MatLabFiles/BioInfo/Interpolation:';
  s6  = '/home/petersj/MatLabFiles/BioInfo/LinearAlgebra:';
  s7  = '/home/petersj/MatLabFiles/BioInfo/Nernst:';
  s8  = '/home/petersj/MatLabFiles/BioInfo/ODE:';
  s9  = '/home/petersj/MatLabFiles/BioInfo/RootsOpt:';
  s10 = '/home/petersj/MatLabFiles/BioInfo/Letters:';
  s11 = '/home/petersj/MatLabFiles/BioInfo/Graphs:';
  s12 = '/home/petersj/MatLabFiles/BioInfo/PDE:';
  s13 = '/home/petersj/MatLabFiles/BioInfo/FDPDE:';
  s14 = '/home/petersj/MatLabFiles/BioInfo/3DCode';
  s   = [s1, s2, s3, s4, s5, s6, s7, s8, s9, s12];
  addpath(s);
end
The paths we want to add are set up as strings, here called s1 etc., and to use this, we
start up Octave like so. We copy MyPath.m into our working directory and then do
this.
Listing 1.2: Set paths in octave
octave>> MyPath();
We agree it is not as nice as working in MatLab, but it is free! You still have
to think a bit about how to do the paths. For example, in this text, we develop
two different ways to handle graphs in MatLab. The first is in the directory
GraphsGlobal and the second is in the directory Graphs. They are not
to be used together. So if we wanted to use the setup of Graphs and noth-
ing else, we would edit the MyPath.m file to set s = [s11]; only. If we
wanted to use the GraphsGlobal code, we would edit MyPath.m so that
s11 =’/home/petersj/MatLabFiles/BioInfo/GraphsGlobal:’;
and then set s = [s11];. Note the directories in MyPath.m are ours: the main directory is '/home/petersj/MatLabFiles/BioInfo/' and, of course, you will have to edit this file to put your directory information in there instead of ours.
All the code will work fine with Octave. So pull up a chair, grab a cup of coffee
or tea and let’s get started.
References
T. Higuchi, M. Iwata, W. Liu (eds.), Evolvable Systems: From Biology to Hardware. Proceedings of the First International Conference, October 1996. Lecture Notes in Computer Science, Vol. 1259 (Tsukuba, Japan, 1997)
J. Peterson, Calculus for Cognitive Scientists: Derivatives, Integration and Modeling, Springer
Series on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte
Ltd., Singapore, 2015a in press)
J. Peterson, Calculus for Cognitive Scientists: Higher Order Models and Their Analysis, Springer
Series on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte
Ltd., Singapore, 2015b in press)
J. Peterson, Calculus for Cognitive Scientists: Partial Differential Equation Models, Springer Series
on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte Ltd.,
Singapore, 2015c in press)
J. Peterson, BioInformation Processing: A Primer On Computational Cognitive Science, Springer
Series on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte
Ltd., Singapore, 2015d in press)
M. Sipper, Evolution of Parallel Cellular Machines: The Cellular Programming Approach, Number
1194 in Lecture Notes in Computer Science (Springer, Berlin, 1997)
E. Sanchez, Towards Evolvable Hardware: The Evolutionary Engineering Approach, in E. Sanchez, M. Tomassini (eds.), Proceedings of the First International Workshop, Lausanne, Switzerland, October 1995, Vol. 1259 (Springer, Berlin, 1996)
A. Toynbee, J. Caplan, A Study of History: A New Edition (Barnes and Noble Books, New York,
1995). Revised and abridged, Based on the Oxford University Press Edition 1972
B. Tuchman, Practicing History: Selected Essays (Ballantine Books, New York, 1982)
Part II
Diffusion Models
Chapter 2
The Diffusion Equation
We are now ready to study the time dependent solutions of the cable equation which
are not separable. We assume you have read the derivation of the cable equation
presented in Peterson (2015) so that you are comfortable with that material. Also,
we have already studied how to solve the cable equation using the separation of
variables technique. This method solved the cable equation with boundary conditions as a product $\hat{v}_m(\lambda, \tau) = u(\lambda)\, w(\tau)$. Then, in order to handle applied voltage pulses or current pulses, we had to look at an infinite series solution of the form $\sum_n A_n u_n(\lambda)\, w_n(\tau)$. Now, we will see how we can find a solution which is not time
and space separable. This requires a fair bit of work: first, a careful discussion of
how the random walk model in a limiting sense gives a probability distribution which
serves as a solution to the classical diffusion model. This is done in this chapter. We
follow the standard presentation of this material in, for example, Johnston and Wu (1995) and others.
The solution of the diffusion model is very important because we can use a clever integral transform technique to solve a general diffusion model. So in Chap. 3
we discuss the basics of the Laplace and Fourier Transform. Finally, in Chap. 4, we
use a change of variables to transform a cable model into a diffusion equation. Then,
we use integral transforms to solve the diffusion model. We will get the same solution
we obtained using the random walk model which gives us a lot of insight into the
structure of the solution.
Now these ideas are important because many second messenger systems are based
on the diffusion of a signal across the cytoplasm of the excitable cell. So knowing
how to work with diffusion based triggers is necessary.
The simplest probabilistic model that links the Brownian motion of particles to the
macroscopic laws of diffusion is the 1D random walk model. In this model, we
assume a particle moves every τm seconds along the y axis a distance of either λc
or −λc with probability 1/2. Consider the thought experiment shown in Fig. 2.1 where
we see a volume element which has length 2λc and cross sectional area A. Since
we want to do a microscopic analysis of the space–time evolution of the particles, we assume that λc ≪ y.
Let φ+ (s, y) denote the flux density of particles crossing from left to right across
the plane located at position y at time s; similarly, let φ− (s, y) be the flux density
of particles crossing from right to left. Further, let c(s, y) denote the concentration of particles at coordinates (s, y). What is the net number of particles that cross the plane of area A?
Since the particles randomly change their position every τm seconds by ±λc, we can calculate flux as follows: first, recall that flux here is the number of particles per unit area and time; i.e., the units are particles/(sec · cm²). Since the walk is random, half of the particles will move to the right and half to the left. Since the distance moved is λc, half the concentration at c(s, y − λc/2) will move to the right and half the concentration at c(s, y + λc/2) will move to the left. Now the number of particles crossing the plane is concentration times the volume. Hence, the flux terms are
$$\phi_+(s, y) = \frac{1}{2}\, \frac{A \lambda_c\, c\left(s,\, y - \frac{\lambda_c}{2}\right)}{A \tau_m} = \frac{\lambda_c}{2 \tau_m}\, c\left(s,\, y - \frac{\lambda_c}{2}\right)$$
$$\phi_-(s, y) = \frac{1}{2}\, \frac{A \lambda_c\, c\left(s,\, y + \frac{\lambda_c}{2}\right)}{A \tau_m} = \frac{\lambda_c}{2 \tau_m}\, c\left(s,\, y + \frac{\lambda_c}{2}\right)$$
Since λc/y is very small in a microscopic analysis, we can approximate the concentration
c using a first order Taylor series expansion in two variables if we assume that the
concentration is sufficiently smooth. Our knowledge of the concentration functions
we see in the laboratory and other physical situations implies that it is very reasonable
to make such a smoothness assumption. Hence, for small perturbations s + a and
y + b from the base point (s, y), we find
$$c(s + a, y + b) = c(s, y) + \frac{\partial c}{\partial s}(s, y)\, a + \frac{\partial c}{\partial y}(s, y)\, b + e(s, y, a, b)$$
where e(s, y, a, b) is an error term which is proportional to the size of the largest of
|a| and |b|. Thus, e goes to zero as (a, b) goes to zero in a certain way. In particular,
for a = 0 and b = ±λc/2, we obtain
$$c\left(s,\, y - \frac{\lambda_c}{2}\right) = c(s, y) - \frac{\partial c}{\partial y}(s, y)\, \frac{\lambda_c}{2} + e\left(s, y, 0, -\frac{\lambda_c}{2}\right)$$
$$c\left(s,\, y + \frac{\lambda_c}{2}\right) = c(s, y) + \frac{\partial c}{\partial y}(s, y)\, \frac{\lambda_c}{2} + e\left(s, y, 0, \frac{\lambda_c}{2}\right)$$
and we note that the error terms are proportional to λc/2. Thus,
$$\phi(s, y) = \phi_+(s, y) - \phi_-(s, y) = \frac{\lambda_c}{2 \tau_m}\left[\, -\frac{\partial c}{\partial y}(s, y)\, \lambda_c + e\left(s, y, 0, -\frac{\lambda_c}{2}\right) - e\left(s, y, 0, \frac{\lambda_c}{2}\right)\right] \approx -\frac{\lambda_c^2}{2 \tau_m}\, \frac{\partial c}{\partial y}(s, y)$$
as the difference in error terms is still proportional to λc/2 at worst and, since λc is very small compared to y, the error term is negligible.
Recall from Peterson (2015) that Fick's law of diffusion for particles across a membrane (think of our plane at y as the membrane) can be written as
$$J_{diff} = -D\, \frac{\partial [b]}{\partial x}$$
Equating c with [b] and Jdiff with φ, we see that the diffusion constant D in Fick's law of diffusion can be interpreted in this context as
$$D = \frac{\lambda_c^2}{2 \tau_m}$$
This gives us a powerful connection between the macroscopic diffusion coefficient of Fick's law and the microscopic quantities that define a random walk, as we will see in the next section.
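A quick numerical sanity check of this connection is to simulate many independent 1D random walks and compare the growth of the mean squared displacement with 2Dt. The Octave/MatLab sketch below is ours, not part of the book's code; the step size, time constant and number of walkers are arbitrary choices.

% Estimate the diffusion constant from simulated 1D random walks
lambdac = 1.0e-3;                              % space step (arbitrary)
taum    = 1.0e-2;                              % time step (arbitrary)
N       = 2000;                                % number of time steps
M       = 1000;                                % number of independent walkers
steps   = lambdac*(2*(rand(M,N) < 0.5) - 1);   % +/- lambdac with probability 1/2
x       = cumsum(steps,2);                     % walker positions over time
t       = (1:N)*taum;
msd     = mean(x.^2,1);                        % mean squared displacement
D       = lambdac^2/(2*taum);                  % the macroscopic diffusion constant
disp([2*D*t(end), msd(end)]);                  % these two numbers should be close

For diffusion, the mean squared displacement grows linearly in time with slope 2D, which is what the simulated walks show.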
In this section, we will be following the discussion presented in Weiss (1996), but
paraphrasing and simplifying it for our purposes. More details can be gleaned from a
careful study of that volume. Let’s assume that a particle is executing a random walk
starting at position x = 0 and time t = 0. This means that from that starting point,
the particle can move either +λc or −λc at each tick of a clock with time measured
in time constant units τm . We can draw this as a tree as shown in Fig. 2.2.
In this figure, the labels shown in each node refer to three things: the time, in
units of the time constant τm (hence, t = 3 denotes a time of 3τm ); spatial position
in units of the space constant λc (thus, x = −2 means a position of −2λc units); and
the number of possible paths that reach that node (therefore, Paths equal to 6 means there are six possible paths that can be taken to arrive at that terminal node).
Since time and space are discretized into units of time and space constants, we
have a physical system where time and space are measured as integers. Thus, we can
ask what is the probability, W (m, n), that the particle will be at position m at time n?
In the time interval of n units, let’s define some auxiliary variables: n + is the number
of time steps where the particle moves to the right—i.e., the movement is +1; and
n − is the number of time steps the particle moves to the left, a movement of −1. We
clearly see that m is really the net displacement and
$$n = n_+ + n_-, \qquad m = n_+ - n_-$$
Solving, we see
$$n_+ = \frac{n + m}{2}, \qquad n_- = \frac{n - m}{2}$$
Let the usual binomial coefficients be denoted by Bn,j where
$$B_{n,j} = \binom{n}{j} = \frac{n!}{j!\, (n - j)!}$$
Now look closely at Fig. 2.2. If you look at a node, you can count how many right
hand moves are made in any given path to that node. For example, the node for time
4 and space position 2 can be reached by 4 different paths and each of them contains three +λc moves. Note that three +λc moves correspond to n+ = 3 and the number 4 is the same as the binomial coefficient Bn=4,n+=3 = 4. Hence, at a given node, all of the paths that can be taken to that node have the same n+ value, as shown in Table 2.1.
The triangle in Fig. 2.2 has the characteristic form of Pascal’s triangle: the root node
is followed by two nodes which split into three nodes and so on. The pattern is
typically written in terms of levels. At level zero, there is just the root node. This is
time 0 for us. At level one or time 1, there are two nodes written as 1 − 1. At level
two or time 2, there are three nodes written as 1 − 2 − 1 to succinctly capture the
branching we are seeing. Continuing, we see the node pattern can be written as the successive rows of Pascal's triangle: 1, then 1 − 1, then 1 − 2 − 1, then 1 − 3 − 3 − 1, and so on.
Each of the nodes at a given time then will have some paths leading to it and on each of
these paths, there will be the same number of right hand moves. So counting right hand
moves, the paths denoted by 1 − 1 correspond to n + = 0 for the left node and n + = 1
for the right node. The number of paths for these is the same as Bn=1,n+=0 = 1 and Bn=1,n+=1 = 1.
Let the probability that the particle moves to the right be p and the probability it
moves to the left be q. Then we have p + q = 1. Let’s assume that p ≤ 0.5. This
forces q ≥ 0.5. Then, in n time units, the probability a given path of length n is taken
is ρ(n) where
$$\rho(n) = p^{n_+} q^{n_-} = p^{n_+} q^{n - n_+}$$
and the probability that a path will terminate at position m, W (m, n), is just the
number of paths that reach that position in n time units multiplied by ρ(n). We know
that for a given number of time steps, certain positions will never be reached. Note that if the fraction 0.5(n + m) is not an integer, then there will be no paths that reach that position m. Let nf be the result of the computation nf = 0.5(n + m). We know nf need not be an integer. Define the extended binomial coefficients Cn,nf by
$$C_{n, n_f} = \begin{cases} B_{n, n_f} & \text{if } n_f \text{ is an integer} \\ 0 & \text{otherwise} \end{cases}$$
Then, we see
$$W(m, n) = C_{n, n_f}\, p^{n_f} q^{n - n_f}$$
From our discussions above, it is clear for any position m that is reached in n time
units, that this can be rewritten in terms of n + as
$$W(m, n) = B_{n, n_+(m)}\, p^{n_+(m)} q^{n - n_+(m)}$$
The expected value of the position m after n time steps is
$$E(m) = \sum_{m=-n}^{n} m\, W(m, n)$$
where of course, many of the individual terms W (m, n) are actually zero for a given
time n because the positions are never reached. From this, we can infer that
$$E(m) = \sum_{n_+ = 0}^{n} m\, B_{n, n_+}\, p^{n_+} q^{n - n_+}$$
To compute this, first switch to a simple notation. Let n+ = j. Then, since n+ = (m + n)/2, we have m = 2j − n and so
$$E(m) = \sum_{j=0}^{n} m\, B_{n,j}\, p^{j} q^{n-j} = \sum_{j=0}^{n} (2j - n)\, B_{n,j}\, p^{j} q^{n-j} = 2 E(j) - n S$$
where
$$E(j) = \sum_{j=0}^{n} j\, B_{n,j}\, p^{j} q^{n-j}, \qquad S = \sum_{j=0}^{n} B_{n,j}\, p^{j} q^{n-j}$$
By the binomial theorem and the fact that p + q = 1,
$$(p + q)^n = S = \sum_{j=0}^{n} B_{n,j}\, p^{j} q^{n-j} = 1$$
Also, since p + q = 1,
$$p\, \frac{d}{dp} (p + q)^n = p\, n\, (p + q)^{n-1} = p\, n$$
Thus,
$$E(j) = \sum_{j=0}^{n} j\, B_{n,j}\, p^{j} q^{n-j} = \sum_{j=0}^{n} p\, j\, B_{n,j}\, p^{j-1} q^{n-j} = \sum_{j=0}^{n} p\, \frac{d}{dp}\left( B_{n,j}\, p^{j} q^{n-j} \right)$$
$$= p\, \frac{d}{dp}\left( \sum_{j=0}^{n} B_{n,j}\, p^{j} q^{n-j} \right) = p\, \frac{d}{dp} (p + q)^n = p\, n$$
Hence,
$$E(m) = 2 E(j) - n S = 2 p n - n$$
We compute the standard deviation of our particle’s movement through space and
time in a similar way. First, we find the second moment of m,
$$E(m^2) = \sum_{m=-n}^{n} m^2\, W(m, n)$$
Our earlier discussions still apply and we find we can rewrite this as
$$E(m^2) = \sum_{j=0}^{n} (2j - n)^2\, B_{n,j}\, p^{j} q^{n-j} = \sum_{j=0}^{n} (4j^2 - 4jn + n^2)\, B_{n,j}\, p^{j} q^{n-j} = 4 E(j^2) - 4 n E(j) + n^2 S$$
where E( j 2 ) is the second moment of the binomial distribution. We know S is 1 and
E( j) is pn. So we only have to compute E( j 2 ). Note that
$$p^2\, \frac{d^2}{dp^2} (p + q)^n = p^2\, n (n - 1)\, (p + q)^{n-2} = p^2\, n (n - 1)$$
Also,
$$p^2\, n (n - 1) = p^2\, \frac{d^2}{dp^2} (p + q)^n = p^2\, \frac{d^2}{dp^2} \left( \sum_{j=0}^{n} B_{n,j}\, p^{j} q^{n-j} \right) = \sum_{j=0}^{n} p^2\, \frac{d^2}{dp^2} \left( B_{n,j}\, p^{j} q^{n-j} \right)$$
$$= \sum_{j=0}^{n} j (j - 1)\, B_{n,j}\, p^{j} q^{n-j} = \sum_{j=0}^{n} (j^2 - j)\, B_{n,j}\, p^{j} q^{n-j} = E(j^2) - E(j)$$
We conclude that
$$E(j^2) = p^2\, n (n - 1) + p\, n, \qquad E(m^2) = 4 E(j^2) - 4 n E(j) + n^2$$
Now recall the standard formula from Statistics: the square of the standard deviation
of our distribution is σ 2 = E(m 2 ) − (E(m))2 . Hence,
$$\sigma^2 = E(m^2) - (E(m))^2 = 4 E(j^2) - 4 n E(j) + n^2 - (2 E(j) - n)^2 = 4 \left( E(j^2) - (E(j))^2 \right)$$
$$= 4 \left( p^2 n (n - 1) + p n - (p n)^2 \right) = 4 n p (1 - p) = 4 n p q.$$
For the unbiased random walk with p = q = 0.5, these formulas give
$$E(m) = 2 p n - n = 0, \qquad \sigma = \sqrt{n}$$
Note that if the random walk is skewed, with say p = 0.1, then we would obtain
$$E(m) = 2 p n - n = -0.8\, n, \qquad \sigma = 0.6 \sqrt{n}$$
so that for large n, the standard deviation of our particle's movement would be approximately 0.6√n rather than √n.
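These moment formulas are easy to check numerically. The short Octave/MatLab sketch below is ours; it builds the binomial weights Bn,j p^j q^(n−j) with gammaln so that no extra toolbox is needed, and the values of n and p are arbitrary choices.

% Check E(m) = 2pn - n and sigma = 2*sqrt(n p q) against the binomial weights
n = 400; p = 0.1; q = 1 - p;
j = 0:n;
logw = gammaln(n+1) - gammaln(j+1) - gammaln(n-j+1) + j*log(p) + (n-j)*log(q);
w = exp(logw);              % B(n,j) p^j q^(n-j)
m = 2*j - n;                % net displacement
Em  = sum(m.*w);
sig = sqrt(sum(m.^2.*w) - Em^2);
disp([Em, 2*p*n - n]);      % both about -320
disp([sig, 2*sqrt(n*p*q)]); % both about 12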
For a very large number of steps, the probability distribution, W (m, n), will approach
a limiting form. This is found by using an approximation to k! that is known as Stirling's approximation. It is known that for very large k,
$$k! \approx \sqrt{2 \pi k}\, \left( \frac{k}{e} \right)^{k}.$$
The distribution of our particle’s position throughout space and time can be written as
$$W(m, n) = B_{n, \frac{n+m}{2}}\; p^{\frac{n+m}{2}}\, q^{\frac{n-m}{2}}$$
using our definitions of n + and n − (it is understood that W (m, n) is zero for non
integer values of these fractions). We can apply Stirling's approximation to Bn,(n+m)/2:
$$n! \approx \sqrt{2 \pi n}\, \left( \frac{n}{e} \right)^{n}$$
$$\left( \frac{n+m}{2} \right)! \approx \sqrt{2 \pi\, \frac{n+m}{2}}\, \left( \frac{n+m}{2e} \right)^{\frac{n+m}{2}} \approx \sqrt{\pi (n+m)}\, \left( \frac{n}{2e} \right)^{\frac{n+m}{2}} \left( 1 + \frac{m}{n} \right)^{\frac{n+m}{2}}$$
$$\left( \frac{n-m}{2} \right)! \approx \sqrt{2 \pi\, \frac{n-m}{2}}\, \left( \frac{n-m}{2e} \right)^{\frac{n-m}{2}} \approx \sqrt{\pi (n-m)}\, \left( \frac{n}{2e} \right)^{\frac{n-m}{2}} \left( 1 - \frac{m}{n} \right)^{\frac{n-m}{2}}$$
From this, we find
$$\left( \frac{n+m}{2} \right)!\, \left( \frac{n-m}{2} \right)! \approx \pi n \left( 1 - \frac{m^2}{n^2} \right)^{\frac{1}{2}} \left( \frac{n}{2e} \right)^{n} \left( 1 - \frac{m^2}{n^2} \right)^{\frac{n}{2}} \left( 1 - \frac{m}{n} \right)^{-\frac{m}{2}} \left( 1 + \frac{m}{n} \right)^{\frac{m}{2}}$$
Hence, we see
$$B_{n, \frac{n+m}{2}} \approx \frac{\sqrt{2 \pi n}}{\pi n}\, \left( \frac{n}{e} \right)^{n} \left( \frac{n}{2e} \right)^{-n} \left( 1 - \frac{m^2}{n^2} \right)^{-\frac{1}{2}} \left( 1 - \frac{m^2}{n^2} \right)^{-\frac{n}{2}} \left( 1 - \frac{m}{n} \right)^{\frac{m}{2}} \left( 1 + \frac{m}{n} \right)^{-\frac{m}{2}}$$
$$\approx \sqrt{\frac{2}{\pi n}}\; 2^{n} \left( 1 - \frac{m^2}{n^2} \right)^{-\frac{1}{2}} \left( 1 - \frac{m^2}{n^2} \right)^{-\frac{n}{2}} \left( 1 - \frac{m}{n} \right)^{\frac{m}{2}} \left( 1 + \frac{m}{n} \right)^{-\frac{m}{2}}$$
Thus,
$$\ln B_{n, \frac{n+m}{2}} \approx \ln \sqrt{\frac{2}{\pi n}} + n \ln(2) - \frac{1}{2} \ln\left( 1 - \frac{m^2}{n^2} \right) - \frac{n}{2} \ln\left( 1 - \frac{m^2}{n^2} \right) + \frac{m}{2} \ln\left( 1 - \frac{m}{n} \right) - \frac{m}{2} \ln\left( 1 + \frac{m}{n} \right)$$
Now, for small x, the standard Taylor’s series approximation gives ln(1 + x) ≈ x;
hence, for m/n sufficiently small, we can say
$$\ln B_{n, \frac{n+m}{2}} \approx \ln \sqrt{\frac{2}{\pi n}} + n \ln(2) + \frac{1}{2}\, \frac{m^2}{n^2} + \frac{n}{2}\, \frac{m^2}{n^2} - \frac{m}{2}\, \frac{m}{n} - \frac{m}{2}\, \frac{m}{n}$$
$$\approx \ln \sqrt{\frac{2}{\pi n}} + n \ln(2) + \frac{m^2}{2 n^2} - \frac{m^2}{2 n}$$
For very large n (i.e., after a very large number of time steps nτm), since we assume m/n is very small, the term m²/(2n²) is negligible. Hence, dropping that term and exponentiating, we find
$$B_{n, \frac{n+m}{2}} \approx \sqrt{\frac{2}{n \pi}}\; 2^{n} \exp\left( \frac{-m^2}{2n} \right)$$
Note that if the particle moves with equal probability 0.5 to the right or the left at
any time tick, this reduces to
$$W(m, n) \approx \sqrt{\frac{2}{n \pi}}\; e^{\frac{-m^2}{2n}}$$
and for p = 1/3 and q = 2/3, this becomes
$$W(m, n) \approx \sqrt{\frac{2}{n \pi}}\; \exp\left( \frac{-m^2}{2n} \right) \left( \frac{8}{9} \right)^{\frac{n}{2}} \left( \frac{1}{2} \right)^{\frac{m}{2}}$$
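The quality of these limiting forms is easy to check numerically. The Octave/MatLab sketch below is ours, with the arbitrary choice n = 100; it compares the exact W(m, n) for the unbiased case p = q = 0.5 with its Gaussian limit.

% Compare the exact W(m,n) for p = q = 0.5 with its limiting Gaussian form
n = 100;
m = -20:2:20;                            % reachable positions (same parity as n)
j = (n + m)/2;                           % number of rightward steps
logB = gammaln(n+1) - gammaln(j+1) - gammaln(n-j+1);
Wexact = exp(logB + n*log(0.5));         % B(n,j) (1/2)^n
Wgauss = sqrt(2/(n*pi))*exp(-m.^2/(2*n));
disp([m(:), Wexact(:), Wgauss(:)]);      % the last two columns agree closely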
From our discrete approximations in previous sections, we can now derive the probability density function, P(x, t), at position x at time t. In what follows, we will assume that p ≤ 0.5. Let x be a value which is approximately mλc for some m and let Δx be a small increment. The probability that the particle is in the interval [x − Δx/2, x + Δx/2] can then be approximated by
$$P(x, t)\, \Delta x \approx \sum_{k} W(k, n)$$
where the sum is over all indices k such that the position kλc lies in the interval
[x − Δx/2, x + Δx/2]. Hence,
$$\left[ x - \frac{\Delta x}{2},\; x + \frac{\Delta x}{2} \right] \equiv \{ (m - j) \lambda_c, \ldots, m \lambda_c, \ldots, (m + j) \lambda_c \}$$
for some integer j. Now from the way the particle moves in a random walk, only
half of these tick marks will actually be positions the particle can occupy. Hence,
half of the probabilities W (m − i, n) for i from − j to j are zero. The number of
nonzero probabilities is thus approximately Δx/(2λc). We can therefore approximate the sum by taking the middle term W(m, n) and multiplying by the number of nonzero probabilities:
$$P(x, t)\, \Delta x \approx W(m, n)\, \frac{\Delta x}{2 \lambda_c}$$
which implies, since x = mλc and t = nτm , that for very large n,
$$P(x, t) = \frac{1}{2 \lambda_c}\, W(m, n) = \frac{1}{\sqrt{4 \pi \frac{\lambda_c^2}{2 \tau_m} t}}\, \exp\left( \frac{-x^2}{4 \frac{\lambda_c^2}{2 \tau_m} t} \right) (4 p q)^{\frac{t}{2 \tau_m}} \left( \frac{p}{q} \right)^{\frac{x}{2 \lambda_c}}$$
Note that the term λc²/(2τm) is the diffusion constant D. Thus,
$$P(x, t) = \frac{1}{\sqrt{4 \pi D t}}\, \exp\left( \frac{-x^2}{4 D t} \right) (4 p q)^{\frac{t}{2 \tau_m}} \left( \frac{p}{q} \right)^{\frac{x}{2 \lambda_c}}$$
Note that since p + q = 1, 4pq is strictly between zero and one for all nonzero p and q with p not equal to 0.5. So for us, if we let ξ be the value 1/(4pq), we know that ln(ξ) > 0. Let A = ln(ξ). Next, let ζ be the value q/p. Since p is less than 1/2 here, this means ζ > 1. Let B = ln(ζ). Then, we know A and B are both positive
in this case unless the probability of movement left and right is equal. In the case of
equal probability, ξ = 1, ζ = 1 and A = B = 0. However, with unequal probability,
we have
$$(4 p q)^{\frac{t}{2 \tau_m}} = \xi^{\frac{-t}{2 \tau_m}} = \exp\left( \frac{-t \ln(\xi)}{2 \tau_m} \right).$$
Also,
2λx
p c −x −x ln(ζ)
= (ζ) 2λc = exp
q 2λc
We thus have
1 −x 2 t x
P(x, t) = √ exp −A −B
4π D t 4D t 2τm 2λc
After manipulation, we can complete the square on the quadratic term and rewrite it as
$$\frac{-x^2}{4Dt} - A\,\frac{t}{2\tau_m} - B\,\frac{x}{2\lambda_c} \;=\; -\frac{1}{4Dt}\left(x + \frac{BD}{\lambda_c}\,t\right)^2 + \frac{B^2 - 4A}{8\tau_m}\,t$$
and thus
$$P(x,t) \;=\; \frac{1}{\sqrt{4\pi D t}}\,\exp\!\left(-\frac{1}{4Dt}\left(x + \frac{BD}{\lambda_c}\,t\right)^2\right)\exp\!\left(\frac{B^2 - 4A}{8\tau_m}\,t\right)$$
The case where $p$ is larger than $1/2$ is handled in a symmetric manner: we simply reverse the roles of $p$ and $q$ in the argument above.
The equal probability random walk has $A = B = 0$ and so the probability density function reduces to
$$P(x,t) \;=\; \frac{1}{\sqrt{4\pi D t}}\,\exp\!\left(-\frac{x^2}{4Dt}\right)$$
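As a sanity check on this limiting form, here is a small simulation sketch (not from the text; the space and time constants are made up for illustration) that runs many equal probability random walks and compares the histogram of final positions with $P(x,t) = \frac{1}{\sqrt{4\pi Dt}}\exp(-x^2/(4Dt))$, where $D = \lambda_c^2/(2\tau_m)$.

import numpy as np

rng = np.random.default_rng(0)

lam_c = 0.1                      # illustrative space constant, cm
tau_m = 0.01                     # illustrative time constant, s
D = lam_c**2 / (2.0 * tau_m)     # diffusion constant, cm^2/s

n_steps = 2000                   # number of time ticks
t = n_steps * tau_m              # elapsed time
n_walkers = 200000

# Final position of each walker: n_steps moves of +/- lam_c with equal probability.
heads = rng.binomial(n_steps, 0.5, size=n_walkers)
x_final = lam_c * (2 * heads - n_steps)

# Compare a histogram of final positions with the closed form density.
width = 6.0 * np.sqrt(2.0 * D * t)
hist, edges = np.histogram(x_final, bins=np.linspace(-width, width, 61), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
P = 1.0 / np.sqrt(4*np.pi*D*t) * np.exp(-centers**2 / (4*D*t))

print(np.max(np.abs(hist - P)))  # small compared with P.max()

The bins are chosen much wider than the lattice spacing so that the parity of the walk, discussed above, is averaged out.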
2.6 Understanding the Probability Distribution of the Particle
It is important to get a strong intuitive feel for the probability distribution of the particle under the random walk and skewed random walk protocols. A normal distribution has the form
$$P(x) \;=\; \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2}}$$
Hence, comparing, we see the standard deviation of our particle can be interpreted as
$$\sigma \;=\; \sqrt{\lambda_c^2\,\frac{t}{\tau_m}}.$$
Note, in general, for fixed time, we can plot the particle's position. We show this in Fig. 2.3 for three different standard deviations, $\sigma$. In the plot, the standard deviation is labeled as $D$. Now if we skew the distribution so that the probability of moving to the right is now $\frac{1}{6}$, we find
$$P(x,t) \;=\; \frac{1}{\sqrt{4\pi D t}}\,\exp\!\left(-\frac{1}{4Dt}\left(x - \frac{1.61\,D\,t}{\lambda_c}\right)^2\right)\exp\!\left(\frac{0.35\,t}{\tau_m}\right)$$
We will now show that the probability density function $P(x,t)$ given by Eq. 2.1,
$$P(x,t) \;=\; \frac{1}{\sqrt{4\pi D t}}\, e^{-\frac{x^2}{4Dt}} \qquad (2.1)$$
satisfies the diffusion equation. In general, the diffusion coefficient may depend on the space variable; however, often the term $D$ is independent of the variable $x$, allowing us to write the simpler form
$$\frac{\partial u}{\partial t} \;=\; D\,\frac{\partial^2 u}{\partial x^2}$$
in the time and space variables $(t,x)$. In this model, we assume $x$ is one dimensional, although it is easy enough to extend to higher dimensions. In three dimensions, $u$ has units of mM per liter or cm$^3$, but in a one dimensional setting $u$ has units of mM per cm. However, the diffusion coefficient $D$ will always have units of length squared per time unit, i.e. $\frac{\text{cm}^2}{\text{s}}$. Note in our earlier discussions, we found $P(x,t)$ had to have units of particles per cm, or simply cm$^{-1}$. We also assume the diffusion coefficient $D$ is positive. We can now show a typical solution for positive time is given by our $P(x,t)$, and let's do it by direct calculation. With a bit of manipulation, we find
$$\frac{\partial P}{\partial t} \;=\; \frac{1}{\sqrt{4\pi D}}\, t^{-3/2}\, e^{-x^2/(4Dt)}\left(-\frac{1}{2} + \frac{x^2}{4Dt}\right).$$
Next,
$$\frac{\partial P}{\partial x} \;=\; \frac{1}{\sqrt{4\pi D t}}\, e^{-x^2/(4Dt)}\left(-\frac{2x}{4Dt}\right)$$
$$\begin{aligned}
\frac{\partial^2 P}{\partial x^2} &= \frac{1}{\sqrt{4\pi D t}}\, e^{-x^2/(4Dt)}\left(-\frac{2}{4Dt}\right) + \frac{1}{\sqrt{4\pi D t}}\left(-\frac{2x}{4Dt}\right)\left(-\frac{2x}{4Dt}\right) e^{-x^2/(4Dt)}\\
&= \frac{1}{\sqrt{4\pi D t}}\,\frac{2}{4Dt}\left(-1 + \frac{2x^2}{4Dt}\right) e^{-x^2/(4Dt)}\\
&= \frac{1}{D}\,\frac{1}{\sqrt{4\pi D}}\, t^{-3/2}\, e^{-x^2/(4Dt)}\left(-\frac{1}{2} + \frac{x^2}{4Dt}\right)\\
&= \frac{1}{D}\,\frac{\partial P}{\partial t}.
\end{aligned}$$
Hence, we see $P$ solves $\frac{\partial P}{\partial t} = D\,\frac{\partial^2 P}{\partial x^2}$. From our previous discussions, we see we can interpret the motion of our particle as that of a random walk with space constant $\lambda_c$ and time constant $\tau_m$ as the number of time and position steps gets very large.
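If you want to check this computation without redoing the calculus by hand, a symbolic algebra sketch like the following (assuming SymPy is installed) confirms that $P$ satisfies the diffusion equation.

import sympy as sp

x, t, D = sp.symbols('x t D', positive=True)

# The candidate solution P(x, t) from Eq. 2.1.
P = 1 / sp.sqrt(4 * sp.pi * D * t) * sp.exp(-x**2 / (4 * D * t))

# Residual of the diffusion equation P_t - D * P_xx; it simplifies to zero.
residual = sp.diff(P, t) - D * sp.diff(P, x, 2)
print(sp.simplify(residual))   # prints 0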
Note the behavior at $t = 0$ seems undefined. However, we can motivate the interpretation of the limiting behavior as $t \to 0$ as an impulse injection of current as follows. Using the substitution $y = \frac{1}{t}$, for $x$ not zero,
$$\lim_{t\to 0} u(t,x) \;=\; \lim_{y\to\infty} \frac{1}{\sqrt{4\pi D}}\,\sqrt{y}\,\exp\!\left(\frac{-x^2\,y}{4D}\right) \;=\; 0.$$
Since for $x \ne 0$, $\lim_{t\to\infty} u(t,x) = 0$ and $\lim_{t\to 0} u(t,x) = 0$, we know $u(t,x)$ has a maximum value at some value of $t$. The generic shape of the solution can be seen in Fig. 2.5. The solution at $x = 0$ behaves like the curve $\frac{1}{\sqrt{t}}$ as is shown. The solutions for nonzero values of $x$ behave like spatially spread and damped pulses. In fact, the
solution to the diffusion equation for nonzero $x$ provides a nice model of the typical excitatory pulse we see in a dendritic tree. Also, if $I < 0$, this is a good model of an inhibitory pulse as well. Letting
$$I(t) \;=\; \frac{1}{\sqrt{4\pi D t}}\,\exp\!\left(-\frac{x^2}{4Dt}\right)$$
it follows that the current pulses become very sharply defined with large heights as $x$ approaches $0$, while for large $x$ the pulse has a low height and is spread out significantly. We show the qualitative nature of these pulses in Fig. 2.6. In both curves shown, the maximum is achieved at the time point $\frac{x^2}{2D}$. These general solutions can also be centered at the point $(t_0, x_0)$, giving the solution
$$u(t,x) \;=\; \frac{I}{\sqrt{4\pi D (t-t_0)}}\,\exp\!\left(\frac{-(x-x_0)^2}{4D(t-t_0)}\right).$$
Chapter 3
Integral Transforms
We now discuss the basics of two important tools in the analysis of models:
the Laplace Transform and the Fourier Transform. We will use these tools to solve
the diffusion model which we obtain from the cable model by using a change of
variable technique.
The Laplace Transform acts on the time domain of a problem as follows: given the function $x$ defined on $s \ge 0$, we define the Laplace Transform of $x$ to be
$$\mathscr{L}(x) \;=\; \int_0^\infty x(s)\, e^{-\beta s}\, ds$$
The new function $\mathscr{L}(x)$ is defined for some domain of the new variable $\beta$. The variable $\beta$'s domain is called the frequency domain and, in general, the values of $\beta$ where the transform is defined depend on the function $x$ we are transforming.
Also, in order for the Laplace transform of the function $x$ to work, $x$ must not grow too fast; roughly, $x$ must decay like an exponential function with a negative coefficient.
The solutions we seek to our cable equation are expected on physical grounds to
decay to zero exponentially as we let t (and therefore, also s!) go to infinity and as
we let the space variable z (and hence w) go to ±∞. Hence, the function we seek
will have a well-defined Laplace transform with respect to the s variable.
Now, what about the Laplace transform of a derivative? Consider
$$\mathscr{L}\left(\frac{dx}{ds}\right) \;=\; \int_0^\infty \frac{dx}{ds}\, e^{-\beta s}\, ds$$
Integrating by parts, this is $\lim_{R\to\infty} x(R)\,e^{-\beta R} - x(0) + \beta\int_0^\infty x(s)\, e^{-\beta s}\, ds$. Now if the function $x$ grows no faster than some $e^{-cR}$ for some constant $c$, the limit will be zero and we obtain
$$\mathscr{L}\left(\frac{dx}{ds}\right) \;=\; \beta\, \mathscr{L}(x) - x(0)$$
For example, if $f(s) = e^{-as}$ for some positive constant $a$, then
$$\hat{f} \;=\; \mathscr{L}(f) \;=\; \int_0^\infty e^{-as}\, e^{-\beta s}\, ds \;=\; \frac{1}{\beta + a}.$$
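As a quick numerical sanity check on this example (a sketch, assuming SciPy is available), we can approximate the Laplace integral by quadrature and compare with $\frac{1}{\beta+a}$.

import numpy as np
from scipy.integrate import quad

a = 2.0
beta = 1.5

# Laplace transform of f(s) = exp(-a*s) evaluated at beta, by quadrature.
val, err = quad(lambda s: np.exp(-a * s) * np.exp(-beta * s), 0.0, np.inf)
print(val, 1.0 / (beta + a))   # both about 0.2857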
3.1.1 Homework

3.2 The Fourier Transform

The Fourier Transform acts on the spatial part of a problem. Given a function $g$ defined on the whole real line, we define its Fourier Transform to be
$$\mathscr{F}(g) \;=\; \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} g(y)\, e^{-j\xi y}\, dy$$
where $j$ denotes the square root of minus one and the exponential term is defined to mean
$$e^{-j\xi y} \;=\; \cos(\xi y) - j\sin(\xi y)$$
This integral is well defined if $g$ is what is called square integrable, which roughly means we can get a finite value for the integral of $g^2$ over the $y$ axis. Note that we can compute the Fourier Transform of the derivative of $g$ as follows:
$$\mathscr{F}\left(\frac{dg}{dy}\right) \;=\; \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \frac{dg}{dy}\, e^{-j\xi y}\, dy$$
Integrating by parts, since we assume that the function $g$ decays sufficiently quickly as $y \to \pm\infty$, the boundary term vanishes and we have
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \frac{dg}{dy}\, e^{-j\xi y}\, dy \;=\; +j\xi\, \mathscr{F}(g)$$
We also assume that the function $g$'s derivative decays sufficiently quickly as $y \to \pm\infty$. Thus the boundary term again vanishes and we have
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \frac{d^2g}{dy^2}\, e^{-j\xi y}\, dy \;=\; +j\xi\, \mathscr{F}\!\left(\frac{dg}{dy}\right) \;=\; j^2\xi^2\,\mathscr{F}(g) \;=\; -\xi^2\,\mathscr{F}(g)$$
We can then define the inverse Fourier Transform as follows. If the Fourier Transform of the function $g(y)$ is denoted by $\hat{g}(\xi)$, the inverse Fourier Transform of $\hat{g}(\xi)$ is given by
$$\mathscr{F}^{-1}(\hat{g}) \;=\; \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \hat{g}(\xi)\, e^{j\xi y}\, d\xi.$$
Here is a standard example of such an inversion. Consider the inverse Fourier transform of $\frac{r_0\lambda_c I_0}{\sqrt{2\pi}}\, e^{-\xi^2 s}$:
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \frac{r_0\lambda_c I_0}{\sqrt{2\pi}}\, e^{-\xi^2 s}\, e^{j\xi y}\, d\xi \;=\; \frac{r_0\lambda_c I_0}{2\pi}\int_{-\infty}^{\infty} e^{-\xi^2 s + j\xi y}\, d\xi.$$
Now to invert the remaining part, we rewrite the exponent by completing the square:
$$-\xi^2 s + j\xi y \;=\; -s\left(\xi^2 - j\,\frac{y}{s}\,\xi + \left(\frac{jy}{2s}\right)^2\right) + s\left(\frac{jy}{2s}\right)^2 \;=\; -s\left(\xi - \frac{jy}{2s}\right)^2 - \frac{y^2}{4s}$$
Hence,
$$e^{-\xi^2 s + j\xi y} \;=\; e^{-s\left(\xi - \frac{jy}{2s}\right)^2}\, e^{-\frac{y^2}{4s}}$$
because $j^2 = -1$. To handle the inversion here, we note that for any positive $a$ and base points $x_0$ and $y_0$:
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-a(x-x_0)^2}\, e^{-a(y-y_0)^2}\, dx\, dy \;=\; \frac{1}{a}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-u^2}\, e^{-v^2}\, du\, dv$$
using the change of variables $u = \sqrt{a}(x - x_0)$ and $v = \sqrt{a}(y - y_0)$. Now we convert to polar coordinates and it can be shown the resulting calculation gives
$$\frac{1}{a}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-u^2}\, e^{-v^2}\, du\, dv \;=\; \frac{1}{a}\int_0^{2\pi}\!\!\int_0^{\infty} e^{-r^2}\, r\, dr\, d\theta \;=\; \frac{\pi}{a}$$
We are doing what is called a double integration here, which we have never discussed. However, our need for it never arose until this moment. For our purposes, think of it as applying our standard Riemann integral two times in succession. For example, if we had the double integral
$$\int_1^2\!\!\int_2^3 (x^2 + y^2)\, dx\, dy$$
we would interpret this as two Riemann integrals working from the inside out. We would first find $\int_2^3 (x^2+y^2)\,dx$ by thinking of the integrand as a function of $x$ only where the $y$ variable is treated as a constant. This is essentially the inverse of partial differentiation! We would find
$$\int_2^3 (x^2 + y^2)\, dx \;=\; \left[\frac{x^3}{3} + y^2 x\right]_2^3$$
where we use our usual one variable Cauchy Fundamental Theorem of Calculus. This gives $\frac{19}{3} + y^2$ as the answer. Then you apply the outer Riemann integral $\int_1^2$ to this result to get
$$\int_1^2\left(\frac{19}{3} + y^2\right) dy \;=\; \left[\frac{19}{3}\, y + \frac{y^3}{3}\right]_1^2 \;=\; \frac{19}{3} + \frac{7}{3}.$$
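The same iterated calculation can be checked numerically; here is a sketch using SciPy's dblquad (note that dblquad integrates the inner variable first, so the integrand is written with the inner variable as its first argument).

from scipy.integrate import dblquad

# Integrate x**2 + y**2 with x from 2 to 3 (inner) and y from 1 to 2 (outer).
val, err = dblquad(lambda x, y: x**2 + y**2, 1, 2, lambda y: 2, lambda y: 3)
print(val, 19/3 + 7/3)   # both about 8.6667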
Our situation above is more interesting than our example, of course. We have infinite limits, $\int_{-\infty}^{\infty}$, which we interpret as a usual improper integral. Naturally, since Riemann integrals can be interpreted as areas under curves, many times an integration like this over an unbounded domain ends up with an infinite answer. But the functions we are integrating in our example decay strongly as they are $e^{-u^2}$ and $e^{-v^2}$. So there is hope that even in this infinite case, we will get a finite result. You should take more mathematics classes! Our volumes do not discuss two and three dimensional Riemann integration although we have discussed differentiation in the setting of more than one variable. There is always more to learn, but for our purposes, we just need this one calculation, so we didn't want to belabor this point with several extra chapters. Now getting back to our problem, after converting to polar coordinates we converted the integral $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} du\, dv$ into the integral $\int_0^{2\pi}\int_0^{\infty} r\, dr\, d\theta$. This used some additional multiple integration tools we are skipping: essentially, to add up the area of the $uv$ plane using rectangular blocks of area $du\,dv$ we need an infinite number of them stretching from $-\infty$ to $\infty$ in both the $u$ and $v$ directions. However, to add up the area of the $uv$ plane using area chunks based on the polar coordinate variables $r$ and $\theta$, we divide up the two dimensional space using rays from the origin every $d\theta$ radians. The area pieces are then formed by slicing along the rays between radius $r$ and $r + dr$. The resulting area is like a chopped off piece of pie and has area $r\, dr\, d\theta$. Then to cover the whole plane we need an infinite number of $dr$'s and $d\theta$'s from $0$ to $2\pi$. Yeah, that's all there is to it. Fortunately, like we said, this background is not so important here. Our move to polar coordinates thus gives us
$$\left(\int_{-\infty}^{\infty} e^{-a(x-x_0)^2}\, dx\right)^2 \;=\; \frac{\pi}{a} \;\Longrightarrow\; \int_{-\infty}^{\infty} e^{-a(x-x_0)^2}\, dx \;=\; \sqrt{\frac{\pi}{a}}$$
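And this one dimensional Gaussian integral can be checked the same way (another small quadrature sketch with arbitrary values of a and x0).

import numpy as np
from scipy.integrate import quad

a, x0 = 3.0, 1.7   # arbitrary positive a and base point
val, err = quad(lambda x: np.exp(-a * (x - x0)**2), -np.inf, np.inf)
print(val, np.sqrt(np.pi / a))   # both about 1.0233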
3.2.1 Homework

Exercise 3.2.1 Find the inverse Fourier Transform of $e^{-2\xi^2}$ showing all of the messy details.

Exercise 3.2.2 Find the inverse Fourier Transform of $e^{-1.5\xi^2}$ showing all of the messy details.

Exercise 3.2.3 Find the inverse Fourier Transform of $e^{-4\xi^2}$ showing all of the messy details.
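For the first exercise, the completing the square computation above gives the closed form $e^{-y^2/(4s)}/\sqrt{2s}$ for the inverse transform of $e^{-s\xi^2}$. Here is a quadrature sketch (assuming SciPy; not part of the text's exercises) that checks this for $s = 2$ at a few values of y.

import numpy as np
from scipy.integrate import quad

def inv_fourier(ghat, y):
    # (1/sqrt(2*pi)) * integral of ghat(xi) * exp(j*xi*y) d(xi).
    # For an even, real ghat the imaginary part vanishes, so integrate the cosine part.
    val, _ = quad(lambda xi: ghat(xi) * np.cos(xi * y), -np.inf, np.inf)
    return val / np.sqrt(2.0 * np.pi)

s = 2.0
for y in (0.0, 0.5, 1.0, 2.0):
    numeric = inv_fourier(lambda xi: np.exp(-s * xi**2), y)
    closed = np.exp(-y**2 / (4.0 * s)) / np.sqrt(2.0 * s)
    print(y, numeric, closed)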
Chapter 4
The Time Dependent Cable Solution

We are now ready to solve the cable equation directly. There are lots of details here to go through, so it will take awhile. However, as we said earlier, the solutions we obtain here are essentially the same as the ones we found using random walk ideas. So we now have a good understanding of what the diffusion constant means. Note this is very important to understanding diffusion based triggers. Recall the full cable equation
$$\lambda_c^2\,\frac{\partial^2 v_m}{\partial z^2} \;=\; v_m + \tau_m\,\frac{\partial v_m}{\partial t} - r_o\lambda_c^2\, k_e$$
Recall that $k_e$ is current per unit length. We are going to show you a mathematical way to solve the above cable equation when there is an instantaneous current impulse applied at some nonnegative time $t_0$ and nonnegative spatial location $z_0$. Essentially, we will think of this instantaneous impulse as a Dirac delta function input as we have discussed before: i.e. we will need to solve
$$\lambda_c^2\,\frac{\partial^2 v_m}{\partial z^2} \;=\; v_m + \tau_m\,\frac{\partial v_m}{\partial t} - r_o\lambda_c^2\, I_e\,\delta(t - t_0, z - z_0)$$
where $\delta(t - t_0, z - z_0)$ is a Dirac impulse applied at the ordered pair $(t_0, z_0)$. We will simplify our reasoning by thinking of the impulse applied at $(0, 0)$ as we can simply translate our solution later as we did before for the idealized impulse solution to the time independent cable equation.
We assume that the cable is infinitely long and there is a current injection at some
point z 0 on the cable which instantaneously delivers I0 amps of current. As usual,
we will model this instantaneous delivery of current using a family of pulses. For
convenience of exposition, we will consider the point of application of the pulses to
be z 0 = 0.
Consider the two parameter family of pulses defined on the rectangle $[-\frac{\tau_m}{n}, \frac{\tau_m}{n}] \times [-\frac{\lambda_c}{m}, \frac{\lambda_c}{m}]$ by
$$P_{nm}(t,z) \;=\; \frac{I_0\, n\, m}{\tau_m\, \lambda_c\, \gamma^2}\; \exp\!\left(\frac{-\tau_m^2}{\tau_m^2 - n^2 t^2}\right)\, \exp\!\left(\frac{-\lambda_c^2}{\lambda_c^2 - m^2 z^2}\right)$$
This gives us the proper units for the current impulses. This is then a specific example of the type of pulses we have used in the past. There are several differences:
• Here we give a specific functional form for our pulses which we did not do before. It is straightforward to show that these pulses are zero off the $(t,z)$ rectangle and are infinitely differentiable for all time and space points. Most importantly, this means the pulses are very smooth at the boundary points $t = \pm\frac{\tau_m}{n}$ and $z = \pm\frac{\lambda_c}{m}$.
Note we are indeed allowing these pulses to be active for a small interval of time
before zero. When we solve the actual cable problem, we will only be using the
positive time portion.
• The current delivered by this pulse is obtained by the following integration:
$$J \;=\; \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} P_{nm}(t,z)\, dt\, dz \;=\; \int_{-\frac{\lambda_c}{m}}^{\frac{\lambda_c}{m}}\!\!\int_{-\frac{\tau_m}{n}}^{\frac{\tau_m}{n}} P_{nm}(t,z)\, dt\, dz$$
The constant $\gamma$ will be chosen so that these integrals give the constant value of $J = 2I_0$ for all $n$ and $m$. If we integrate over the positive time half of the pulse, we will then get the constant $I_0$ instead.
• The units of our pulse are $\frac{\text{amps}}{\text{cm-s}}$. The time integration then gives us $\frac{\text{amps}}{\text{cm}}$ and the following spatial integration gives us amperes.
With this choice of $\gamma$, all full pulses $P_{nm}$ deliver $2I_0$ amperes of current when integrated over space and time and all pulses with only nonnegative time deliver $I_0$ amperes.
We now want to convert the cable equation
$$\lambda_c^2\,\frac{\partial^2 v_m}{\partial z^2} \;=\; v_m + \tau_m\,\frac{\partial v_m}{\partial t} - r_o\lambda_c^2\, k_e,$$
where $k_e$ is current per unit length, into a diffusion equation with a change of variables. First, we introduce a dimensionless scaling to make it easier via the change of variables $y = \frac{z}{\lambda_c}$ and $s = \frac{t}{\tau_m}$. With these changes, space will be measured in units of space constants and time in units of time constants. We then define a new voltage variable $w$ by
$$w(s,y) \;=\; v_m(\tau_m s, \lambda_c y)$$
so that
$$\frac{\partial^2 w}{\partial y^2} \;=\; \lambda_c^2\,\frac{\partial^2 v_m}{\partial z^2}, \qquad \frac{\partial w}{\partial s} \;=\; \tau_m\,\frac{\partial v_m}{\partial t}$$
giving us the scaled cable equation
$$\frac{\partial^2 w}{\partial y^2} \;=\; w + \frac{\partial w}{\partial s} - r_o\lambda_c^2\, k_e(\tau_m s, \lambda_c y)$$
Now to further simplify our work, let's make the additional change of variables
$$\Phi(s,y) \;=\; w(s,y)\, e^s$$
Then
$$\frac{\partial \Phi}{\partial s} \;=\; \left(\frac{\partial w}{\partial s} + w\right) e^s, \qquad \frac{\partial^2 \Phi}{\partial y^2} \;=\; \frac{\partial^2 w}{\partial y^2}\, e^s$$
leading to
$$\frac{\partial^2 \Phi}{\partial y^2}\, e^{-s} \;=\; \frac{\partial \Phi}{\partial s}\, e^{-s} - r_0\lambda_c^2\, k_e(\tau_m s, \lambda_c y)$$
After rearranging, we have the version of the transformed cable equation we need to solve:
$$\frac{\partial^2 \Phi}{\partial y^2} \;=\; \frac{\partial \Phi}{\partial s} - r_0\lambda_c^2\tau_m\, k_e(\tau_m s, \lambda_c y)\, e^s \qquad (4.1)$$
Recall that $k_e$ is current per unit length. We are going to show you a physical way to find a solution to the above diffusion equation when there is an instantaneous current impulse of strength $I_0$ applied at some nonnegative time $s_0$ and nonnegative spatial location $y_0$; that is, we want to solve
$$\frac{\partial^2 \Phi}{\partial y^2} \;=\; \frac{\partial \Phi}{\partial s} - r_o\lambda_c^2\, I_0\,\delta(s - s_0, y - y_0)\, e^s$$
Modeling the idealized impulse with our family of pulses, we find
$$\frac{\partial^2 \Phi}{\partial y^2} \;=\; \frac{\partial \Phi}{\partial s} - r_0\lambda_c^2\tau_m\, P_{nm}(\tau_m s, \lambda_c y)\, e^s. \qquad (4.2)$$
We will apply some rather sophisticated mathematical tools to solve Eq. (4.2). These include the Laplace and Fourier Transforms which were introduced in Chap. 3. Hence, applying the Laplace transform to both sides of Eq. 4.2, we have
$$\mathscr{L}\!\left(\frac{\partial^2 \Phi}{\partial y^2}\right) \;=\; \mathscr{L}\!\left(\frac{\partial \Phi}{\partial s}\right) - r_0\lambda_c^2\tau_m\,\mathscr{L}\!\left(P_{nm}(\tau_m s, \lambda_c y)\, e^s\right)$$
$$\frac{\partial^2 \mathscr{L}(\Phi)}{\partial y^2} \;=\; \beta\,\mathscr{L}(\Phi) - \Phi(0,y) - r_0\lambda_c^2\tau_m\,\mathscr{L}\!\left(P_{nm}(\tau_m s, \lambda_c y)\, e^s\right)$$
This is just the transform of the time portion of the equation. The space portion has been left alone. We will now further assume that
$$\Phi(0,y) = 0, \;\; y \ne 0, \qquad v_m(0,z) = 0, \;\; z \ne 0,$$
so the initial data term drops out, giving
$$\frac{\partial^2 \mathscr{L}(\Phi)}{\partial y^2} \;=\; \beta\,\mathscr{L}(\Phi) - r_0\lambda_c^2\tau_m\,\mathscr{L}\!\left(P_{nm}(\tau_m s, \lambda_c y)\, e^s\right)$$
We now apply the Fourier Transform in space to the equation we obtained after applying the Laplace Transform in time. We have
$$\mathscr{F}\!\left(\frac{\partial^2 \mathscr{L}(\Phi)}{\partial y^2}\right) \;=\; \beta\,\mathscr{F}(\mathscr{L}(\Phi)) - r_0\lambda_c^2\tau_m\,\mathscr{F}\!\left(\mathscr{L}\!\left(P_{nm}(\tau_m s, \lambda_c y)\, e^s\right)\right)$$
or, letting $\mathscr{T}(\Phi) = \mathscr{F}(\mathscr{L}(\Phi))$ denote the composition of the two transforms and using the Fourier derivative rule $\mathscr{F}\!\left(\frac{\partial^2 \mathscr{L}(\Phi)}{\partial y^2}\right) = -\xi^2\,\mathscr{F}(\mathscr{L}(\Phi))$,
$$-\xi^2\,\mathscr{T}(\Phi) \;=\; \beta\,\mathscr{T}(\Phi) - r_0\lambda_c^2\tau_m\,\mathscr{T}\!\left(P_{nm}(\tau_m s, \lambda_c y)\, e^s\right)$$
We must now compute the $\mathscr{T}$ transform of the pulse term $P_{nm}(\tau_m s, \lambda_c y)\, e^s$. Since the pulse is zero off the $(t,z)$ rectangle $[-\frac{\tau_m}{n}, \frac{\tau_m}{n}] \times [-\frac{\lambda_c}{m}, \frac{\lambda_c}{m}]$, we see the $(s,y)$ integration rectangle reduces to $[-\frac{1}{n}, \frac{1}{n}] \times [-\frac{1}{m}, \frac{1}{m}]$. Hence,
$$\mathscr{T}\!\left(P_{nm}(\tau_m s, \lambda_c y)\, e^s\right) \;=\; \frac{1}{\sqrt{2\pi}}\int_{-\frac{1}{m}}^{\frac{1}{m}}\!\!\int_0^{\frac{1}{n}} \frac{I_0\, n m}{\tau_m \lambda_c \gamma^2}\, e^{\frac{-1}{1 - n^2 s^2}}\, e^{\frac{-1}{1 - m^2 y^2}}\, e^s\, e^{-\beta s}\, e^{-j\xi y}\, ds\, dy$$
As $n$ and $m$ go to infinity, the pulses concentrate at the origin, where $e^s e^{-\beta s} e^{-j\xi y}$ is one, and the positive time portion of the pulse delivers $I_0$; in the scaled variables this integral therefore converges to $\frac{I_0}{\sqrt{2\pi}\,\tau_m\lambda_c}$. Thus, we find
$$\Phi^* \;=\; \mathscr{T}(\Phi) \;=\; \frac{r_0\lambda_c I_0}{\sqrt{2\pi}\,(\beta + \xi^2)}$$
To move back from the transform $(\beta, \xi)$ space to our original $(s,y)$ space, we apply the inverse of our $\mathscr{T}$ transform. For a function $h(\beta, \xi)$, this is defined by
$$\mathscr{T}^{-1}(h) \;=\; \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\!\!\int_0^{\infty} h(\beta,\xi)\, e^{\beta s}\, e^{j\xi y}\, d\beta\, d\xi \;=\; \mathscr{F}^{-1}\!\left(\mathscr{L}^{-1}(h)\right)$$
To find the solution to our cable equation, we will apply this inverse transform to $\Phi^*$. First, we can compute the inner inverse Laplace transform to obtain
$$\mathscr{L}^{-1}(\Phi^*) \;=\; \frac{r_0\lambda_c I_0}{\sqrt{2\pi}}\,\mathscr{L}^{-1}\!\left(\frac{1}{\beta + \xi^2}\right) \;=\; \frac{r_0\lambda_c I_0}{\sqrt{2\pi}}\, e^{-\xi^2 s}.$$
Applying the inverse Fourier Transform to this (the inversion we worked out as the example in Chap. 3), we obtain
$$\Phi(s,y) \;=\; r_0\lambda_c I_0\, \frac{1}{\sqrt{4\pi s}}\, e^{-\frac{y^2}{4s}}$$
Note, if we think of the diffusion constant as $D_0 = \frac{\lambda_c^2}{\tau_m}$, we can rewrite $\Phi(s,y)$ as follows:
$$\Phi(s,y) \;=\; r_0\lambda_c I_0\, \frac{1}{\sqrt{4\pi(\lambda_c^2/\tau_m)(t/\lambda_c^2)}}\, e^{-\frac{x^2}{4(\lambda_c^2/\tau_m)t}} \;=\; r_0\lambda_c I_0\, \frac{1}{\sqrt{4\pi D_0 (t/\lambda_c^2)}}\, e^{-\frac{x^2}{4D_0 t}} \;=\; r_0\lambda_c^2 I_0\, \frac{1}{\sqrt{4\pi D_0 t}}\, e^{-\frac{x^2}{4D_0 t}}$$
If we let
$$P_0(x,t) \;=\; \frac{1}{\sqrt{4\pi D_0 t}}\, e^{-\frac{x^2}{4D_0 t}} \qquad (4.3)$$
which is the usual probability density function for a random walk with space constant $\lambda_c\sqrt{2}$ and time constant $\tau_m$, then
$$\Phi(s,y) \;=\; (r_0\lambda_c)\, I_0\, \lambda_c\, P_0(x,t)$$
Recalling that $v_m(t,z) = w(s,y) = \Phi(s,y)\, e^{-s}$, and translating the impulse to the point $(t_0, z_0)$, we obtain
$$v_m(t,z) \;=\; r_0\lambda_c I_0\, \frac{1}{\sqrt{4\pi\,\frac{t-t_0}{\tau_m}}}\, e^{-\frac{((z-z_0)/\lambda_c)^2}{4\,(t-t_0)/\tau_m}}\, e^{-(t-t_0)/\tau_m} \qquad (4.4)$$
We can use the scaled solutions to generate a few surface plots. In Fig. 4.1 we see
a pulse applied at time zero and spatial location 1.0 of magnitude 4. We can use
the linear superposition principle to sum two applied pulses: in Fig. 4.2, we see the
effects of a pulse applied at space position 1.0 of magnitude 4.0 add to a pulse of
strength 10.0 applied at position 7.0. Finally, we can take the results shown in Fig. 4.2
and simply plot the voltage at position 10.0 for the first three seconds. This is shown
in Fig. 4.3.
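The surface plots just described are easy to regenerate. Here is a sketch (not the text's code; the normalization r0*lambda_c is folded into the pulse magnitudes and tau_m and lambda_c are set to one purely for display) that superposes the two pulses from Eq. 4.4 and samples the voltage at position 10.

import numpy as np

# Idealized impulse response of Eq. 4.4, with r0*lambda_c absorbed into I0 and
# tau_m = lambda_c = 1 as a made up display normalization.
def vm_pulse(t, z, I0, t0, z0, tau_m=1.0, lam_c=1.0):
    s = (t - t0) / tau_m
    y = (z - z0) / lam_c
    out = np.zeros_like(s, dtype=float)
    pos = s > 0                      # response is zero before the impulse
    out[pos] = I0 / np.sqrt(4*np.pi*s[pos]) * np.exp(-y[pos]**2/(4*s[pos])) * np.exp(-s[pos])
    return out

t = np.linspace(0.01, 3.0, 300)
z = np.linspace(0.0, 12.0, 240)
T, Z = np.meshgrid(t, z)

# Superposition of two impulses: magnitude 4 at z = 1 and magnitude 10 at z = 7.
V = vm_pulse(T, Z, 4.0, 0.0, 1.0) + vm_pulse(T, Z, 10.0, 0.0, 7.0)

# Voltage trace at position z = 10 over the first three time units.
iz = np.argmin(np.abs(z - 10.0))
print(V[iz, ::60])

The array V can be handed to any surface plotting routine to reproduce figures qualitatively like Figs. 4.1 and 4.2.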
We can also express the idealized impulse in terms of charge by writing $I_0 = \frac{Q_0}{\tau_m}$, where $Q_0$ is the amount of charge deposited in one time constant. The rest of the analysis is quite similar. Thus, we can also write our solution as
$$v_m(t,z) \;=\; \frac{r_0\lambda_c Q_0}{\tau_m}\, \frac{1}{\sqrt{4\pi\,\frac{t-t_0}{\tau_m}}}\, e^{-\frac{((z-z_0)/\lambda_c)^2}{4\,(t-t_0)/\tau_m}}\, e^{-(t-t_0)/\tau_m} \qquad (4.5)$$
4.2 The Solution to a Constant Current
Now we need to attack a harder problem: we will apply a constant external current $i_e$ which is defined by
$$i_e(t) \;=\; \begin{cases} I_e, & t > 0\\ 0, & t \le 0\end{cases}$$
Recall that we could rewrite this as $i_e = I_e\, u(t)$ for the standard unit step function $u$. Now in a time interval $h$, charge $Q_e = I_e h$ is delivered to the cable through the external membrane. Fix the positive time $t$. Now divide the time interval $[0,t]$ into $K$ equal parts using $h$ equal to $\frac{t}{K}$. This gives us a set of $K+1$ points $\{t_i\}$,
$$t_i \;=\; i\,\frac{t}{K}, \quad 0 \le i \le K$$
where we note that $t_0$ is $0$ and $t_K$ is $t$. This is called a partition of the interval $[0,t]$, which we denote by the symbol $\mathscr{P}_K$. Here $h$ is the fraction $\frac{t}{K}$. Now let's think of the charge deposited into the outer membrane at $t_i$ as being the full amount $I_e h$ deposited between $t_i$ and $t_{i+1}$. Then the time dependent solution due to the injection of this charge at $t_i$ is given by
$$\frac{r_0\lambda_c I_e h}{\tau_m}\, \frac{1}{\sqrt{4\pi\,(t-t_i)/\tau_m}}\, e^{-\frac{(z/\lambda_c)^2}{4\,(t-t_i)/\tau_m}}\, e^{-(t-t_i)/\tau_m}$$
and since our problem is linear, by the superposition principle, we find the solution due to charge $I_e h$ injected at each point $t_i$ is given by
$$\sum_{i=0}^{K-1}\frac{r_0\lambda_c I_e h}{\tau_m}\, \frac{1}{\sqrt{4\pi\,(t-t_i)/\tau_m}}\, e^{-\frac{(z/\lambda_c)^2}{4\,(t-t_i)/\tau_m}}\, e^{-(t-t_i)/\tau_m}$$
Since $t_{i+1} - t_i = h$, this sum can be written as the Riemann sum
$$\sum_{i=0}^{K-1}\frac{r_0\lambda_c I_e}{\tau_m}\, \frac{1}{\sqrt{4\pi\,(t-t_i)/\tau_m}}\, e^{-\frac{(z/\lambda_c)^2}{4\,(t-t_i)/\tau_m}}\, e^{-(t-t_i)/\tau_m}\,(t_{i+1} - t_i)$$
We can do this for any choice of partition $\mathscr{P}_K$. Since all of the functions involved here are continuous, we see that as $K \to \infty$, we obtain the Riemann Integral of the idealized solution for an impulse of size $I_e$ applied at the point $u = t - t_i$,
$$v_m^I(u,z) \;=\; \frac{r_0\lambda_c I_e}{\tau_m}\, \frac{1}{\sqrt{4\pi\,(u/\tau_m)}}\, e^{-\frac{(z/\lambda_c)^2}{4\,(u/\tau_m)}}\, e^{-(u/\tau_m)}$$
leading to
$$v_m(t,z) \;=\; \int_0^t v_m^I(u,z)\, du \;=\; \int_0^t \frac{r_0\lambda_c I_e}{\tau_m}\, \frac{1}{\sqrt{4\pi\,(u/\tau_m)}}\, e^{-\frac{(z/\lambda_c)^2}{4\,(u/\tau_m)}}\, e^{-(u/\tau_m)}\, du$$
We can rewrite this in terms of the probability density function associated with this model. Using Eq. 4.3, we have
$$v_m(t,z) \;=\; \int_0^t v_m^I(u,z)\, du \;=\; \int_0^t \frac{r_0\lambda_c^2 I_e}{\tau_m}\, P_0(z,u)\, e^{-(u/\tau_m)}\, du$$
which is a much more convenient form. Much more can be said about this way of looking at the solutions, but that leads us into much more advanced mathematics!
4.3 Time Dependent Solutions
We now know the solution to the time dependent infinite cable model for an idealized applied charge pulse has the form
$$v_m(z,t) \;=\; \frac{r_o\lambda_C Q_e}{\tau_M}\, \frac{1}{\sqrt{4\pi\,(t/\tau_M)}}\, \exp\!\left(-\frac{(z/\lambda_C)^2}{4\,(t/\tau_M)}\right)\, e^{-t/\tau_M}. \qquad (4.6)$$
If we apply a current $I_e$ for all $t$ greater than zero, it is also possible to apply the superposition principle for linear partial differential equations and write the solution in terms of a standard mathematical function, the error function, $\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-y^2}\, dy$. We will not go through these details as we feel you have been abused enough at this point. However, the work to verify these comments is not that different from what we have already done, so if you have followed us so far, you can go read up on this and be confident you can follow the arguments. Note that this situation is the time dependent analogue of the steady state solution for the infinite cable with a constant current applied at $z = 0$; the solution can be shown to be
$$v_m(z,t) \;=\; \frac{r_o\lambda_C I_e}{4}\left[\mathrm{erf}\!\left(\sqrt{\frac{t}{\tau_M}} + \frac{|z/\lambda_C|}{2\sqrt{t/\tau_M}}\right) - 1\right] e^{\left|\frac{z}{\lambda_C}\right|} \;+\; \frac{r_o\lambda_C I_e}{4}\left[\mathrm{erf}\!\left(\sqrt{\frac{t}{\tau_M}} - \frac{|z/\lambda_C|}{2\sqrt{t/\tau_M}}\right) + 1\right] e^{-\left|\frac{z}{\lambda_C}\right|} \qquad (4.7)$$
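Under the assumption that Eq. 4.7 is the closed form of the constant current integral above, a quick numerical comparison is reassuring. The sketch below (made up cable parameter values; SciPy assumed) evaluates the quadrature of the P0 form and the error function expression at a few points.

import numpy as np
from scipy.integrate import quad
from scipy.special import erf

# Made up cable parameters, purely for illustration.
r0, lam_c, tau_m, Ie = 1.0, 0.1, 0.02, 2.0

def vm_quad(t, z):
    # Integral form of the constant current solution (Sect. 4.2).
    f = lambda u: (r0*lam_c*Ie/tau_m) / np.sqrt(4*np.pi*u/tau_m) \
        * np.exp(-(z/lam_c)**2 / (4*u/tau_m)) * np.exp(-u/tau_m)
    val, _ = quad(f, 0.0, t)
    return val

def vm_erf(t, z):
    # Error function form, Eq. 4.7.
    T = t / tau_m
    X = abs(z / lam_c)
    A = r0 * lam_c * Ie / 4.0
    return A*(erf(np.sqrt(T) + X/(2*np.sqrt(T))) - 1.0)*np.exp(X) \
         + A*(erf(np.sqrt(T) - X/(2*np.sqrt(T))) + 1.0)*np.exp(-X)

for (t, z) in [(0.01, 0.05), (0.05, 0.2), (0.1, 0.3)]:
    print(vm_quad(t, z), vm_erf(t, z))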
The slope of this line can then be interpreted as a velocity, the rate at which the fiber position for half-life changes; this is a conduction velocity and is given by
$$v_{\frac{1}{2}} \;=\; 2\,\frac{\lambda_C}{\tau_M}, \qquad (4.9)$$
having units of cm/s. Using our standard assumption that $r_i \gg r_o$ and our equations for $\lambda_C$ and $g_m$ in terms of membrane parameters, we find
$$v_{\frac{1}{2}} \;=\; \sqrt{\frac{2G_M}{\rho_i\, C_M^2}}\; a^{\frac{1}{2}}, \qquad (4.10)$$
indicating that the conduction velocity grows in proportion to the square root of the fiber radius $a$. Hence, twice the ratio of the fundamental space constant to the time constant is an important heuristic measure of the propagation speed of the voltage pulse. Good to know.
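To see the square root dependence concretely, here is a tiny sketch evaluating Eq. 4.10 for a range of fiber radii with made up membrane parameter values (the numbers are illustrative only, not taken from the text).

import numpy as np

# Illustrative membrane parameters (made up values for demonstration).
G_M = 5.0e-4      # membrane conductance per unit area, S/cm^2
C_M = 1.0e-6      # membrane capacitance per unit area, F/cm^2
rho_i = 100.0     # axoplasm resistivity, ohm*cm

for a in (1e-4, 4e-4, 16e-4):           # fiber radii in cm
    v_half = np.sqrt(2.0 * G_M / (rho_i * C_M**2)) * np.sqrt(a)
    print(a, v_half)                    # quadrupling a doubles the velocity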
Reference
J. Peterson, Calculus for Cognitive Scientists: Partial Differential Equation Models, Springer Series
on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte Ltd,
Singapore, 2015 in press)
Part III
Neural Systems
Chapter 5
Mammalian Neural Structure
In this chapter, we will discuss the basic principles of the neural organization that
subserves cognitive function. A great way to get a handle on this material is in the
neuroanatomy books (Pinel and Edwards 2008) and (Diamond et al. 1985) which
we have spent many hours working with using colored pencils and many sheets of
paper. We encourage you to do this as our simple introduction below is just to get
you started.
Our basic neural model is based on abstractions from neurobiology. The two halves
or hemispheres of the brain are connected by the corpus callosum which is like a
cap of tissue that sits on top of the brain stem. The structures in this area are very
old in an evolutionary sense. The outer surface of the brain is the cortex which is a
thin layer organized into columns. There is too much cortex to fit comfortably inside
the human skull, so as the human species evolved and the amount of cortical tissue
expanded, the cortex began to develop folds. Imagine a deep canyon in the earth’s
surface. The walls of the canyon are the gyri and are the cortical tissue. The canyon itself is called a sulcus or fissure. There are many such sulci with corresponding gyri. Some of the gyri are deep enough to touch the corpus callosum. One such example is the longitudinal cerebral fissure, which contains the cingulate gyri at the very bottom of the fissure touching the corpus callosum.
Consider the simplified model of information processing in the brain that is pre-
sented in Fig. 5.3. This has been abstracted out of much more detail (Nolte 2002).
In Brodal (1992) and Diamond et al. (1985), we can trace out the details of the con-
nections between cortical areas and deeper brain structures near the brain stem and
construct a very simplified version which will be suitable for our modeling purposes.
Raw visual input is sent to area 17 of the occipital cortex where it is further processed by the occipital association areas 18 and 19. Raw auditory input is sent to area 41 of the Parietal cortex. Processing continues in areas 5, 7 (the parietal association areas) and 42. There are also long association nerve fiber bundles starting in the cin-
gulate gyrus which connect the temporal, parietal and occipital cortex together. The
Temporal—Occipital connections are labeled C2, the superior longitudinal fascicu-
lus; and B2, the inferior occipitofrontal nerve bundles, respectively. The C2 pathway
also connects to the cerebellum for motor output. The Frontal—Temporal connec-
tions labeled A, B and C are the cingular, uncinate fasciculus and arcuate fasciculus
bundles and connect the Frontal association areas 20, 21, 22 and 37.
Area 37 is an area where inputs from multiple sensor modalities are fused into
higher level constructs. The top boundary of area 17 in the occipital cortex is marked
by a fold in the surface of the brain called the lunate sulcus. This sulcus occurs
much higher in a primate such as a chimpanzee. Effectively, human like brains have
been reorganized so that the percentage of cortex allotted to vision has been reduced.
Comparative studies show that the human area 17 is 121 % smaller than it should be
if its size was proportionate to other primates. The lost portion of area 17 has been
reallocated to area 7 of the parietal cortex. There are special areas in each cortex
that are devoted to secondary processing of primary sensory information and which
are not connected directly to output pathways. These areas are called associative
cortex and they are primarily defined by function, not a special cell structure. In the
parietal cortex, the association areas are 5 and 7; in the temporal cortex, areas 20,
21, 22 and 37; and in the frontal, areas 6 and 8. Hence, human brains have evolved
to increase the amount of associative cortex available for what can be considered
symbolic processing needs. Finally, the same nerve bundle A connects the Parietal
and Temporal cortex. In addition to these general pathways, specific connections
between the cortex association areas are shown as bidirectional arrows. The box
labeled cingular gyrus is essentially a simplified model of the processing that is
done in the limbic system. Note the bidirectional arrows connecting the cingulate
gyrus to the septal nuclei inside the subcallosal gyrus, the anterior nuclei inside the
thalamus and the amygdala. There is also a two way connection between the anterior
nuclei and the mamillary body inside the hypothalamus. Finally, the cingulate gyrus
connects to the cerebellum for motor output. We show the main cortical areas of
the brain in Fig. 5.1. Our ability to process symbolic information is probably due to
changes in the human brain that have occurred over evolutionary time. A plausible
outline of the reorganizations in the brain that have occurred in human evolution has
been presented in Holloway (1999, p. 96), which we have paraphrased somewhat in
Table 5.1.
In the table presented, we use some possibly unfamiliar terms: the abbreviation mya, which refers to millions of years ago; the term petalias, which are asymmetrical projections of the occipital and frontal cortex; and endocast, which is a cast of the inside of a hominid fossil's skull. An endocast must be carefully planned and examined, of course, as there are many opportunities for wrong interpretations.
Holloway (1999, p. 77), believes that there are no new evolutionarily derived struc-
tures in the human brain as compared to other animals—nuclear masses and the fiber
systems interconnecting them are the same. The differences are in the quantitative
relationships between and among these nuclei and fiber tracts, and the organization
of cerebral cortex structurally, functionally and in integration.
As might be expected, clear evidence of our use of symbolic processing does not
occur until humans began to leave archaeological artifacts which were well enough
preserved. Currently, such evidence first appears in painted shells used for adornment
found in African sites from approximately seventy thousand years ago. Previous to
those discoveries, the strongest evidence for abstract and symbolic abilities came
in the numerous examples of Paleolithic art from Europe and Russia in the
form of cave paintings and carved bone about twenty-five thousand years ago (25
kya). Paleolithic art comprises thousands of images made on bone, antler, ivory and
limestone cave walls using a variety of techniques, styles and artistic conventions.
In Conkey (1999, p. 291), it is asserted that
If,...,we are to understand the ‘manifestations and evolution of complex human behavior’,
there is no doubt that the study of paleolithic ‘art’ has much to offer.
In the past, researchers assumed that the Paleolithic art samples that appeared so
suddenly 25 kya were evidence of the emergence of a newly reorganized human brain
that was now capable of symbolic processing of the kind needed to create art. Conkey
argues persuasively that this is not likely. Indeed, in the table of brain reorganization presented earlier, we noted that the increase in associative parietal cortex in area 7
occurred approximately 3 mya. Hence, the capability of symbolic reasoning probably
steadily evolved even though the concrete evidence of cave art and so forth does not
occur until really quite recently. However, our point is that the creation of ‘art’ is
intimately tied up with the symbolic processing capabilities that must underlie any
model of cognition. Further, we assert that the abstractions inherent in mathematics
and optimization are additional examples of symbolic processing.
In addition, we indicate in simplified form, three major neurotransmitter pathways.
In the brain stem, we focus on the groups of neurons called the raphe and the locus
coeruleus as well as dopamine producing cells which are labeled by their location,
the substantia nigra in the brain stem. These cell groups produce neurotransmitters
of specific types and send their outputs to a collection of neural tissues that surround
the thalamus called the basal ganglia. The basal ganglia are not shown in Fig. 5.3
but you should imagine them as another box surrounding the thalamus. The basal
ganglia then send outputs to portions of the cerebral cortex; the cerebral cortex in turn sends connections back to the basal ganglia. These connections are not shown explicitly; instead, for simplicity of presentation, we use the thalamus to cingulate gyrus to associative cortex connections that are shown. The raphe nuclei produce serotonin,
the locus coeruleus produce norepinephrine and the substantia nigra (and other cells
in the brain stem) produce dopamine. There are many other neurotransmitters, but
the model we are presenting here is a deliberate choice to focus on a few basic
neurotransmitter pathways.
The limbic processing presented in Fig. 5.3 is shown in more detail in Fig. 5.2.
Cross sections of the brain in a plane perpendicular to a line through the eyes and back
of head show that the cingulate gyrus sits on top of the corpus callosum. Underneath
the corpus callosum is a sheath of connecting nerve tissue known as the fornix which is
instrumental in communication between these layers and the structures that lie below.
The arrows in Fig. 5.2 indicate structural connections only and should not be used
to infer information transfer. A typical model of biological information processing
that can be abstracted from what we know about brain function is thus shown in
the simplified brain model of Fig. 5.3. This shows a chain of neural modules which
subserve cognition. It is primarily meant to illustrate the hierarchical complexity of
the brain structures we need to become familiar with. There is always cross-talk and
feedback between modules which is not shown.
Some of the principal components of the information processing system of the
brain are given in Table 5.2. The central nervous system (CNS) is divided roughly into
two parts; the brain and the spinal cord. The brain can be subdivided into a number of
discernible modules. For our purposes, we will consider the brain model to consist
of the cerebrum, the cerebellum and the brain stem. Finer subdivisions are then
shown in Table 5.2 where some structures are labeled with a corresponding number
for later reference in other figures. The numbers we use here bear no relation to the
numbering scheme that is used for the cortical subdivisions shown in Figs. 5.1 and
5.3. The numbering scheme there has been set historically. The numbering scheme
shown in Table 5.2 will help us locate brain structures deep in the brain that can
only be seen by taking slices. These numbers thus correspond to the brain structures shown in Fig. 5.4 (modules that can be seen on the surface) and the brain slices of Figs. 5.5a, b and 5.6a, b. A useful model of the processing necessary to combine disparate sensory information into higher level concepts is clearly built on models of cortical processing.
Fig. 5.3 A simplified path of information processing in the brain. Arrows indicate information processing pathways
Fig. 5.4 Brain structures: the numeric labels correspond to structures listed in Table 5.2
There is much evidence that cortex is initially uniform prior to exposure to envi-
ronmental signal and hence a good model of such generic cortical tissue, isocortex,
is needed. A model of isocortex is motivated by recent models of cortical process-
ing outlined in Raizada and Grossberg (2003). This article uses clues from visual
processing to gain insight into how virgin cortical tissue (isocortex) is wired to allow
for its shaping via environmental input. Clues and theoretical models for auditory
cortex can then be found in the survey paper of Merzenich (2001). For our purposes,
we will use the terms auditory cortex for area 41 of the Parietal cortex and visual
cortex for area 17 of the occipital cortex. These are the areas which receive primary
sensory input with further processing occurring first in cortex where the input is
received and then in the temporal cortex in areas 20, 21, 22 and 37 as shown in
Fig. 5.3. The first layer of auditory cortex is bathed in an environment where sound
is chunked or batched into pieces of 200 ms length, which is the approximate size of the phonemes of a person's native language. Hence, the first layer of cortex develops circuitry specialized to this time constant.
Fig. 5.5 Brain slice 1 details. a Slice 1 orientation. b Neural slice 1 cartoon
Fig. 5.6 Brain slice 2 details. a Slice 2 orientation. b Neural slice 2 cartoon
The second layer of cortex then naturally develops a chunk size focus that is
substantially larger, perhaps on the order of 1000–10,000 ms. Merzenich details how
errors in the imprinting of these cortical layers can lead to cognitive impairments such
as dyslexia. As processing is further removed from the auditory cortex via myelinated pathways, additional meta level concepts (tied to even longer time constants) are developed. One way to do this is through what is called Hebbian learning. This training method strengthens the connection between a pre- and post-synaptic neuron on the basis of coincident activity. It has many variations, of course, but that is the gist of it and it is based on the neurobiology of long term learning.
While it is clear that a form of such Hebbian learning is used to set up these circuits, the pitfalls of such learning are discussed clearly in the literature (McClelland 2001). For example, the inability of adult speakers of Japanese to distinguish the sounds of the English 'r' and 'l' is often cited as a consequence of this kind of early imprinting.
The connections from the limbic core to the cortex are not visible from the outside.
The outer layer of the cortex is deeply folded and wraps around an inner core that
consists of the limbic lobe and corpus callosum. Neural pathways always connect
these structures. In Fig. 5.5a, we can see the inner structures known as the amygdala
and putamen in a brain cross-section. Figure 5.5a show the orientation of the brain
cross-section while Fig. 5.5b displays a cartoon of the slice itself indicating various
structures.
The thalamus is located in a portion of the brain which can be seen using the
cross-section indicated by Fig. 5.6a. The structures the slice contains are shown in
Fig. 5.6b.
Cortical columns interact with other parts of the brain in sophisticated ways. The
patterns of cortical activity (modeled by the Folded Feedback Pathways (FFP) and
On-Center Off-surround (OCOS) circuits as discussed in Sect. 5.4.2) are modulated
by neurotransmitter inputs that originate in the reticular formation or RF of the
brain. The outer cortex is wrapped around an inner core which contains, among
other structures, the midbrain. The midbrain is one of the most important information
processing modules. Consider Fig. 5.8 which shows the midbrain in cross section. A
number of integrative functions are organized at the brainstem level. These include
complex motor patterns, respiratory and cardiovascular activity and some aspects
of consciousness. The brain stem has historically been subdivided into functional
groups starting at the spinal cord and moving upward toward the middle of the brain.
The groups are, in order, the caudal and rostral medulla, the caudal, mid and
rostral pons and the caudal and rostral midbrain. We will refer to these as the
brainstem layers 1–7. In this context, the terms caudal and rostral refer to whether
a slice is closest to the spinal cord or not. Hence, the pons is divided into the rostral
pons (farthest from the spinal cord) and the caudal pons (closest to the spinal cord).
The seven brain stem layers can be seen by taking cross-sections through the
brain as indicated by repeated numbers 1–7 shown in Fig. 5.8. Each shows interesting
structure which is not visible other than in cross section. Each slice of the midbrain
also has specific functionality and shows interesting structure which is not visible
other than in cross section. The layers are shown in Fig. 5.7. Slice 1 is the caudal
medulla and is shown in Fig. 5.9a. The Medial Longitudinal Fascilus (MLF) controls
head and eye movement. The Medial Lemniscus is not shown but is right below the
MLF. Slice 2 is the rostral medulla. The Fourth Ventricle is not shown in Fig. 5.9b
but would be right at the top of the figure. The Medial Lemniscus is still not shown
but is located below the MLF like before. In slice 2, a lot of the inferior olivary
nucleus can be seen. The caudal pons is shown in Slice 3, Fig. 5.10a, and the mid
pons in Slice 4, Fig. 5.10b.
The rostral pons and the caudal and rostral midbrain are shown in Figs. 5.11a, b
and 5.12, respectively. Slice 4 and 5 contain pontine nuclei and Slice 7 contains the
Fig. 5.9 The medulla cross-sections. a The caudal medulla: slice 1. b The rostral medulla: slice 2
Fig. 5.10 The caudal and mid pons slices. a The caudal pons: slice 3. b The mid pons: slice 4
Fig. 5.11 The rostral pons and caudal midbrain cross-sections. a The rostral pons: slice 5. b The
caudal midbrain: slice 6
and rostral pons RF neurons thus collect sensory modalities and project this information to the intralaminar nuclei of the thalamus. The intralaminar nuclei project to widespread areas of cortex, which causes heightened arousal in response to sensory stimuli, e.g. attention. Our cortical column models must therefore be able to accept modulatory inputs from the RF formation. It is also known that some RF neurons release monoamines which are essential for maintenance of consciousness. For example, bilateral damage to the midbrain RF and the fibers through it causes prolonged coma. Hence, even a normal intact cerebrum cannot maintain consciousness. It is clear that input from the brainstem RF is needed. The monoamine releasing RF neurons release norepinephrine, dopamine and serotonin. The noradrenergic neurons are located in the pons and medulla (locus coeruleus). This is Slice 5 of the midbrain and it connects to cerebral cortex. This system is inactive in sleep and most active in startling situations, or those calling for watchfulness. Dopaminergic neurons in slice 7 of the midbrain are located in the Substantia Nigra (SG). Figure 5.14 shows the brain areas that are influenced by dopamine. Figure 5.15a also shows the SC (superior colliculus), PAG (periaqueductal gray) and RN (red nucleus) structures.
These neurons project in overlapping fiber tracts to other parts of the brain. The nigrostriatal pathway sends information from the substantia nigra to the caudate, putamen and midbrain. The medial forebrain bundle projects from the substantia nigra to the frontal and limbic lobes. The indicated projections to the motor cortex are consistent with initiation of movement. We know that there is disruption to cortex function due to the loss of dopamine neurons in Parkinson's disease. The projections to other frontal cortical areas and limbic structures imply there is a motivation and cognition role. Hence, imbalances in these pathways will play a role in mental dysfunction. Furthermore, certain drugs cause dopamine release in limbic structures which implies a pleasure connection. Serotonergic neurons occur in most levels of the brain stem, but concentrate in the raphe nuclei in slice 5 of the midbrain, also near the locus coeruleus (Fig. 5.15b). Their axons innervate many areas of the brain as shown in Fig. 5.16. It is known that serotonin levels determine the set point of arousal and influence the pain control system.
Fig. 5.15 The dopamine and serotonin releasing neurons sites. a Location of dopamine sites. b Location of serotonin sites
The cortex consists of the frontal, occipital, parietal, temporal and limbic lobes
and it is folded as is shown in Fig. 5.17. It consists of a number of regions which
have historically been classified as illustrated in Fig. 5.18a. The cortical layer is thus
subdivided into several functional areas known as the Frontal, Occipital, Parietal,
Temporal and Limbic lobes. Most of what we know about the function of these lobes
comes from the analysis of the diminished abilities of people who have unfortunately had strokes or injuries. By studying these brave people very carefully we can discern what functions they have lost and correlate these losses with the areas of their brain that have been damaged. This correlation is generally obtained using a variety of brain imaging techniques such as Computer Assisted Tomography (CAT) and Functional Magnetic Resonance Imagery (fMRI) scans.
Fig. 5.18 Cortical lobes. a Cortical lobes. b The limbic lobe inside the cortex
The Frontal lobe has a number of functionally distinct areas. It contains the
primary motor cortex which is involved in initiation of voluntary movement. Also, it
has a specialized area known as Broca’s area which is important in both written and
spoken language ability. Finally, it has the prefrontal cortex which is instrumental in
the maintenance of our personality and is involved in the critical abilities of insight
and foresight. The Occipital lobe is concerned with visual processing and visual
association. In the Parietal lobe, primary somato-sensory information is processed
in the area known as the primary somato-sensory cortex. There is also initial cortical
processing of tactile and proprioceptive input. In addition, there are areas devoted
to language comprehension and complex spatial orientation and perception. The
Temporal lobe contains auditory processing circuitry and develops higher order
auditory associations as well. For example, the temporal lobe contains Wernicke’s
area which is involved with language comprehension. It also handles higher order
visual processing and learning and memory. Finally, the Limbic system lies beneath
the cortex as in shown in Fig. 5.18b and is involved in emotional modulation of
cortical processing.
Clues and theoretical models for auditory cortex can then be found in the survey
paper of Merzenich (2001). We begin with a general view of a typical cortical col-
umn taken from the standard references of Brodal (1992), Diamond et al. (1985) and
Nolte (2002) as shown in Fig. 5.19. This column is oriented vertically with layer one
closest to the skull and layer six furthest in. We show layer four having a connection
to primary sensory data. The details of some of the connections between the layers
are shown in Fig. 5.20. The six layers of the cortical column consist of specific cell
types and mixtures described in Table 5.3.
We can make some general observations about the cortical architecture. First,
layers three and five contain pyramidal cells which collect information from layers
above themselves and send their processed output for higher level processing. Layer
three outputs to motor areas and layer five, to other parts of the cerebral cortex.
Layer six contains cells whose output is sent to the thalamus or other brain areas.
Layer four is a collection layer which collates input from primary sensory modalities
or from other cortical and brain areas. We see illustrations of the general cortical
column structure in Figs. 5.19 and 5.20. The cortical columns are organized into
larger vertical structures following a simple stacked protocol: sensory data → cortical
column 1 → cortical column 2 → cortical column 3 and so forth. For convenience,
our models will be shown with three stacked columns. The output from the last
column is then sent to other cortex, thalamus and the brain stem.
A useful model of generic cortex, isocortex, is that given in Grossberg (2003), Gross-
berg and Seitz (2003) and Raizada and Grossberg (2003). Two fundamental corti-
cal circuits are introduced in these works: the on-center, off-surround (OCOS) and
the folded feedback pathway (FFP) seen in Fig. 5.21a, b. In Fig. 5.21a, we see the
On-Center, Off-Surround control structure that is part of the cortical column control
circuitry. Outputs from the thalamus (perhaps from the nuclei of the Lateral Genic-
ulate Body) filter upward into the column at the bottom of the picture. At the top of
the figure, the three circles that are not filled in represent neurons in layer four whose
outputs will be sent to other parts of the column.
There are two thalamic output lines: the first is a direct connection to the input layer
four, while the second is an indirect connection to layer six itself. This connection
then connects to a layer of inhibitory neurons which are shown as circles filled in
with black. The middle layer four output neuron is thus innervated by both inhibitory
and excitatory inputs while the left and right layer four output neurons only receive
inhibitory impulses. Hence, the center is excited and the part of the circuit that is off
the center, is inhibited. We could say the surround is off. It is common to call this type
of activation the off surround. Next consider a stacked cortical column consisting of
two columns, column one and column two. There are cortico-cortical feedback axons
originating in layer six of column two which input into layer one of column one. From
layer one, the input connects to the dendrites of layer five pyramidal neurons which
connect to the thalamic neuron in layer six.
Fig. 5.21 OCOS and FFP circuits. a The on-center, off-surround control structure. b The folded feedback pathway control structure
Hence, the higher level cortical input is
fed back into the previous column layer six and then can excite column one’s fourth
layer via the on-center, off-surround circuit discussed previously. This description is
summarized in Fig. 5.21b. We call this type of feedback a folded feedback pathway.
For convenience, we use the abbreviations OCOS and FFP to indicate the on-center,
off-surround and folded feedback pathway, respectively. The layer six–four OCOS
is connected to the layer two–three circuit as shown in Fig. 5.22a. Note that the layer
four output is forwarded to layer two–three and then sent back to layer six so as to
contribute to the standard OCOS layer six–layer four circuit. Hence, we can describe
this as another FFP circuit. Finally, the output from layer six is forwarded into the
thalamic pathways using a standard OCOS circuit. This provides a way for layer six
neurons to modulate the thalamic outputs which influence the cortex. This is shown
in Fig. 5.22b.
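To make the on-center, off-surround idea concrete, here is a toy computational sketch (our own abstraction for illustration, not the laminar circuit model of Raizada and Grossberg) in which a thalamic input drives a small layer four vector through an excitatory center weight and inhibitory surround weights.

import numpy as np

def ocos(thalamic_input, w_center=1.0, w_surround=0.4):
    # Toy on-center, off-surround response of a 1-D sheet of layer four cells.
    # Each cell is excited by its own input and inhibited by its two neighbours;
    # this is only a cartoon of the OCOS wiring, not the actual cortical circuitry.
    x = np.asarray(thalamic_input, dtype=float)
    left = np.roll(x, 1)
    right = np.roll(x, -1)
    response = w_center * x - w_surround * (left + right)
    return np.maximum(response, 0.0)   # simple rectification

# A localized thalamic input: the center cell wins, its neighbours are suppressed.
print(ocos([0.0, 0.2, 1.0, 0.2, 0.0]))   # -> [0. 0. 0.84 0. 0.]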
The FFP and OCOS cortical circuits can also be combined into a multi-column
model (Raizada and Grossberg 2003) as seen in Fig. 5.23. It is known that cortical
outputs dynamically assemble into spatially and temporally localized phase locked
structures. A review of such functional connectivity appears in Fingelkurts et al.
(2005). We have abstracted a typical snapshot of the synchronous activity of par-
ticipating neurons from Fingelkurts’ discussions which is shown in Fig. 5.24. Each
burst of this type of synchronous activity in the cortex is measured via skull-cap EEG
equipment and Fingelkurts presents a reasoned discussion why such coordinated
ensembles are high level representations of activity. A cognitive model is thus inher-
ently multi-scale in nature. The cortex uses clusters of synchronous activity as shown
in Fig. 5.24 acting on a sub second to second time frame to successively transform raw
data representations to higher level representations. Also, cortical transformations
from different sensory modalities are combined to fuse representations into new higher level constructs.
Fig. 5.22 Layer six connections. a The layer six–four connections to layer two–three are another FFP. b The layer six to thalamic OCOS
References
M. Beauchamp, See me, hear me, touch me: multisensory integration in lateral occipital—temporal
cortex. Curr. Opin. Neurobiol. 15, 145–153 (2005)
P. Brodal, The Central Nervous System: Structure and Function (Oxford University Press, New
York, 1992)
M. Conkey, A history of the interpretation of European ‘paleolithic art’: magic, mythogram, and
metaphors for modernity, in Handbook of Human Symbolic Evolution, ed. by A. Lock, C. Peters
(Blackwell Publishers, Massachusetts, 1999)
M. Diamond, A. Scheibel, L. Elson, The Human Brain Coloring Book (Barnes and Noble Books,
New York, 1985)
A. Fingelkurts, A. Fingelkurts, S. Kähkönen, Functional connectivity in the brain—is it an elusive
concept? Neurosci. Biobehav. Rev. 28, 827–836 (2005)
S. Grossberg, How does the cerebral cortex work? development, learning, attention and 3D vision
by laminar circuits of visual cortex, Technical Report TR-2003-005, Boston University, CAS/CS,
2003
S. Grossberg, A. Seitz, Laminar development of receptive fields, maps, and columns in visual cortex:
the coordinating role of the subplate, Technical Report 02-006, Boston University, CAS/CS, 2003
R. Holloway, Evolution of the human brain, in Handbook of Human Symbolic Evolution, ed. by
A. Lock, C. Peters (Blackwell Publishers, Massachusetts, 1999), pp. 74–116
J. McClelland, Failures to learn and their remediation, in Mechanisms of Cognitive Development: Behavioral and Neural Perspectives, ed. by J. McClelland, R. Siegler (Lawrence Erlbaum Associates Publishers, USA, 2001), pp. 97–122
M. Merzenich, Cortical plasticity contributing to child development, in Mechanisms of Cognitive
Development: Behavioral and Neural Perspectives, ed. by J. McClelland, R. Siegler (Lawrence
Erlbaum Associates Publishers, USA, 2001), pp. 67–96
I. Nelken, Processing of complex stimuli and natural scenes in the auditory cortex. Curr. Opin.
Neurobiol 14, 474–480 (2004)
J. Nolte, The Human Brain: An Introduction to Its Functional Anatomy (Mosby, A Division of
Elsevier Science, 2002)
J. Pinel, M. Edwards, A Colorful Introduction to the Anatomy of the Human Brain: A Brain and
Psychology Coloring Book (Pearson, New York, 2008)
R. Raizada, S. Grossberg, Towards a theory of the laminar architecture of cerebral cortex: compu-
tational clues from the visual system. Cereb. Cortex 13, 100–113 (2003)
Chapter 6
Abstracting Principles of Computation
Classical computing via traditional hardware starts with computing primitives such
as AND and OR gates and then assembles large numbers of such gates into circuits
using microfabrication techniques to implement boolean functions. More compli-
cated functions and data other than boolean and/or integer can then be modeled
using this paradigm. The crucial feature here is that simple building blocks are real-
ized in hardware primitives which are then assembled into large structures. Quantum
computing is taking a similar approach; the primitive unit of information is no longer
a boolean bit but instead is a more complicated entity called a qubit. The quantum
nature of the qubit then allows the primitive functional AND/OR gates to have
unusual properties, but the intellectual pathway is still clear. Hardware primitives are used to assemble logic gates; logic gate ensembles are used to implement computable numbers via functions. Is this a viable model of biological computation?
In the thoughts that follow, we will argue that this is the incorrect approach and
biological computation is instead most profitably built from a blurring of the hardware/software distinction. Following Gerhart and Kirschner (1997), we present a
generic biological primitive based on signal transduction into a second messenger
pathway. We will look at specific cellular mechanisms to give the flavor of this
approach—all based on the presentation in Gerhart and Kirschner (1997) but placed
in very abstract terms for our purposes. We will show how they imply an abstract
computational framework for basic triggering events. The trigger model is essentially
a good approximation to a general second messenger input. We can use these ideas
in future development of a useful approximation to a biological neuron for use in
cognitive models.
A good example of a cellular trigger is the transcriptional control of the factor NFκB
which plays a role in immune system response. This mechanism is discussed in
a semi-abstract way in Gerhart and Kirschner (1997); we will discuss it even more abstractly.
Consider a trigger $T_0$ which activates a cell surface receptor. Inside the cell, there are always protein kinases that can be activated in a variety of ways. Here we denote a protein kinase by the symbol $PK$. A common mechanism for such an activation is to add to $PK$ another protein subunit $U$ to form the complex $PK/U$. This chain of events looks like this:
$$T_0 \;\longrightarrow\; \text{receptor activation} \;\longrightarrow\; PK + U \;\longrightarrow\; PK/U$$
PK/U then acts to phosphorylate another protein. The cell is filled with large
amounts of a transcription factor we will denote by T1 and an inhibitory protein
for T1 we label as T1∼ . This symbol, T1∼ , denotes the complement or anti version of
T1 . In the cell, T1 and T1∼ are generally joined together in a complex denoted by
T1 /T1∼ . The addition of T1∼ to T1 prevents T1 from being able to access the genome
in the nucleus to transcribe its target protein.
The trigger $T_0$ activates our protein kinase $PK$ to $PK/U$. The activated $PK/U$ is used to add a phosphate to $T_1^{\sim}$. This is called phosphorylation. Hence,
$$PK/U + T_1^{\sim} \;\longrightarrow\; T_1^{\sim}P$$
where $T_1^{\sim}P$ denotes the phosphorylated version of $T_1^{\sim}$. Since $T_1$ is bound into the complex $T_1/T_1^{\sim}$, we actually have
$$PK/U + T_1/T_1^{\sim} \;\longrightarrow\; T_1/T_1^{\sim}P$$
In the cell, there is always present a collection of proteins which tend to bond with the phosphorylated form $T_1^{\sim}P$. Such a system is called a tagging system. The protein used by the tagging system is denoted by $V$ and usually a chain of $n$ such $V$ proteins is glued together to form a polymer $V_n$. The tagging system creates the new complex $T_1/T_1^{\sim}PV_n$. Also, inside the cell, the tagging system coexists with a complementary system whose function is to destroy or remove the tagged complexes. Hence, the combined system is denoted by
$$f_{SQV_n}$$
This symbol means the system acts on $SQV_n$ units and outputs $S$ via mechanism $f$. Note the details of the mechanism $f$ are largely irrelevant here. Thus, we have the reaction
$$T_1/T_1^{\sim}PV_n \;\overset{f}{\longrightarrow}\; T_1$$
which releases $T_1$ into the cytoplasm. The full event chain for a cellular trigger is thus
$$T_0 \;\longrightarrow\; PK/U \;\longrightarrow\; T_1/T_1^{\sim}P \;\longrightarrow\; T_1/T_1^{\sim}PV_n \;\longrightarrow\; T_1 \;\longrightarrow\; P(T_1)$$
where $P(T_1)$ indicates the protein whose construction is initiated by the trigger $T_0$.
Without the trigger, we see there are a variety of ways transcription can be stopped:
• T1 does not exist in a free state; instead, it is always bound into the complex T1 /T1∼
and hence can’t be activated until the T1∼ is removed.
• Any of the steps required to remove T1∼ can be blocked, effectively killing transcription:
– phosphorylation of T1∼ into T1∼ P is needed so that tagging can occur. So anything
that blocks the phosphorylation step will also block transcription.
– Anything that blocks the tagging of the phosphorylated T1∼ P will thus block
transcription.
– Anything that stops the removal mechanism f SQVn will also block transcription.
The steps above can be used therefore to further regulate the transcription of T1 into
the protein P(T1). Let T0′, T0″ and T0‴ be inhibitors of the steps above. These inhibitory
proteins can themselves be regulated via triggers through mechanisms just like the
ones we are discussing. In fact, P(T1) could itself serve as an inhibitory trigger, i.e.
as any one of the inhibitors T0′, T0″ and T0‴. Our theoretical pathway is now
$$T_0 \rightarrow PK/U \xrightarrow{\text{step i}} T_1/T_1^{\sim}P \xrightarrow{\text{step ii}} T_1/T_1^{\sim}PV_n \xrightarrow{\text{step iii}} T_1 \rightarrow P(T_1)$$
where step i, step ii and step iii can be inhibited by T0′, T0″ and T0‴, respectively.
Note we have expanded to a system of four triggers which affect the outcome of
P(T1). Also, note that step i is a phosphorylation step. Now, let's refine our analysis
a bit more. Usually, reactions are paired: we typically have competing forward and
backward reactions. Hence, we can imagine that step i is a system which is in dynamic
equilibrium. The amount of T1/T1∼P formed and destroyed forms a stable loop with no
net T1/T1∼P formed. The trigger T0 introduces additional PK/U into this stable loop and
thereby affects the net production of T1/T1∼P. Thus, a new trigger T0′ could profoundly
affect phosphorylation of T1∼ and hence production of P(T1). We can see from the above
comments that very fine control of P(T1) production can be achieved if we think of
each step as a dynamical system in flux equilibrium.
Note our discussion above is a first step towards thinking of this mechanism in
terms of interacting objects.
$$PK/U + T_1/T_1^{\sim} \xrightarrow{\;k_1\;} T_1/T_1^{\sim}P$$
$$T_1/T_1^{\sim}P \xrightarrow{\;k_{-1}\;} T_1/T_1^{\sim} + PK/U$$
where k1 and k−1 are the forward and backward reaction rate constants and we assume
the amount of T1 /T1∼ inside the cell is constant and maintained at the equilibrium
concentration [T1 /T1∼ ]e . Since one unit of PK/U combines with one unit of T1 /T1∼
to phosphorylate T1 /T1∼ to T1 /T1∼ P, we see
$$\frac{d[PK/U]}{dt} = -k_1[PK/U][T_1/T_1^{\sim}] + k_{-1}[T_1/T_1^{\sim}P]$$
Hence,
$$\frac{d[PK/U]}{dt} = -k_1[PK/U]^2 + k_{-1}[T_1/T_1^{\sim}P] \quad (6.2)$$
From Eq. 6.1, we see
$$\frac{d[T_1/T_1^{\sim}]}{dt} = k_1[PK/U]^2 - k_{-1}[T_1/T_1^{\sim}P] \quad (6.3)$$
However, kinetics also tells us that
$$\frac{d[T_1/T_1^{\sim}]}{dt} = -k_1[PK/U]^2 + k_{-1}[T_1/T_1^{\sim}P] \quad (6.4)$$
and so, equating these two expressions, we find $k_1[PK/U]^2 = k_{-1}[T_1/T_1^{\sim}P]$, which implies
$$\frac{k_1}{k_{-1}}[PK/U]^2 = [T_1/T_1^{\sim}P] \quad (6.5)$$
$$\frac{d[T_1/T_1^{\sim}]}{dt} = k_1[PK/U]^2 - k_{-1}[T_1/T_1^{\sim}P] = 0 \quad (6.6)$$
Now let [PK/U]e denote the equilibrium concentration established by Eq. 6.5. Then
if the trigger T0 increases [PK/U] by δPK/U, from Eq. 6.5, we see
$$[T_1/T_1^{\sim}P]_{new} = \frac{k_1}{k_{-1}}\bigl([PK/U]_e + \delta_{PK/U}\bigr)^2 \quad (6.7)$$
This back of the envelope calculation can be done not only at step i but at the other
steps as well. Letting δT1/T1∼P and δT1/T1∼PVn denote changes in critical molecular
concentrations, let's examine the stage ii equilibrium loop. We have
$$T_1/T_1^{\sim}P \xrightarrow{\;k_2\;} T_1/T_1^{\sim}PV_n \quad (6.8)$$
$$T_1/T_1^{\sim}PV_n \xrightarrow{\;k_{-2}\;} T_1/T_1^{\sim}P \quad (6.9)$$
$$\frac{d[T_1/T_1^{\sim}P]}{dt} = -k_2[T_1/T_1^{\sim}P] + k_{-2}[T_1/T_1^{\sim}PV_n] \quad (6.10)$$
$$\frac{d[T_1/T_1^{\sim}PV_n]}{dt} = k_2[T_1/T_1^{\sim}P] - k_{-2}[T_1/T_1^{\sim}PV_n] \quad (6.11)$$
Dynamic equilibrium then implies that
$$k_2[T_1/T_1^{\sim}P] = k_{-2}[T_1/T_1^{\sim}PV_n] \quad (6.12)$$
or
$$\frac{k_2}{k_{-2}}[T_1/T_1^{\sim}P] = [T_1/T_1^{\sim}PV_n] \quad (6.13)$$
Equation 6.13 defines the equilibrium concentrations of [T1/T1∼P]e and
[T1/T1∼PVn]e. Now if [T1/T1∼P] increased to [T1/T1∼P] + δT1/T1∼P, the percentage
increase would be $100\,\delta_{T_1/T_1^{\sim}P}/[T_1/T_1^{\sim}P]_e$. We also know
$$[T_1/T_1^{\sim}P]_e = \frac{k_1}{k_{-1}}[PK/U]_e^2$$
and hence,
$$\frac{\delta_{T_1/T_1^{\sim}P}}{[T_1/T_1^{\sim}P]_e} = \frac{\frac{k_1}{k_{-1}}\left(2[PK/U]_e\,\delta_{PK/U} + \delta_{PK/U}^2\right)}{\frac{k_1}{k_{-1}}[PK/U]_e^2} = 2\,\frac{\delta_{PK/U}}{[PK/U]_e} + \left(\frac{\delta_{PK/U}}{[PK/U]_e}\right)^2$$
For convenience, let's define the relative change in a variable x as $r_x = \frac{\delta_x}{x}$. Thus, we
can write
$$r_{T_1/T_1^{\sim}P} = \frac{\delta_{T_1/T_1^{\sim}P}}{[T_1/T_1^{\sim}P]_e}, \qquad r_{PK/U} = \frac{\delta_{PK/U}}{[PK/U]_e}$$
Also,
$$\delta_{T_1/T_1^{\sim}PV_n} = [T_1/T_1^{\sim}PV_n]_{new} - [T_1/T_1^{\sim}PV_n]_e = \frac{k_2}{k_{-2}}\left([T_1/T_1^{\sim}P]_e + \delta_{T_1/T_1^{\sim}P}\right) - \frac{k_2}{k_{-2}}[T_1/T_1^{\sim}P]_e = \frac{k_2}{k_{-2}}\,\delta_{T_1/T_1^{\sim}P}$$
and so
$$r_{T_1/T_1^{\sim}PV_n} = r_{T_1/T_1^{\sim}P} = 2\,r_{PK/U} + r_{PK/U}^2$$
From this, we see that trigger events which cause $2\,r_{PK/U} + r_{PK/U}^2$ to exceed one
(that is, $r_{PK/U} > \sqrt{2} - 1$) create an explosive increase in $[T_1/T_1^{\sim}PV_n]$. Finally, in the third
step, we have
$$T_1/T_1^{\sim}PV_n \xrightarrow{\;k_3\;} T_1 \quad (6.14)$$
$$T_1 \xrightarrow{\;k_{-3}\;} T_1/T_1^{\sim}PV_n \quad (6.15)$$
This dynamical loop can be analyzed just as we did in step ii. We see
$$\frac{k_3}{k_{-3}}[T_1/T_1^{\sim}PV_n]_e = [T_1]_e$$
and the triggered increase in $[PK/U]_e$ by $\delta_{PK/U}$ induces the change
$$\delta_{T_1} = \frac{k_3}{k_{-3}}\,\delta_{T_1/T_1^{\sim}PV_n} = \frac{k_3}{k_{-3}}\frac{k_2}{k_{-2}}\,\delta_{T_1/T_1^{\sim}P} = \frac{k_3}{k_{-3}}\frac{k_2}{k_{-2}}\frac{k_1}{k_{-1}}\left(2\,\delta_{PK/U}[PK/U]_e + \delta_{PK/U}^2\right) = \frac{k_3}{k_{-3}}\frac{k_2}{k_{-2}}\frac{k_1}{k_{-1}}\left(2\,r_{PK/U} + r_{PK/U}^2\right)[PK/U]_e^2$$
We can therefore clearly see the multiplier effects of trigger T0 on the production of
T1 which, of course, also determines changes in the production of P(T1).
The mechanism by which the trigger T0 creates activated kinase PK/U can be
complex; in general, each unit of T0 creates λ units of PK/U where λ is quite large,
perhaps 10,000 or more times the base level of [PK/U]e. Hence, if $r_{PK/U} = \beta$ and
$K = \frac{k_1}{k_{-1}}\frac{k_2}{k_{-2}}\frac{k_3}{k_{-3}}$, we have
$$\delta_{T_1} = \bigl(2\beta + \beta^2\bigr)K\,[PK/U]_e^2 \approx \beta^2\,K\,[PK/U]_e^2$$
for β >> 1. From this quick analysis, we can clearly see the potentially explosive
effect changes in T0 can have on PK/U!
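To get a feel for the numbers, here is a minimal Python sketch of the multiplier effect above; the rate-constant ratios and the equilibrium level of [PK/U] used below are illustrative placeholders, not values taken from the text.

```python
# Sketch of the trigger multiplier effect: delta_T1 = (2*beta + beta**2) * K * [PK/U]_e^2.
# The rate-constant ratios and equilibrium concentration below are illustrative only.

def delta_T1(beta, k_ratios, pku_eq):
    """Change in free T1 produced by a relative change beta in [PK/U].

    beta     : relative change r_PK/U induced by the trigger T0
    k_ratios : (k1/k-1, k2/k-2, k3/k-3) for the three equilibrium loops
    pku_eq   : equilibrium concentration [PK/U]_e
    """
    K = 1.0
    for ratio in k_ratios:
        K *= ratio
    return (2.0 * beta + beta ** 2) * K * pku_eq ** 2

if __name__ == "__main__":
    k_ratios = (2.0, 1.5, 3.0)   # hypothetical forward/backward ratios
    pku_eq = 0.1                 # hypothetical equilibrium [PK/U]_e
    for beta in (0.1, 0.41, 1.0, 100.0, 10000.0):
        print(f"beta = {beta:>8}: delta_T1 = {delta_T1(beta, k_ratios, pku_eq):.4g}")
    # Note: 2*beta + beta**2 exceeds one once beta > sqrt(2) - 1 ~ 0.414,
    # and for beta >> 1 the response grows like beta**2: the explosive regime.
```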
Our first biological primitive is thus the trigger pathway we have seen before, $T_0 \rightarrow PK/U \rightarrow T_1/T_1^{\sim}P \rightarrow T_1/T_1^{\sim}PV_n \rightarrow T_1 \rightarrow P(T_1)$.
We know from our discussions that δT1 is (2β + β²)K[PK/U]e² and we can interpret
the output protein P(T1) as any of the molecules Q needed to kill a pathway. If P(T1)
was a T0 receptor, then we would have that the expression of the target protein P(T1)
generates δPK/U!
Let us note two very important points now: there is richness to this pathway
and the target P(T1) can alter hardware or software easily. If P(T1) was a K+
voltage activated gate, then we see an increase of δT1 (assuming a one-to-one conversion of
T1 to P(T1)) in the concentration of K+ gates. This corresponds to a change in the
characteristics of the axonal pulse. Similarly, P(T1) could create Na+ gates, thereby
changing axonal pulse characteristics. P(T1) could also create other proteins
whose impact on the axonal pulse is through indirect means such as the pathways that
kill T0. There is also the positive feedback pathway via δPK/U through the T0
receptor creation. In effect, we have a forking or splitting of possibilities as shown
in Fig. 6.1. Note that all of these pathways are essentially modeled by this primitive.
$$M + t_M \rightarrow M\,t_M \text{ (transport)} + r_M \text{ (receptor)} \rightarrow M/r\,t_M \text{ (receptor complex)} \rightarrow M \text{ (freed } M\text{)}$$
There are many other details, of course, but the diagram above captures the basic
information flow. In skeletal outline,
$$M + t_M \rightarrow M\,t_M \rightarrow M/r\,t_M \rightarrow M$$
Once inside the cell, M can be bound into a storage protein sP or M can be used
in other processes. The concentration of M, [M], usually determines which path the
substance will take. Let’s assume there are two states, low and high for [M] which
are determined by a threshold, th. A simple threshold calculation can be performed
by a sigmoidal unit as shown in Fig. 6.2.
The standard mathematical form of such a sigmoid which provides a transfer
between the low value of 0 and high value of 1 is given by
$$h(x) = \frac{1}{2}\left(1 + \tanh\left(\frac{x - o}{g}\right)\right) \quad (6.16)$$
where x is our independent variable (here [M]), o is an offset (the center point of the
sigmoid), and g is called a gain parameter as it controls the slope of the transfer from
the low signal to the high signal via the value h′(o). Note,
$$h'(x) = \frac{1}{2g}\,\mathrm{sech}^2\left(\frac{x - o}{g}\right) \quad (6.17)$$
This is the same sigmoidal transformation that will be used when we discuss neural
computation in nervous systems in Chap. 14 and subsequent chapters. If we used a
transition between low and high that was instantaneous, such an abrupt transition
would not be differentiable which would cause some technical problems. Hence, we
use a smooth transfer function as listed above and we interpret a low gain g as giving
us a fuzzy or slow transition versus the sharp transition we obtain from a high gain g.
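For concreteness, a minimal Python sketch of this sigmoidal switch is given below; the threshold and gain values are illustrative only.

```python
import numpy as np

def h(x, o, g):
    """Sigmoidal switch of Eq. 6.16: transfers smoothly from 0 (low) to 1 (high)."""
    return 0.5 * (1.0 + np.tanh((x - o) / g))

def h_prime(x, o, g):
    """Derivative of Eq. 6.17; its value at x = o, namely 1/(2g), sets the transition slope."""
    return (1.0 / (2.0 * g)) * (1.0 / np.cosh((x - o) / g)) ** 2

# Illustrative threshold and gains: a large g gives a fuzzy transition,
# a small g gives a sharp, almost step-like transition.
threshold = 1.0                      # hypothetical threshold th for [M]
M = np.array([0.5, 0.9, 1.0, 1.1, 1.5])
for g in (1.0, 0.1, 0.01):
    print(f"g = {g}:", np.round(h(M, threshold, g), 3))
```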
Let’s assume M binds to a switching complex denoted swM when its concentration
is in the high state. The bound complex swM/M is then inactivated. In the low state,
M does not bind to swM and swM is then able to bind to two sites on the messenger
RNA, mRNA, for the two proteins rM and sPM.
The machinery by which mRNA is transformed into a protein can be thought of as
a factory: hence, we let frM and fspM denote the rM and sPM factories respectively.
The mRNAs for these proteins will have the generic appearance as shown in Fig. 6.3.
If you look at Fig. 6.3 closely, you’ll see that there are special structures attached
to the mRNA amino acid sequence. These look like tennis rackets and are called
stem loops. The mRNA for both rM and sPM have these structures. The unbound
switching protein swM can dock with any stem loop if the conditions are right.
We know from our basic molecular biology that binding near the 3′ end stabilizes
mRNA against degradation and thus effectively increases protein creation. On the
other hand, binding near the 5′ end inhibits translation of mRNA into protein and
effectively serves as a stop button. Thus, for rM and sPM, we can imagine their
factories possessing two control buttons: stop and stabilize. The switch binds to
the 3′ end on the mRNA for rM and the 5′ end for the sPM mRNA. As you can see
in Fig. 6.4, where SM denotes swM, increased rM implies more M enters the cell;
hence, [M] increases. Also, since decreased sPM implies there are fewer binding sites
available, less M is bound, implying [M] increases also. The switch swM thus affects
two primitive components of the cell: the number of M receptors in the cell wall and
the number of storage sites for M in the cytosol. This mechanism could work for
any substance M which enters a cell through a receptor via a transport protein whose
concentration [M] is controlled by a storage/free cytosol interaction. The generic
mechanism is illustrated in Fig. 6.5. This mechanism is based on how iron is handled
in a cell. In human cells, there is no mechanism for removing excess iron; hence, the
high pathway can only show inactivation of the switch and additional binding into
the binding protein. From our discussion, it follows we effectively control [M] by a
switch which can directly activate the genome!
$$B_1N \rightarrow B_1 + N \quad\text{(breakdown of } B_1N\text{)}$$
$$B_1 + N \rightarrow B_1N \quad\text{(creation of } B_1N\text{)}$$
These reactions are made possible by enzymes. Hence, letting EB be the breakdown
enzyme, we have
$$B_1N + E_B \rightarrow B_1 + NR$$
where NR is the N complex that is temporarily formed. The enzyme which breaks
down NR is labeled EBNR and we have the reaction
$$NR + E_{BNR} \rightarrow N + R$$
Similarly, letting EC and ECNR denote the corresponding creation enzymes, we have
$$B_1 + NR + E_C \rightarrow B_1N$$
$$N + R + E_{CNR} \rightarrow NR$$
Note that the binding of N effectively removes it from the cytosol. Now, the above
four equations tell us that creating or destroying the complex B1N is tantamount
to binding or freeing N itself.
We can control which of these occurs as follows. Let’s assume there is a pro-
tein T1 whose job is to phosphorylate a protein S1 . The protein S1 is bound with
another protein S2 and the complex, denoted S1 /S2 will function as a switch. Let the
phosphorylated protein S1 be denoted by S1p; then we have the reaction
$$T_1 + S_1/S_2 \rightarrow S_1^p/S_2$$
Now imagine our four enzymatically controlled reactions as being toggled on and
off by an activating substance. We illustrate this situation in Fig. 6.6a where the
action of S1/S2 and S1p/S2 are as indicated. The S1p/S2 switch increases [N] and the
S1/S2 switch decreases [N]. In detail, S1p/S2 increases the breakdown of NR, thereby
effectively decreasing [B1N]; the S1/S2 switch increases [NR], effectively increasing [B1N].
Hence, [B1N] and N work opposite one another. The control of the phosphorylation
of S1 to S1p is achieved via a T1/T1∼ mechanism. Specifically, if rB1N is the B1N
receptor, then we have the control architecture shown in Fig. 6.6b. In this figure, the
protein P(T1) is actually T1 itself. We see we can combine mechanisms to generate
control of a substance's concentration in the cytosol. Note, we are still seeing a
These paired reactions are often called futile cycles.
The trigger systems we are modeling must include a mechanism for deactivation.
Indeed, much of the richness of biological computation comes from negative feed-
back. Consider the generic trigger system of Fig. 6.7. This system allows for protein
creation, but it is equally important to allow for deactivation of the product if its
concentration becomes too high. We can use the generic switch method discussed
in Sect. 6.4 in the case where the trigger T0 exists in both a free and bound state
upon entry into the cell. The trigger T0 is processed by the usual trigger mechanism
into the protein P(T1 ) which is sent into a sigmoid switch h. If the concentration of
P(T1) is low, the usual stop/stabilize control is called which increases T0. If the
concentration is high, we bind T0 to a buffer complex. This mechanism is applicable
to the triggers that control free Ca++ in the cell such as neurotransmitters.
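The sketch below is a toy Python simulation of this deactivation idea, assuming a single trigger pool and a single product pool; all rates, thresholds and gains are invented for illustration and are not taken from the text.

```python
import numpy as np

# Toy model of deactivation via a sigmoid switch: the product P(T1) is read by h;
# a low level calls the stop/stabilize control (raising T0), a high level routes T0
# into a buffer.  Every rate and threshold here is a hypothetical placeholder.

def h(x, o, g):
    return 0.5 * (1.0 + np.tanh((x - o) / g))

T0, P = 1.0, 0.0
dt, threshold, gain = 0.01, 0.5, 0.05
for step in range(4000):
    P += dt * (0.8 * T0 - 0.3 * P)            # trigger T0 drives production of P(T1)
    s = h(P, threshold, gain)                  # switch reads [P(T1)]
    dT0 = 0.5 * (1.0 - s) - 0.6 * s * T0       # low P: increase T0; high P: buffer T0 away
    T0 += dt * dT0
print(f"approximate steady values: T0 = {T0:.3f}, P(T1) = {P:.3f}")
```

The point of the sketch is only the negative feedback structure: production of P(T1) eventually shuts down its own trigger.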
core. This molecule is then called Adenosine Monophosphate or AMP. In these three
versions, the PO3− chains are bound to the ribose sugar directly and do not have an
attachment to the adenine nucleotide itself. There is a fourth version which binds a
PO3− to both the adenine and the ribose. This bond looks cyclic in nature and so
this last version is called Cyclic Adenosine Monophosphate or CAMP. The formation
and breakdown of these compounds are controlled by enzymes. In reaction form we
have
$$2\,ADP \xrightarrow{\text{adenylate kinase}} ATP + AMP$$
$$AMP \rightarrow \text{Adenosine}$$
There are 4 phosphate ions on the 2 ADP molecules; adenylate kinase facilitates
the move of one of them from one ADP to the other ADP to make an ATP molecule.
The remaining ADP then becomes an AMP molecule. The enzyme adenylate kinase
maintains the concentrations of ATP, ADP and AMP at approximately 1 : 1 : 1 ratios.
Hence,
$$k_{eq} = \frac{[AMP][ATP]}{[ADP]^2} \approx 1$$
and
$$[AMP] \approx \frac{[ADP]^2}{[ATP]}$$
For the first pathway, this gives an equilibrium [AMP] of approximately 0.04 mM.
There is also another way to break down ATP to ADP which involves muscle
contraction under anaerobic conditions. This gives [ATP] ≈ [ADP] ≈ 2 mM. In this
case, when adenylate kinase is at equilibrium, we find [AMP] is approximately 2²/2
or 2 mM. Hence, with this second mechanism, the equilibrium concentration of ATP
is half the concentration of ATP from the first reaction pathway. The equilibrium
concentration of AMP, however, varies significantly more: it is 0.04 mM in the first
and 2 mM in the second, a fifty-fold increase. Hence, the concentration of AMP is a
very sensitive indicator of which reaction is controlling the energy status of the cell.
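A quick Python check of this sensitivity is given below; the first-pathway values [ATP] ≈ 4 mM and [ADP] ≈ 0.4 mM are assumptions chosen to be consistent with the numbers quoted above, not measured values.

```python
# Sketch of the adenylate kinase equilibrium [AMP] ~ [ADP]^2 / [ATP] with keq ~ 1.
# The first-pathway ATP/ADP levels are assumptions consistent with the text's numbers.

def amp_equilibrium(atp_mM, adp_mM, keq=1.0):
    return keq * adp_mM ** 2 / atp_mM

pathway_1 = amp_equilibrium(4.0, 0.4)   # assumed resting pathway: [ATP] ~ 4 mM, [ADP] ~ 0.4 mM
pathway_2 = amp_equilibrium(2.0, 2.0)   # anaerobic muscle contraction: [ATP] ~ [ADP] ~ 2 mM
print(f"[AMP] pathway 1: {pathway_1:.3f} mM")            # ~ 0.04 mM
print(f"[AMP] pathway 2: {pathway_2:.3f} mM")            # ~ 2 mM
print(f"fold change in [AMP]: {pathway_2 / pathway_1:.0f}x")  # ~ 50x
```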
Now, inside the cytosol, free Ca++ exists in 0.1 µM to 0.2 µM concentrations.
Outside the cytosol, this ion's concentration is much higher, 1.8 mM. Thus, the
concentration gradient between inside and outside is on the order of 10^4. This very
steep gradient is maintained by surface membrane proteins, a sodium/Ca++ exchange
protein and a membrane calcium pump. In addition, the endoplasmic reticulum (ER)
stores free calcium in pools within its membrane. The movement of calcium in and
out of these pools is controlled by a pump in the ER which uses the reaction between
adenosine triphosphate (ATP) and adenosine diphosphate (ADP) to drive calcium in
or out of the pools. The ER pump is therefore called the ATPase pump.
Rapid increases in cytosolic Ca++ are used as regulatory signals. Hormones, neuro-
transmitters and electrical activity control Ca++ movement into the cytosol. Inositol-
1,4,5-trisphosphate or IP3 controls movement of the ions in and out of the storage
pools in the ER. There is probably 50–100 µM of Ca++ inside the cell but most of
this is bound into proteins such as calmodulin (CAM). Calmodulin relays calcium
signals to other protein targets. The other proteins usually serve as Ca++ buffers
which therefore limit the amount of Ca++ that is free in the cell. Different neurons
have different levels of Ca++ buffering proteins. This implies that there are large
differences in physiological behavior between these classes of neurons.
Ca++ gets into the cytosol from the extracellular fluid via Ca++ gates or channels.
These ports can be voltage dependent like a classical sodium or potassium voltage
gated channel or they can be ligand gated. This means that some protein must bind
to the port in some fashion to initiate further mechanisms that control the amount of
free Ca++ in the cell. These mechanisms could be as direct as the ligand opens the
port and allows Ca++ to come in or could be indirect as we have already discussed
for generic T0 triggers and switches. The important thing to remember is that the
amount of free calcium ion in the cytosol is controlled by four things:
• The ATP based pump in the ER: one calcium ion goes out for each ATP to ADP
reaction. The rate of this pump is controlled by calmodulin,
• calcium ion in through a voltage gated channel,
receptor is multiplied many fold by the fact that the G/GDP complexes exist diffu-
sively in the cell. The extracellular signal binds to multiple receptors and each bound
receptor can interact with a different G/GDP complex. Hence, out of this soup of
different G/GDP complexes, multiple G/GDP complexes could interact with the
same signal. Hence, the signal can be enormously amplified. Roughly speaking, this
amplification is on the order of (2β +β 2 )K [G] for suitable β and K as discussed in
Sect. 6.2. Also, one activated G α /GTP can interact with many CAMP. Each CAMP
can create activated proteins via PKA which could all be different due to concentra-
tion differences in the created [CAMP], and the time and spatial activity envelope of
the [CAMP] pulse. Further, PKA can activate any protein that has a certain pattern
of amino acids around serine or threonine residues. Thus, any such protein will be
called by PKA to participate in the extracellular signal's effects.
Second, more than one agonist can increase [CAMP] concentration via these
mechanisms. Thus, a general feature of this kind of agonist stimulation is that the
effect is felt away from the site of the initial docking of the extracellular signal. Also,
the full effect of the signal is the creation of a new protein which can completely
rewrite the hardware and software of the excitable nerve cell. We conclude that there
is both amplification and divergence of action. Further, an essential feature of this sort
of activation is that the signaling pathway can be turned off. The G protein system is
self-timing. The Gα/GTP is converted back to G/GDP automatically, so if there is
not a stimulus to create new G/GTP, the system reverts to its rest state. Also, CAMP
is always cleaved to AMP by PDE, so without the creation of new CAMP, the phos-
phorylation of the target proteins could not continue. Finally, the phosphate ions on
the activated target proteins are always being removed by appropriate phosphatases.
It follows that if a cell has the three mechanisms mentioned above operating at a high
level, then extracellular signals will not last long, typically less than 2 s.
$$\text{Agonist} \rightarrow \text{Receptor} \rightarrow G \rightarrow \text{Enzyme} \rightarrow \text{Second Messenger} \rightarrow \text{Kinase} \rightarrow \text{Effector Protein}$$
The TAR complex is a cluster of proteins associated with the chemotactic receptor
of coliform bacteria. It is a large protein which is inserted through the bacterial
membrane. It has three portions: the extracellular part where the external signals
bind, the transmembrane part which is inside the membrane and the intracellular
part which is in the cellular cytosol. When the bacterium is moving in an aspartate
concentration gradient, signals are generated to the motor assemblies that drive its
flagellar movement. This allows the bacterium to move toward the food source. The
extracellular signal is the aspartate current. The way this works is as follows:
• Aspartate binds to the extracellular domain of the receptor.
• This binding causes a change in the three dimensional appearance of the intracel-
lular portion of TAR. This is called a conformational change.
• The intracellular part of TAR is made up of 4 proteins. The two closest to the
membrane are labeled CheZ; these can be either methylated or demethylated by
the enzymes CheR or CheB, respectively. Below them is the protein CheW and
below that is the protein CheA. CheA is an enzyme which phosphorylates both
CheY and CheB. CheY and CheB exist free in the cytosol.
The TAR control action is based on several states. First, if no ligand is bound to TAR
and CheZ is highly methylated, CheA is active and it phosphorylates both CheB and
CheY . The high level of phosphorylated CheYp increases the frequency of the motor’s
switch to clockwise flagella rotation which causes tumbling—essentially, a random
search pattern for food. The phosphorylated CheBp slowly removes methyl from
CheZ which resets the signaling state to that of no ligand and no CheZ methylation.
If TAR does not have a ligand bound and CheZ is not highly methylated, CheA is
inactive and the level of CheYp is reduced. This increases the rate of counterclockwise
flagella rotation as there are fewer switches to clockwise. This results in the bacteria
swimming in a specific direction and hence, this is not a random search pattern.
Finally, if the aspartate signal binds TAR whether or not CheZ is methylated, CheA
becomes inactive which leads to directional swimming as outlined above.
The phosphorylation of CheY occurs at a rate which is proportional to the rate of
external signal activation of TAR. The TAR complex thus operates as a self-contained
solid state processing unit which “integrates” extracellular inputs to produce a certain
rate of phosphorylation of CheA. This influences the rate of change of CheYp and
thus produces alterations in the way the flagella handles rotations. This switches the
bacteria from random search to directed swimming. Control of the CheYp creation
rate effectively controls the ratio of counterclockwise to clockwise rotations. This
sets the direction of the swimming or if the switching is too frequent, sets the motion
to be that of a random walk. The current state of the motor can thus be inferred
from the methylation patterns on the cytosol side of the TAR complex. The general
aspartate pathway thus consists of a single freely diffusing molecule CheYp, a protein
complex which is only concerned with the external stimulus and a second protein
complex which handles the behavioral response.
Next, consider how a cell uses a signal for growth called the platelet derived
growth factor or PDGF. This signal must be noticed by a cell before it can be used.
A single PDGF receptor diffuses freely in the cellular membrane. When it encounters
a PDGF extracellularly, it combines with another receptor to form what is called a
dimer or double PDGF receptor. The paired receptors have tails that extend through
the membrane into the cytosol. Once paired, multiple sites on these tails phosphory-
late. This phosphorylation triggers the assembly of six signaling complexes on the
tails. Some fraction of the six signaling complexes are then activated. The activated
complexes then initiate broadcast signals using multiple routes to many destinations
inside the cell. These signals activate and coordinate the many biochemical changes
needed in cell growth. As usual, the response is removed by phosphatases which take
away phosphates from activated complexes. This is, therefore, a cellular strategy to
make a large number of receptors catch as many of the PDGF molecules as possible.
Let’s look at the differences in the handling of the extracellular signal in the TAR
and PDGF cases. PDGF requires a sequence of reactions: receptor clustering into
dimers, phosphorylation of critical sites and the binding of multiple signaling mole-
cule complexes. Hence, the response here is slower than the conformational change
that occurs within a preformed TAR complex due to the aspartate signal. Notice
though that the TAR response needs to be quick as it alters swimming patterns for
finding food. There is not a need for such speed in the PDGF response. Also, not
every PDGF receptor carries a signaling complex as only those that encounter a
PDGF molecule undergo the changes that result in the addition of signaling com-
plexes. Thus, when the cell makes a large number of receptors and casts a wide net
to catch PDGF signals, there is not a large cost. This is because there is no need
to make expensive signaling complexes on all receptors, only on the ones which need
them.
The TAR complex, however, must react quickly. In fact, there is even a real possibility
that all TAR receptors could be occupied at the same time. Since the bacterium must
respond to an aspartate concentration gradient on the order of 10^5, the response to
90 and 95 % TAR receptor occupancy must be different. Hence, all receptors must
carry the signaling complexes necessary to alter the flagella motor.
The PDGF receptor complex produces multiple outputs which implies the initial
extracellular signal is divergent with effects spreading throughout the cell. The TAR
complex has only two outputs, CheYp and CheBp, which both affect motor response.
However, there are also other pathways that affect CheYp and CheBp. Hence, there
are multiple controls here that converge on a single output pair.
In general, the concentration of proteins in the cytosol is high enough that there
is competition among macromolecules for water. This affects the diffusion rate of
the macromolecules and changes the tendency of proteins to associate with other
proteins. Hence, the formation of a signaling complex requires a sequence of binding
steps in which proteins come together in a diffusion limited interaction governed by
mass action. This implies that association among macromolecules is strongly favored
in these crowded conditions. Now, the TAR complex serves a focused purpose and
drives changes in swimming action directly and quickly. Signaling complexes as
seen in the PDGF receptor are favored for different reasons. Clusters of molecules
in permanent association with each other are well suited to perform reactions that
integrate, codify and transduce extracellular signals. These solid state computations
are more rapid, efficient and noise free than a system of diffusing molecules in which
encounters are subject to random fluctuations of thermal energy. Also, complexes
are made only once. So even though the construction process uses diffusion, once
completed, the complex can repeatedly and accurately perform the same task.
A macromolecule can switch its conformational state 10^4–10^6 times a second at
the cost of 1 ATP per switch. So we might infer from this that biological information
has a cost of 1 ATP per bit. However, in a real system the major expense is not
in registering conformational change. The true cost is the amount of energy used to
distribute, integrate and process the information of the extracellular signal within and
between systems. Hence, if a switch in conformational state gives a rough indication
of information storage rate, we might think the biological information storage rate is
about 10^4–10^6 bits per second. However, based on the distribution, integration and
processing costs, it is more like 10^3–10^5 bits per second at a cost of 10^6 ATP per bit.
Hence, a guiding principle is that the second messenger triggers will store infor-
mation at a rate of about 10^3 bits per second at a cost of 10^6 energy units per bit.
Although signaling complexes are indeed powerful tools, there is also a strong
need for diffusion based mechanisms. Most extracellular stimuli are not passed in
a linear chain of cause and effect from one receptor in the membrane to one target
molecule in the nucleus or cytosol. If such a linear chain could be used, the signal
could be passed very efficiently as a chain of conformational changes along a protein
filament. In fact, signals such as we see in the axon of an excitable neuron are essen-
tially passed along in this manner. However, most extracellular stimuli are spread as
a rapidly diverging influence from the receptor to multiple chemical targets in the
cell at often many different spatial locations. This divergence allows the possibility
of signal amplification and cross-talk between different signal processing chains. The
machinery of the signal spread in this way always requires at least one molecule
that diffuses as a free element. Note that always associating one receptor with one
effector (as in the CheYp to flagella motor effector) is faster and more efficient than
the general diffusion mechanism. However, it is a much smaller and more localized
effect!
The TAR receptor approach does use a freely diffusing molecule, CheYp, but the
action of this pathway is restricted to the flagella motor. Hence there is no divergence
of action. In the PDGF receptor approach, there is a large cost associated with
the construction of the signaling complex, but once built, diffusion is used as the
mechanism to send the PDGF signal in a divergent manner across the cell. Hence,
one extracellular event triggers one change or one extracellular event triggers 10^3–
10^5 changes! This is a large increase in the sensitivity of the cell to the extracellular
input. We see a cell pays a price in time and consumption of ATP molecules for a
huge reward in sensitivity.
Chapter 7
Second Messenger Diffusion Pathways
Now that we have discussed some of the extracellular trigger mechanisms used in
bioinformation processing, it is time to look at the fundamental process of diffusion
within a cell. Our second messenger systems often involve Ca++ ion movement
in and out of the cell. The amount of free Ca++ ion in the cell is controlled by
complicated mechanisms, but some is stored in buffer complexes. The release of
calcium ion from these buffers plays a big role in cellular regulatory processes and
which protein P(T1 ) is actually created from a trigger T0 . Following discussions in
Berridge (1997), Wagner and Keizer (1994) and Höffer et al. (2001), we will now
describe a general model of calcium ion movement and buffering. These ideas will
give us additional insight into how to model second messenger triggering events.
Calcium is bound to many proteins and other molecules in the cytosol. This binding is
fast compared to calcium release and diffusion. From Wagner and Keizer (1994), we
can develop a rapid-equilibrium approximation to calcium buffering. Let's assume
there are M different calcium binding sites; in effect, there are different calcium
species. We label these sites with the index j. Each has a binding rate constant
$k_j^+$ and a dissociation rate constant $k_j^-$. We assume each of these binding sites
is homogeneously distributed throughout the cytosol with concentration $B_j$. We let
the concentration of free binding sites for species j be denoted by $b_j(t,x)$ and
the concentration of occupied binding sites by $c_j(t,x)$, where t is our time variable
and x is the spatial variable. The units of $b_j$ are (mM $B_j$)/liter and $c_j$ has units
(mM $B_j$)/liter. We let $u(t,x)$ be the concentration of free calcium ion in the cytosol
at (t, x). Hence, we have the reactions
at (t, x). Hence, we have the reactions
Ca + B j →k +j Ca/B j
Ca/B j →k −j Ca + B j
d[Ca]
= −k +j [Ca][B j ] + k −j [Ca/B j ]
dt
d[Ca/B j ]
= k +j [Ca][B j ] − k −j [Ca/B j ]
dt
From this, we have the unit analysis
mM Ca mM Ca mM B j mM Ca/B j
= units of k +j × + units of k −j ×
liter − sec liter liter liter
This tells us the units of k +j are liter/(mM B j -sec) and k −j has units (mM of Ca)/
( (mM Ca/B j )-sec). Hence, k −j c j has units of (mM Ca)/(liter-sec), k +j b j , 1/s and
k +j b j u(t, x), (mM Ca)/(liter-sec). Let the diffusion constant for each site be D j . We
assume D j is independent of the bound calcium molecules; that is, it is independent
of c j . We also assume a homogeneous distribution of the total concentration B j .
This implies a local conservation law
$$b_j(t,x) + c_j(t,x) = B_j$$
For simplicity, we will use a 1D model here; thus the spatial variable is just the
coordinate x rather than the vector (x, y, z). For the one dimensional domain of
length L, we have 0 ≤ x ≤ L. In a cell, some of the calcium ion is bound in storage
pools in the endoplasmic reticulum or ER. The amount of release/uptake could be a
nonlinear function of the concentration of calcium ion. Hence, we model this effect
with the nonlinear mapping f 0 (u). The diffusion dynamics are then
$$\frac{\partial u}{\partial t} = \bigl(\text{rate of ER release and uptake}\bigr) + \sum_{\text{calcium species}}\bigl(\text{amount freed from binding sites} - \text{amount bound}\bigr) + D_0\frac{\partial^2 u}{\partial x^2}$$
Thus, we find
$$\frac{\partial u}{\partial t} = f_0(u) + \sum_{j=1}^{M}\Bigl(k_j^- c_j - k_j^+(B_j - c_j)u\Bigr) + D_0\frac{\partial^2 u}{\partial x^2} \quad (7.1)$$
$$\frac{\partial c_j}{\partial t} = -k_j^- c_j + k_j^+(B_j - c_j)u + D_j\frac{\partial^2 c_j}{\partial x^2} \quad (7.2)$$
where $D_0$ is the diffusion coefficient for free calcium ion. Our domain is a one dimen-
sional line segment, so there should be diffusion across the boundaries at x = 0 and
x = L for calcium but not for binding molecules. Let J0 and JL denote the flux of
calcium ion across the boundary at 0 and L. We will usually just say J0,L for short.
Our boundary conditions then become
$$-D_0\left.\frac{\partial u}{\partial x}\right|_{0} = J_0 \quad (7.3)$$
$$-D_0\left.\frac{\partial u}{\partial x}\right|_{L} = J_L \quad (7.4)$$
$$D_j\left.\frac{\partial c_j}{\partial x}\right|_{0} = 0 \quad (7.5)$$
$$D_j\left.\frac{\partial c_j}{\partial x}\right|_{L} = 0 \quad (7.6)$$
The concentration of total calcium is the sum of the free and the bound. We denote
this by $w(t,x)$ and note that
$$w = u + \sum_{j=1}^{M} c_j$$
$$\frac{\partial w}{\partial t} = f_0\left(w - \sum_{j=1}^{M} c_j\right) + D_0\frac{\partial^2 w}{\partial x^2} + \sum_{j=1}^{M}(D_j - D_0)\frac{\partial^2 c_j}{\partial x^2}$$
Note that
$$\frac{\partial w}{\partial t} = \frac{\partial u}{\partial t} + \sum_{j=1}^{M}\frac{\partial c_j}{\partial t}$$
$$= f_0(u) + \sum_{j=1}^{M}\Bigl(k_j^- c_j - k_j^+(B_j - c_j)u\Bigr) + D_0\frac{\partial^2 u}{\partial x^2} + \sum_{j=1}^{M}\Bigl(-k_j^- c_j + k_j^+(B_j - c_j)u\Bigr) + \sum_{j=1}^{M} D_j\frac{\partial^2 c_j}{\partial x^2}$$
$$= f_0(u) + D_0\frac{\partial^2 u}{\partial x^2} + \sum_{j=1}^{M} D_j\frac{\partial^2 c_j}{\partial x^2}$$
$$= f_0\left(w - \sum_{j=1}^{M} c_j\right) + D_0\frac{\partial^2 u}{\partial x^2} + D_0\sum_{j=1}^{M}\frac{\partial^2 c_j}{\partial x^2} - D_0\sum_{j=1}^{M}\frac{\partial^2 c_j}{\partial x^2} + \sum_{j=1}^{M} D_j\frac{\partial^2 c_j}{\partial x^2}$$
$$= f_0\left(w - \sum_{j=1}^{M} c_j\right) + D_0\left(\frac{\partial^2 u}{\partial x^2} + \sum_{j=1}^{M}\frac{\partial^2 c_j}{\partial x^2}\right) - \sum_{j=1}^{M}(D_0 - D_j)\frac{\partial^2 c_j}{\partial x^2}$$
$$= f_0\left(w - \sum_{j=1}^{M} c_j\right) + D_0\frac{\partial^2 w}{\partial x^2} + \sum_{j=1}^{M}(D_j - D_0)\frac{\partial^2 c_j}{\partial x^2}$$
The boundary conditions at 0 and L can then be computed using Eqs. 7.3–7.6:
$$-D_0\left.\frac{\partial w}{\partial x}\right|_{0,L} = -D_0\left.\frac{\partial u}{\partial x}\right|_{0,L} - \sum_{j=1}^{M} D_j\left.\frac{\partial c_j}{\partial x}\right|_{0,L} - \sum_{j=1}^{M}(D_0 - D_j)\left.\frac{\partial c_j}{\partial x}\right|_{0,L} = J_{0,L}$$
Collecting these results, we have
$$\frac{\partial w}{\partial t} = f_0\left(w - \sum_{j=1}^{M} c_j\right) + D_0\frac{\partial^2 w}{\partial x^2} + \sum_{j=1}^{M}(D_j - D_0)\frac{\partial^2 c_j}{\partial x^2} \quad (7.7)$$
$$J_{0,L} = -D_0\left.\frac{\partial w}{\partial x}\right|_{0,L} - \sum_{j=1}^{M}(D_0 - D_j)\left.\frac{\partial c_j}{\partial x}\right|_{0,L} \quad (7.8)$$
In the rapid buffering approximation, the binding reactions are taken to be in equilibrium; that is,
$$k_j^- c_j - k_j^+(B_j - c_j)u = 0$$
Solving, we find
$$c_j = \frac{B_j u}{\frac{k_j^-}{k_j^+} + u} = \frac{B_j u}{K_j + u} \quad (7.9)$$
where $K_j$ is the ratio $\frac{k_j^-}{k_j^+}$. Now, we also know $u = w - \sum_{j=1}^{M} c_j$. Substituting this in
for u, we have
$$c_j\left[K_j + \left(w - \sum_{j=1}^{M} c_j\right)\right] = B_j\left(w - \sum_{j=1}^{M} c_j\right) \quad (7.10)$$
This simplifies to
$$B_j w = -c_j^2 + (K_j + w + B_j)c_j + (B_j - c_j)\sum_{k\neq j} c_k$$
This shows that the concentration of occupied binding sites for species j is a function
of the total calcium concentration w. Let c j (w) denote this functional dependence.
From the chain rule, we have $\frac{\partial c_j}{\partial x} = \frac{\partial c_j}{\partial w}\frac{\partial w}{\partial x}$. Then, the calcium dynamics become
$$\frac{\partial w}{\partial t} = f_0\left(w - \sum_{j=1}^{M} c_j(w)\right) + D_0\frac{\partial^2 w}{\partial x^2} + \frac{\partial}{\partial x}\left(\sum_{j=1}^{M}(D_j - D_0)\frac{\partial c_j}{\partial w}\frac{\partial w}{\partial x}\right)$$
$$= f_0\left(w - \sum_{j=1}^{M} c_j(w)\right) + \frac{\partial}{\partial x}\left(\left[D_0 + \sum_{j=1}^{M}(D_j - D_0)\frac{\partial c_j}{\partial w}\right]\frac{\partial w}{\partial x}\right)$$
To summarize, the dynamics for calcium, with the assumption that calcium binding
into the buffers is fast compared to release and uptake from the calcium pools in the
ER, are
$$\frac{\partial w}{\partial t} = f_0\left(w - \sum_{j=1}^{M} c_j(w)\right) + \frac{\partial}{\partial x}\left(\left[D_0 + \sum_{j=1}^{M}(D_j - D_0)\frac{\partial c_j}{\partial w}\right]\frac{\partial w}{\partial x}\right) \quad (7.11)$$
$$J_{0,L} = -D_0\left.\frac{\partial w}{\partial x}\right|_{0,L} - \sum_{j=1}^{M}(D_0 - D_j)\left.\frac{\partial c_j}{\partial w}\right|_{0,L}\left.\frac{\partial w}{\partial x}\right|_{0,L} \quad (7.12)$$
Notice that if we define a new diffusion coefficient D for the diffusion process that
governs w by
$$D = D_0 + \sum_{j=1}^{M}(D_j - D_0)\frac{\partial c_j}{\partial w} \quad (7.13)$$
then Eq. 7.11 can be read as a nonlinear diffusion equation in w.
If, in addition, the free calcium level is small compared to the dissociation constants $K_j$, Eq. 7.9 reduces to
$$c_j = \frac{B_j}{K_j}\,u \quad (7.16)$$
Thus, $w = \left(1 + \sum_{j=1}^{M}\frac{B_j}{K_j}\right)u$ and solving for $c_j$, we find
$$c_j = \frac{\frac{B_j}{K_j}\,w}{1 + \sum_{k=1}^{M}\frac{B_k}{K_k}} \;\Longrightarrow\; \frac{\partial c_j}{\partial w} = \frac{\frac{B_j}{K_j}}{1 + \sum_{k=1}^{M}\frac{B_k}{K_k}}$$
Then, letting $\gamma_j = \frac{B_j}{K_j}$,
$$\frac{\partial w}{\partial t} = f_0\left(w - \sum_{j=1}^{M}\frac{\gamma_j w}{1 + \sum_{k=1}^{M}\gamma_k}\right) + \frac{\partial}{\partial x}\left(\left[D_0 + \sum_{j=1}^{M}(D_j - D_0)\frac{\gamma_j}{1 + \sum_{k=1}^{M}\gamma_k}\right]\frac{\partial w}{\partial x}\right)$$
$$= f_0\left(w - \sum_{j=1}^{M}\frac{\gamma_j w}{1 + \sum_{k=1}^{M}\gamma_k}\right) + \left[D_0 + \sum_{j=1}^{M}(D_j - D_0)\frac{\gamma_j}{1 + \sum_{k=1}^{M}\gamma_k}\right]\frac{\partial^2 w}{\partial x^2}$$
Letting $\Lambda$ denote the term $1 + \sum_{k=1}^{M}\gamma_k$, then $w = \Lambda u$ and so we have $\frac{\partial w}{\partial t} = \Lambda\frac{\partial u}{\partial t}$. Thus,
$$\Lambda\frac{\partial u}{\partial t} = f_0\left(\Lambda u - \sum_{j=1}^{M}\frac{\gamma_j\,\Lambda u}{\Lambda}\right) + \left[D_0 + \sum_{j=1}^{M}(D_j - D_0)\frac{\gamma_j}{\Lambda}\right]\Lambda\frac{\partial^2 u}{\partial x^2}$$
$$= f_0\bigl(\Lambda u - (\Lambda - 1)u\bigr) + \left[D_0 + \sum_{j=1}^{M}(D_j - D_0)\frac{\gamma_j}{\Lambda}\right]\Lambda\frac{\partial^2 u}{\partial x^2}$$
Since $\Lambda u - (\Lambda - 1)u = u$, we define the effective release/uptake rate
$$f(u) = \frac{f_0(u)}{\Lambda}. \quad (7.17)$$
We note that, in this approximation where binding is weak compared to dissociation (the free calcium level is small compared to the $K_j$), the diffusion constant D has a new form
$$D \rightarrow D_0 + \sum_{j=1}^{M}(D_j - D_0)\frac{\gamma_j}{\Lambda} = \frac{\Lambda D_0 + \sum_{j=1}^{M} D_j\gamma_j - (\Lambda - 1)D_0}{\Lambda} = \frac{D_0 + \sum_{j=1}^{M} D_j\gamma_j}{\Lambda}$$
Calling this effective value $\hat{D}$,
$$\hat{D} = \frac{D_0 + \sum_{j=1}^{M} D_j\gamma_j}{\Lambda} \quad (7.18)$$
and the dynamics become
$$\frac{\partial u}{\partial t} = f(u) + \hat{D}\frac{\partial^2 u}{\partial x^2} \quad (7.19)$$
We call the function f , the effective calcium release/uptake rate and D̂, the effective
diffusion coefficient. What about the new boundary conditions? This is just another
somewhat unpleasant calculation:
$$-\left[D_0 + \sum_{j=1}^{M}(D_j - D_0)\frac{\partial c_j}{\partial w}\right]\left.\frac{\partial w}{\partial x}\right|_{0,L} = J_{0,L}$$
$$-\left[D_0 + \sum_{j=1}^{M}\frac{(D_j - D_0)\gamma_j}{\Lambda}\right]\Lambda\left.\frac{\partial u}{\partial x}\right|_{0,L} = J_{0,L}$$
$$-\left[\Lambda D_0 + \sum_{j=1}^{M}(D_j - D_0)\gamma_j\right]\left.\frac{\partial u}{\partial x}\right|_{0,L} = J_{0,L}$$
$$-\left[D_0 + \sum_{j=1}^{M} D_j\gamma_j\right]\left.\frac{\partial u}{\partial x}\right|_{0,L} = J_{0,L}$$
$$-\Lambda\hat{D}\left.\frac{\partial u}{\partial x}\right|_{0,L} = J_{0,L}$$
Hence, the final model in terms of the free calcium concentration u is
$$\frac{\partial u}{\partial t} = \frac{f_0(u)}{\Lambda} + \hat{D}\frac{\partial^2 u}{\partial x^2} \quad (7.20)$$
$$-\hat{D}\left.\frac{\partial u}{\partial x}\right|_{0,L} = \frac{J_{0,L}}{\Lambda} \quad (7.21)$$
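A minimal explicit finite-difference sketch of Eqs. 7.20–7.21 is given below; the domain size, the effective coefficients, the release/uptake term f0 and the flux pulse J0 are all illustrative placeholders, not values from the text.

```python
import numpy as np

# Minimal explicit finite-difference sketch of Eqs. 7.20-7.21:
#   u_t = f0(u)/Lambda + Dhat * u_xx,  -Dhat * u_x = J/Lambda at x = 0 and x = L.
# All parameter values here are illustrative only.

L_dom, N = 10.0, 101          # domain length and number of grid points
dx = L_dom / (N - 1)
Dhat, Lam = 0.5, 6.0          # effective diffusion coefficient and Lambda = 1 + sum(gamma_j)
dt = 0.4 * dx ** 2 / Dhat     # explicit stability restriction dt <= dx^2 / (2 Dhat)

def f0(u):
    # toy ER release/uptake term: relaxation toward a resting level of 0.1
    return -0.05 * (u - 0.1)

def J0(t):
    # hypothetical calcium flux pulse entering at x = 0 for 0 <= t < 5
    return 1.0 if t < 5.0 else 0.0

u = np.full(N, 0.1)           # initial free calcium profile
t = 0.0
for _ in range(2000):
    uxx = np.zeros(N)
    uxx[1:-1] = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx ** 2
    u_new = u + dt * (f0(u) / Lam + Dhat * uxx)
    # flux boundary conditions via one-sided differences: -Dhat * u_x = J/Lambda
    u_new[0] = u_new[1] + (J0(t) / Lam) * dx / Dhat
    u_new[-1] = u_new[-2]     # zero flux at x = L
    u, t = u_new, t + dt
print("free calcium near the injection boundary:", np.round(u[:5], 4))
```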
The critical review on the control of free calcium in cellular processing in Carafoli
et al. (2001) notes the concentration of Ca++ in the cell is controlled by the reversible
binding of calcium ion to the buffer complexes we have been discussing. These buffer
molecules therefore act as calcium ion sensors that, in a sense, decode the information
contained in the calcium ion current injection and then pass on a decision to a target.
Many of these targets can be proteins transcribed by accessing the genome. Hence,
the P(T1 ) we have discussed in Chap. 6 could be a buffer molecule B j . The boundary
condition J0,L plays the role of our entry calcium current. Such a calcium ion input
current through the membrane could be due to membrane depolarization causing an
influx of calcium ions through the port or via ligand binding to a receptor which in
turn indirectly increases free calcium ion in the cytosol. Such mechanisms involve
the interplay between the release/uptake ER function and the storage buffers as the
previous sections have shown. This boundary current determines the u(t, x) solution
through the diffusion equations Eqs. 7.20 and 7.21. The exact nature of this solution
is determined by the receptor types, buffers and storage sites in the ER. Differences
in the period and magnitude of the calcium current u(t, x) resulting from the input
J0,L trigger different second messenger pathways. Hence, there are many possible
outcomes due to a given input current J0,L .
Let’s do a back of the envelope calculation to see what might happen if a trigger
event T0 initiated an increase in a buffer B j which then initiates a complex trigger
mechanism culminating in a protein transcription. Let’s assume that the buffer B j0 is
increased to $B_{j_0} + \epsilon$. It is reasonable to assume that both $k_i^+$ and $k_i^-$ are independent
of the amount of Bi that is present. Hence, we can assume all the γi ’s are unaffected
by the change in B j0 . If binding activity goes up, we would also expect that the
release/uptake activity would also change. There are then changes in the terms that
define the diffusion constant
$$1 + \sum_{k=1}^{M}\frac{B_k}{K_k} \;\rightarrow\; 1 + \sum_{k\neq j_0}\frac{B_k}{K_k} + \frac{B_{j_0} + \epsilon}{K_{j_0}}$$
$$f(u) \rightarrow f(u) + \eta$$
$$\hat{D} \;\rightarrow\; \frac{D_0 + D_{j_0}\frac{B_{j_0}+\epsilon}{K_{j_0}} + \sum_{k\neq j_0} D_k\gamma_k}{1 + \sum_{k\neq j_0}\frac{B_k}{K_k} + \frac{B_{j_0}+\epsilon}{K_{j_0}}}$$
for some nonzero $\eta$ and positive $\epsilon$. Letting $\Gamma = 1 + \sum_{k=1}^{M} D_k\gamma_k$ and $\xi_j = \frac{1}{K_j}$, we find
$$\hat{D}_{new} = \frac{D_0 + \sum_{k=1}^{M} D_k\gamma_k + \frac{D_{j_0}}{K_{j_0}}\epsilon}{1 + \sum_{k=1}^{M}\gamma_k + \frac{\epsilon}{K_{j_0}}} = \frac{(D_0 - 1) + \Gamma + D_{j_0}\xi_{j_0}\epsilon}{\Lambda + \xi_{j_0}\epsilon}$$
Thus, noting $\hat{D} = \frac{(D_0 - 1) + \Gamma}{\Lambda}$, we find
$$\Delta\hat{D} = \hat{D}_{new} - \hat{D} = \frac{(D_0 - 1) + \Gamma + D_{j_0}\xi_{j_0}\epsilon}{\Lambda + \xi_{j_0}\epsilon} - \frac{(D_0 - 1) + \Gamma}{\Lambda} = \frac{\bigl[(D_0-1)+\Gamma+D_{j_0}\xi_{j_0}\epsilon\bigr]\Lambda - \bigl[(D_0-1)+\Gamma\bigr]\bigl[\Lambda + \xi_{j_0}\epsilon\bigr]}{\bigl[\Lambda+\xi_{j_0}\epsilon\bigr]\Lambda}$$
$$= \xi_{j_0}\epsilon\,\frac{\Lambda D_{j_0} - \bigl((D_0 - 1) + \Gamma\bigr)}{\Lambda\bigl[\Lambda + \xi_{j_0}\epsilon\bigr]} = \xi_{j_0}\epsilon\,\frac{D_{j_0} - \hat{D}}{\Lambda + \xi_{j_0}\epsilon} = \xi_{j_0}\,D_{j_0}\left(1 - \frac{\hat{D}}{D_{j_0}}\right)\frac{\epsilon}{\Lambda + \xi_{j_0}\epsilon}$$
We see that
$$\Delta\hat{D} = \xi_{j_0}\,D_{j_0}\left(1 - \frac{\hat{D}}{D_{j_0}}\right)\frac{\epsilon}{\Lambda + \xi_{j_0}\epsilon} \;\propto\; \frac{C\,\epsilon}{\Lambda + \xi_{j_0}\epsilon}$$
for some constant C. The alteration of the release/uptake function and the diffusion
constant imply a change in the solution. This change in the solution u(t, x) then
can initiate further second messenger changes culminating in altered P(T1 ) protein
production.
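A quick numeric check of this back-of-the-envelope formula is sketched below in Python; every buffer parameter is a hypothetical placeholder chosen only so that the direct computation and the closed form can be compared.

```python
# Numeric check of the change in the effective diffusion coefficient when one buffer
# level B_{j0} is increased by eps.  All buffer parameters are hypothetical.

D0 = 0.25                          # free calcium diffusion coefficient
D = [0.05, 0.10, 0.02]             # buffer diffusion coefficients D_j
B = [10.0, 5.0, 2.0]               # buffer concentrations B_j
K = [2.0, 1.0, 4.0]                # dissociation ratios K_j = k_j^- / k_j^+

def effective_D(B, K, D, D0):
    gamma = [b / k for b, k in zip(B, K)]
    Lam = 1.0 + sum(gamma)
    return (D0 + sum(d * g for d, g in zip(D, gamma))) / Lam, Lam

j0, eps = 0, 0.5
Dhat, Lam = effective_D(B, K, D, D0)
B_new = list(B); B_new[j0] += eps
Dhat_new, _ = effective_D(B_new, K, D, D0)

xi = 1.0 / K[j0]
predicted = xi * D[j0] * (1.0 - Dhat / D[j0]) * eps / (Lam + xi * eps)
print(f"direct  Delta Dhat = {Dhat_new - Dhat:.6f}")
print(f"formula Delta Dhat = {predicted:.6f}")
```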
References
M. Berridge, Elementary and global aspects of calcium signalling. J. Physiol. 499, 291–306 (1997)
E. Carafoli, L. Santella, D. Branca, M. Brini, Generation, control and processing of cellular calcium signals. Crit. Rev. Biochem. Mol. Biol. 36(2), 107–260 (2001)
T. Höffer, A. Politi, R. Heinrich, Intracellular Ca2+ wave propagation through gap-junctional Ca2+ diffusion: a theoretical study. Biophys. J. 80(1), 75–87 (2001)
J. Wagner, J. Keizer, Effects of rapid buffers on Ca2+ diffusion and Ca2+ oscillations. Biophys. J. 67, 447–456 (1994)
Chapter 8
Second Messenger Models
In Sect. 6.7, we discussed some of the basic features of the pathways used by
extracellular triggers. We now look at these again but very abstractly as we want
to design principles by which we can model second messenger effects. Let T0 denote
a second messenger trigger which moves through a port P to create a new trigger T1,
some of which binds to B1. A schematic of this is shown in Fig. 8.1. In the figure, r
is a number between 0 and 1 which represents the fraction of the trigger T1 which is
free in the cytosol. Hence, 100r % of T1 is free and 100(1 − r) % is bound to B1, cre-
ating a storage complex B1/T1. For our simple model, we assume rT1 is transported
to the nuclear membrane where some of it binds to the enzyme E1 . Let s in (0, 1)
denote the fraction of rT1 that binds to E1 . We illustrate this in Fig. 8.2. We denote
the complex formed by the binding of E1 and T1 by E1 /T1 . From Fig. 8.2, we see that
the proportion of T1 that binds to the genome (DNA) and initiates protein creation
P(T1 ) is thus srT1 .
The protein created, P(T1), could be many things. Here, let us assume that P(T1) is
a sodium, Na+, gate. Thus, our high level model is
$$g_{Na}(t,V) = g_{Na}^{max}\,M_{Na}^p(t,V)\,H_{Na}^q(t,V)$$
where t is time and V is membrane voltage. The variables MNa and HNa are the
activation and inactivation functions for the sodium gate with p and q appropriate
powers. We model the port action on the trigger by the sigmoidal function
$$h_p(x) = \frac{r}{2}\left(1 + \tanh\left(\frac{x - [T_0]_b}{g_p}\right)\right)$$
which lies in [0, r),
where the two shaping parameters gp (transition rate) and [T0 ]b (threshold) must be
chosen. We can thus model the schematic of Fig. 8.1 as hp ([T0 ]) [T1 ]n where [T1 ]n is
the nominal concentration of the induced trigger T1 . In a similar way, we let
$$h_e(x) = \frac{s}{2}\left(1 + \tanh\left(\frac{x - x_0}{g_e}\right)\right)$$
which lies in [0, s). This is the amount of activated T1 which reaches the genome to
create the target protein P(T1). It follows then that
$$[P(T_1)] = h_e\bigl(h_p([T_0])\,[T_1]_n\bigr)\,[T_1]_n$$
The protein is created with efficiency e and so we model the conversion of [P(T1)]
into a change in gNa^max as follows. Let
$$h_{Na}(x) = \frac{e}{2}\left(1 + \tanh\left(\frac{x - x_0}{g_{Na}}\right)\right)$$
which has output in [0, e). Here, we want to limit how large a change we can achieve in
gNa^max. Hence, we assume there is an upper limit on the change which is given by δNa gNa^max.
Thus, we limit the change in the maximum sodium conductance to some percentage
of its baseline value. It follows that hNa(x) is about δNa if x is sufficiently large and
small otherwise. This suggests that x should be [P(T1)] and since translation to P(T1)
occurs no matter how low [T1] is, we can use a switch point value of x0 = 0. We
conclude
$$h_{Na}([P(T_1)]) = \frac{e}{2}\,\delta_{Na}\,g_{Na}^{max}\left(1 + \tanh\left(\frac{[P(T_1)]}{g_{Na}}\right)\right) \quad (8.1)$$
Our model of the change in maximum sodium conductance is therefore hNa([P(T1)]).
We can thus alter the action potential via a second messenger trigger by allowing
$$g_{Na}(t,V) = \left(g_{Na}^{max} + h_{Na}([P(T_1)])\right)M_{Na}^p(t,V)\,H_{Na}^q(t,V)$$
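To make the cascade concrete, here is a minimal Python sketch of the hp, he, hNa chain producing a change in gNa^max; the helper function sigmoid and every parameter value in the dictionary are illustrative placeholders, not values from the text.

```python
import numpy as np

# Sketch of the second messenger trigger cascade of this section:
# hp models the port, he the fraction reaching the genome, hNa the conversion of
# protein level into an increment of gNa^max.  All parameter values are placeholders.

def sigmoid(x, offset, gain):
    return 0.5 * (1.0 + np.tanh((x - offset) / gain))

def delta_gNa_max(T0, p):
    hp = p["r"] * sigmoid(T0, p["T0_b"], p["g_p"])                          # port response in [0, r)
    he = p["s"] * sigmoid(hp * p["T1_n"], p["r"] * p["T1_n"] / 2, p["g_e"]) # genome access in [0, s)
    P_T1 = he * p["T1_n"]                                                   # [P(T1)]
    return p["e"] * p["delta_Na"] * p["gNa_max"] * sigmoid(P_T1, 0.0, p["g_Na"])

params = dict(r=0.6, T0_b=1.0, g_p=0.2,      # port fraction, threshold, port gain
              s=0.5, g_e=0.1, T1_n=1.0,      # enzyme binding fraction, gain, nominal [T1]
              e=0.7, delta_Na=0.05,          # protein efficiency, max fractional change
              gNa_max=120.0, g_Na=0.1)       # baseline conductance and final gain (hypothetical)

for T0 in (0.5, 1.0, 2.0):
    print(f"[T0] = {T0}: delta gNa_max = {delta_gNa_max(T0, params):.3f}")
```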
We can follow the procedure outlined in this section for a variety of triggers. We
therefore can add a potassium gate trigger with shaping parameters
$$r^K,\;[T_0]_b^K,\;g_p^K,\;s^K,\;g_e^K,\;e^K,\;g_K,\;\delta_K$$
where
$$h_{Na}([P(T_1)]) = \frac{e}{2}\,\delta_{Na}\,g_{Na}^{max}\left(1 + \tanh\left(\frac{[P(T_1)]}{g_{Na}}\right)\right)$$
with e and δNa in (0, 1). Using the usual transition function σ(x, x0, g0), we can then
write the sodium conductance modification equation more compactly as
$$h_{Na}([P(T_1)]) = e\,\delta_{Na}\,g_{Na}^{max}\,\sigma\bigl([P(T_1)], 0, g_{Na}\bigr)$$
Finally,
$$g_{Na}(t,V) = g_{Na}^{max}\Bigl(1 + e\,\delta_{Na}\,\sigma\bigl([P(T_1)], 0, g_{Na}\bigr)\Bigr)M_{Na}^p(t,V)\,H_{Na}^q(t,V)$$
Implicit in this formula is the cascade “σ(σ(σ” as σ([P(T1)], 0, gNa) uses two con-
catenated sigmoid calculations itself. We label this as a Sigma Three Transition, σ3,
and use the notation
where gT denotes the final gain associated with the third level sigmoidal transition
to create the final gate product. We can then rewrite our modulation equation as
$$g_{Na}(t,V) = g_{Na}^{max}\bigl(1 + e\,\delta_{Na}\,h_3(W_{Na})\bigr)M_{Na}^p(t,V)\,H_{Na}^q(t,V)$$
We can draw this as a graph as is shown in Fig. 8.4 where h denotes the standard
sigmoidal state transition function. We also have
$$h_e\bigl(h_p([T_0])\,[T_1]_n\bigr) = \frac{s}{2}\left(1 + \tanh\left(\frac{h_p([T_0])[T_1]_n - \frac{r[T_1]_n}{2}}{g_e}\right)\right)$$
and the computation can be represented graphically by Fig. 8.5a. Finally, we have
used
[P(T1 )] = he (hp ([T0 ])[T1 ]n )[T1 ]n
which can be shown diagrammatically as in Fig. 8.5b. These graphs are, of course,
becoming increasingly complicated. However, they are quite useful in depicting how
feedback pathways can be added to our computations. Let’s add feedback pathways as shown in Fig. 8.6.
Fig. 8.6 Adding feedback to the maximum sodium conductance control pathway
We now specialize to the case of the Ca++ triggers. Then [T0 ] represents [Ca++ ]
and [T1 ] will denote some sort of Ca++ binding complex. We will use the generic
name calmodulin for this complex and use the symbol C ∗ as its symbol. Hence,
[T1] = [C∗] with [C∗]n the nominal concentration of this complex. Thus,
$$W_{Ca^{++}} = \left([Ca^{++}],\;[Ca^{++}]_b,\;g_{Ca^{++}},\;r_{Ca^{++}},\;[C^*]_n,\;\frac{r_{Ca^{++}}[C^*]_n}{2},\;g_e^{Ca^{++}},\;s^{Ca^{++}},\;[C^*]_n,\;0,\;g_{Ca^{++}}\right)$$
Using the standard Ball and Stick model, the calcium injection current can be inserted
at any electrotonic distance x0 with 0 ≤ x0 ≤ 4. For convenience, we usually will
only allow x0 to be integer valued. The previous trigger equations without spatial
dependence are given below:
$$h_p([Ca^{++}(t)]) = \frac{r}{2}\left(1 + \tanh\left(\frac{[Ca^{++}(t)] - [Ca^{++}]_b}{g_p}\right)\right)$$
$$h_e\bigl(h_p([Ca^{++}(t)])\,[C^*]_n\bigr) = \frac{s}{2}\left(1 + \tanh\left(\frac{h_p([Ca^{++}(t)])[C^*]_n - \frac{r[C^*]_n}{2}}{g_e}\right)\right)$$
$$[P([C^*](t))] = h_e\bigl(h_p([Ca^{++}(t)])\,[C^*]_n\bigr)[C^*]_n$$
$$h_{Na}([P([C^*](t))]) = \frac{e}{2}\,\delta_{Na}\,g_{Na}^{max}\left(1 + \tanh\left(\frac{[P(C^*(t))]}{g_{Na}}\right)\right)$$
where for convenience of discussion, we have refrained from labeling all of the rel-
evant variables with Ca++ tags. Now given the injection current [Ca++ (t)], only a
portion of it is used to form the C ∗ complex. The initial fraction is hp ([Ca++ (t)])[C ∗ ]n
and it begins the diffusion across the cellular cytosol. Hence, following the discus-
sions in Chap. 7, if the injection is at site x0 , the solution to the diffusion equation
for this initial pulse is given by
$$I(t,x) = \frac{h_p([Ca^{++}(t)])\,[C^*]_n}{\sqrt{t}}\exp\left(\frac{-(x - x_0)^2}{4Dt}\right)$$
Note that I(t, x) explicitly contains time and spatial attenuation. Combining, we see
the full spatial dependent equations are:
$$h_p([Ca^{++}(t)]) = \frac{r}{2}\left(1 + \tanh\left(\frac{[Ca^{++}(t)] - [Ca^{++}]_b}{g_p}\right)\right)$$
$$I(t,x) = \frac{h_p([Ca^{++}(t)])\,[C^*]_n}{\sqrt{t}}\exp\left(\frac{-(x - x_0)^2}{4Dt}\right)$$
$$h_e(I(t,x)) = \frac{s}{2}\left(1 + \tanh\left(\frac{I(t,x) - \frac{r[C^*]_n}{2}}{g_e}\right)\right)$$
$$[P([C^*](t))] = h_e\bigl(h_p([Ca^{++}](t))\,[C^*]_n\bigr)[C^*]_n$$
$$h_{Na}([P(C^*(t))]) = \frac{e}{2}\,\delta_{Na}\,g_{Na}^{max}\left(1 + \tanh\left(\frac{[P(C^*(t))]}{g_{Na}}\right)\right)$$
$$g_{Na}(t,V) = \left(g_{Na}^{max} + h_{Na}([P(C^*(t))])\right)M_{Na}^p(t,V)\,H_{Na}^q(t,V)$$
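The space/time attenuation factor I(t, x) is easy to explore numerically; the sketch below uses the formula above with an illustrative diffusion coefficient and injected amount, evaluated at the five integer electrotonic sites.

```python
import numpy as np

# Sketch of the attenuation factor I(t, x): the fraction of the calcium-activated
# complex that survives diffusion from the injection site x0 to position x after
# time t.  The diffusion coefficient and injected amount are illustrative only.

def I(t, x, x0, hp_amount, D):
    return (hp_amount / np.sqrt(t)) * np.exp(-(x - x0) ** 2 / (4.0 * D * t))

D = 0.5                 # hypothetical effective diffusion coefficient
hp_amount = 1.0         # hp([Ca++(t)]) * [C*]_n at the moment of injection
x0 = 2                  # electrotonic injection site (0 = at the soma, 4 = farthest)
for t in (0.5, 1.0, 4.0):
    profile = [I(t, x, x0, hp_amount, D) for x in range(5)]
    print(f"t = {t}: ", np.round(profile, 3))
```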
Now consider the actual feedback paths shown in Fig. 8.6 for a Ca++ trigger. In a
calcium second messenger event, the calcium current enters the cell through the port
and some remains free and some is bound into the C∗ complex. In this
context, we interpret the feedback as follows: 1 − ξ0 of the bound [C∗](t) is fed
back to the port as free Ca++. This happens because the bound complex dissociates
back into Ca++ plus other components. This amount is fed back into the input of hp
and the feedback response J(t) is given by
$$J(t) = \sum_{j=0}^{\infty} h_p\bigl((1-\xi_0)^j[Ca^{++}(t)]\bigr) = \sum_{j=0}^{\infty}\frac{r}{2}\left(1 + \tanh\left(\frac{(1-\xi_0)^j\bigl([Ca^{++}(t)] - [Ca^{++}]_b\bigr)}{g_p}\right)\right)$$
To estimate this value, note for $u = 1 - \xi_0$, u is in (0, 1). It follows from the mean
value theorem that for any positive integer j, we have, for some point $c_x$ between x
and $x_0$,
$$\frac{r}{2}\left(1 + \tanh\left(\frac{u^j(x - x_0)}{g}\right)\right) = \frac{r\,u^j}{2g}\,\mathrm{sech}^2\left(\frac{u^j(c_x - x_0)}{g}\right)(x - x_0) \approx u^j\,\frac{\mathrm{sech}^2\left(\frac{c_x - x_0}{g}\right)}{\mathrm{sech}^2\left(\frac{d_x - x_0}{g}\right)}$$
Thus, to first order, letting
$$\gamma_x = \frac{\mathrm{sech}^2\left(\frac{c_x - x_0}{g}\right)}{\mathrm{sech}^2\left(\frac{d_x - x_0}{g}\right)},$$
we have
$$\frac{r}{2}\left(1 + \tanh\left(\frac{u^j(x - x_0)}{g}\right)\right) \approx \gamma_x\,u^j\,\frac{r}{2}\left(1 + \tanh\left(\frac{x - x_0}{g}\right)\right)$$
Applying this to our sum J, and using $\gamma(t)$ for the term $\gamma_{Ca^{++}(t)}$, we find
$$J(t) \approx \gamma_x\,h_p([Ca^{++}(t)])\sum_{j=0}^{\infty}(1 - \xi_0)^j = \gamma(t)\,h_p([Ca^{++}(t)])\,\frac{1}{\xi_0}$$
Thus, the multiplier applied to the gNa computation due to feedback is of order $\frac{1}{\xi_0}$;
i.e. $J \leq \gamma_x\frac{1}{\xi_0}$. The constant $\gamma_x$ is difficult to assess even though we have first order
bounds for it. It is approximately given by
$$\gamma(t) \approx \frac{r}{2g}\,\mathrm{sech}^2\left(\frac{(1 - \xi_0)\bigl([Ca^{++}(0)] - [Ca^{++}]_b\bigr)}{g}\right)\bigl([Ca^{++}(t)] - [Ca^{++}]_b\bigr).$$
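The 1/ξ0 multiplier is simply the sum of the geometric series of recycled fractions; a quick Python check, with illustrative ξ0 values only, is shown below.

```python
# The feedback multiplier: if each recycling pass returns a fraction (1 - xi0) of
# the bound complex to the port, the accumulated gain behaves like the geometric
# series sum_j (1 - xi0)^j = 1/xi0.  The xi0 values below are illustrative.

for xi0 in (0.5, 0.2, 0.05):
    partial = sum((1.0 - xi0) ** j for j in range(1000))
    print(f"xi0 = {xi0:>4}: truncated sum = {partial:.2f}, closed form 1/xi0 = {1.0 / xi0:.2f}")
```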
From the discussions so far, we can infer that the calcium second messenger pathways
can influence the shape of the action potential in many ways. For our purposes, we
will concentrate on just a fraction of these possibilities. If a calcium current is injected
into the dendrite at time and spatial position (t0 , x0 ), it creates a protein response from
the nucleus following the equations below for various ions. We use the label a to
denote the fact that these equations are for protein P(C ∗ ) = a. The calcium injection
current is also labeled with the protein a as it is known that calcium currents are
specifically targeted towards various protein outputs.
$$h_p^a([Ca^{a,++}(t)]) = \frac{r^a}{2}\left(1 + \tanh\left(\frac{[Ca^{a,++}(t)] - [Ca^{a,++}]_b}{g_p^a}\right)\right)$$
$$I^a(t,x) = \frac{h_p^a([Ca^{a,++}(t)])\,[C^{a,*}]_n}{\sqrt{t - t_0}}\exp\left(\frac{-(x - x_0)^2}{4D^a(t - t_0)}\right)$$
$$h_e^a(I^a(t,x)) = \frac{s^a}{2}\left(1 + \tanh\left(\frac{I^a(t,x) - \frac{r^a[C^{a,*}]_n}{2}}{g_e^a}\right)\right)$$
$$[P^a([C^{a,*}](t))] = h_e^a\bigl(h_p^a([Ca^{++}(t)])\,[C^{a,*}]_n\bigr)[C^{a,*}]_n$$
$$h^a([P^a([C^{a,*}](t))]) = \frac{e^a}{2}\,\delta^a\,g_a^{max}\left(1 + \tanh\left(\frac{[P^a(C^{a,*}(t))]}{g_a}\right)\right)$$
$$g_a(t,V) = \left(g_a^{max} + h^a([P^a([C^{a,*}](t))])\right)M_a^p(t,V)\,H_a^q(t,V)$$
Thus, the calcium second messenger activity alters the maximum ion conductances as follows:
$$\delta g_{Na}(t_0, x_0) = h_{Na}\Bigl(P^{Na}\bigl([C^{Na,*}]_n\bigr)\Bigr)\exp\left(\frac{-(t - t_0)}{\tau_c^{Na}}\right)$$
$$\delta g_{K}(t_0, x_0) = h_{K}\Bigl(P^{K}\bigl([C^{K,*}]_n\bigr)\Bigr)\exp\left(\frac{-(t - t_0)}{\tau_c^{K}}\right)$$
We see that [Ca++] currents that enter the dendrite initiate cellular changes that affect
potassium conductance and hence the hyperpolarization portion of the action poten-
tial curve. Also, these currents can affect the sodium conductance which modifies the
depolarization portion of the action potential. Further, these changes in the maximum
sodium and potassium conductances also affect the generation of the action potential
itself. If feedback is introduced, we know there is a multiplier effect for each ion a:
$$J^a(t) = \gamma^a(t)\,\frac{1}{\xi_0^a}\,h_p^a([Ca^{++}(t)])$$
which alters the influence equations by adding a multiplier at the first step:
$$h_p^a([Ca^{a,++}(t)]) = \gamma^a(t)\,\frac{1}{\xi_0^a}\,\frac{r^a}{2}\left(1 + \tanh\left(\frac{[Ca^{a,++}(t)] - [Ca^{a,++}]_b}{g_p^a}\right)\right)$$
$$I^a(t,x) = \frac{h_p^a([Ca^{a,++}(t)])\,[C^{a,*}]_n}{\sqrt{t - t_0}}\exp\left(\frac{-(x - x_0)^2}{4D^a(t - t_0)}\right)$$
$$h_e^a(I^a(t,x)) = \frac{s^a}{2}\left(1 + \tanh\left(\frac{I^a(t,x) - \frac{r^a[C^{a,*}]_n}{2}}{g_e^a}\right)\right)$$
$$[P^a([C^{a,*}](t))] = h_e^a\bigl(h_p^a([Ca^{++}(t)])\,[C^{a,*}]_n\bigr)[C^{a,*}]_n$$
$$h^a([P^a([C^{a,*}](t))]) = \frac{e^a}{2}\,\delta^a\,g_a^{max}\left(1 + \tanh\left(\frac{[P^a(C^{a,*}(t))]}{g_a}\right)\right)$$
$$g_a(t,V) = \left(g_a^{max} + h^a([P^a([C^{a,*}](t))])\right)M_a^p(t,V)\,H_a^q(t,V)$$
max a a,∗
The full effect of the multiplier is nonlinear as it is fed forward through a series
of sigmoid transformations. There are only five integer values for the electrotonic
distance which can be used for the injection sites. These are x0 = 0 (right at the soma)
to x0 = 4 (farthest from the soma). At each of these sites, calcium injection currents
can be applied at various times. Hence given a specific time t0 , the total effect on the
sodium and potassium conductances is given by
$$\delta g_{Na}(t_0) = \sum_{i=0}^{4}\delta g_{Na}(t_0, i), \qquad \delta g_{K}(t_0) = \sum_{i=0}^{4}\delta g_{K}(t_0, i)$$
where the attenuation due to the terms $\exp\left(\frac{-(x - i)^2}{4D^a(t - t_0)}\right)$ for the ions a is built into the
calculations.
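A small Python sketch of this summation over injection sites is given below; the per-site peak changes, the diffusion coefficient and the decay time are hypothetical placeholders meant only to show the bookkeeping.

```python
import numpy as np

# Sketch of the total conductance change at time t0: contributions from the five
# integer electrotonic injection sites are summed, with Gaussian spatial attenuation
# and an exponential decay in time.  All parameter values are hypothetical.

def site_contribution(t, t0, x, x_site, delta_peak, D, tau):
    """Hypothetical contribution of an injection at (t0, x_site) felt at position x."""
    if t <= t0:
        return 0.0
    spatial = np.exp(-(x - x_site) ** 2 / (4.0 * D * (t - t0)))
    temporal = np.exp(-(t - t0) / tau)
    return delta_peak * spatial * temporal

def total_delta_g(t, t0, x, delta_peaks, D=0.5, tau=5.0):
    return sum(site_contribution(t, t0, x, i, delta_peaks[i], D, tau) for i in range(5))

delta_peaks = [0.8, 0.5, 0.3, 0.2, 0.1]   # hypothetical per-site peak changes in gNa_max
for t in (1.0, 3.0, 10.0):
    print(f"t = {t:>4}: total delta gNa_max at the soma = {total_delta_g(t, 0.0, 0, delta_peaks):.3f}")
```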
When two cells interact via a synaptic interface, the electrical signal in the pre-
synaptic cell in some circumstances triggers a release of a neurotransmitter (NT)
from the pre-synapse which crosses the synaptic cleft and then by docking to a
port on the post cell, initiates a post-synaptic cellular response. The general pre-
synaptic mechanism consists of several key elements: one, NT synthesis machinery
so the NT can be made locally; two, receptors for NT uptake and regulation; three,
enzymes that package the NT into vesicles in the pre-synapse membrane for delivery
to the cleft. There are two general pre-synaptic types: monoamine and peptide. In the
monoamine case, all three elements for the pre-cell response are first manufactured in
the pre-cell using instructions contained in the pre-cell’s genome and shipped to the
pre-synapse. Hence, the monoamine pre-synapse does not require further instructions
from the pre-cell genome and the response is therefore fast. The peptide pre-synapse,
in contrast, must have its neurotransmitter manufactured using the pre-cell genome;
hence, if a peptide neurotransmitter is needed, there is a lag in response time. Also,
in the peptide case, there is no re-uptake pump so peptide NT can't be reused.
On the post-synaptic side, the fast response is triggered when the bound NT/
Receptor complex initiates an immediate change in ion flux through the gate thereby
altering the electrical response of the post cell membrane and hence, ultimately its
action potential and spike train pattern. Examples are glutamate (excitatory) and
GABA (inhibitory) neurotransmitters. The slow response occurs when the initiating
NT triggers a second messenger response in the interior of the cell. There are two
general families of receptors we are interested in: family one, with 7 transmembrane
regions, and family two, with 4 transmembrane regions. The responses mediated by
these families are critical to the design of a proper abstract cell capable of interesting
biological activity.
The 7 transmembrane regions are arranged in a circle in the membrane with a central
core. The first messenger, FM, docks with the receptor, R7 , creating a complex,
NT /R7 . The complex undergoes a conformational change allowing it to bind to
a class of proteins (G-proteins) creating a new complex, NT /R7 /G. This causes
a conformational change in the G protein allowing it to bind to an intracellular
enzyme E inducing another conformational change on E. At this point, we have a
final complex, NT /R7 /G/E, in which the E has been activated so that it can release
the substance we call the second messenger, SM. The substance SM can then initiate
many intracellular processes by triggering a cascade of reactions that culminate in
the transcription of a protein from the genome in the post-cell nucleus. A typical
pathway has SM activating an intracellular enzyme E2 which creates a transcription
factor, TF. The transcription factor TF crosses into the nucleus to transcribe a protein
P. The protein P may then bind to other genes in the post genome to initiate an entire
cascade of gene activity; the protein P may transport out of the post nucleus as the
voltage activated gate protein, which is inserted into the post cell membrane and
effectively increases the maximum ion conductance for some ion; or many other
possibilities may occur. These possibilities can be modeled with our abstract feature
vector approach discussed in Chap. 9 where we associate a ten dimensional vector
with the axonal output of the post neuron.
• The protein P transports out of the post nucleus as the voltage activated gate
protein which is inserted into the post cell membrane which effectively increases
the maximum ion conductance for some ion. We model this effect by alterations
in the ten feature vector parameters.
• The protein P operates in the synaptic cleft increasing or decreasing neurotransmit-
ter uptake and neurotransmitter creation. These effects are somewhat intertwined
with the first possibility sketched out above, but these have an intrinsic time delay
before they take effect.
The docking of the FM therefore initiates a complete repertoire of responses that can
completely reshape the biological structure of the post-cell and even surrounding
cells. There are several time periods involved here: the protein P may be built and
transported to its new site for use in minutes (≈ 6 × 10^4 ms), hours (≈ 3.6 × 10^6 ms),
days (≈ 8.6 × 10^7 ms) or weeks (≈ 6 × 10^8 ms). Note the rough factor of ten
magnitude increases here.
In this family, 4 transmembrane regions are arranged in a circle and five copies of
this circle are organized to form a pore in the membrane. The central pore of these
channels can be opened to increase or closed to decrease substance flux by two
means: one, the gate is voltage activated (the standard Hodgkin–Huxley ion gate)
and two, the gate is ligand activated which means substances bind to specialized
regions of the gate to cause alterations in the flux of substances through the channel.
The substance going through the channel could be a sodium ion or a first messenger.
The ligand gated channels are very interesting as they allow very complicated control
possibilities to emerge. Each 4 transmembrane region circle can have its own specific
regulatory domains and hence can be denoted as a type $\alpha$ circle, $C_4^\alpha$. The full 5 circle ion channel is thus a concatenation of 5 such circles and can be labeled $C_4^{\alpha(1)} \cdot C_4^{\alpha(2)} \cdot C_4^{\alpha(3)} \cdot C_4^{\alpha(4)} \cdot C_4^{\alpha(5)}$ or more simply as $C_4(\alpha_1\,\alpha_2\,\alpha_3\,\alpha_4\,\alpha_5)$. This is much more conveniently written as $C_4(\vec\alpha)$, where the arrow over the $\alpha$ indicates that it has multiple independent components. There is much complexity here since a given $\alpha(i)$ module
can have multiple regulatory docking sites for multiple substances and hence, the
full R4 type receptor can have a very sophisticated structure. The amount by which a
channel is open or closed is thus capable of being controlled in an exquisitely precise
fashion.
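To make this combinatorial structure concrete, here is a minimal sketch of such a five-subunit channel. The names (Subunit, Channel5, openFraction) and the simple product rule used to combine bound modulators are our illustrative assumptions, not the text's notation.

```cpp
// Minimal sketch of a five-subunit ligand gated channel C4(alpha).
// Each subunit circle has its own regulatory sites; a docked modulator
// scales that subunit's contribution to the overall open fraction.
// The product rule used here is an illustrative assumption only.
#include <array>
#include <iostream>
#include <vector>

struct Subunit {                      // one C4 alpha(i) circle
  std::vector<double> boundEffects;   // e.g. 1.0 = no change, 0.5 = partial antagonist
  double gating() const {
    double g = 1.0;
    for (double e : boundEffects) g *= e;   // combine regulatory sites
    return g;
  }
};

struct Channel5 {                     // concatenation C4(alpha(1)) ... C4(alpha(5))
  std::array<Subunit, 5> circles;
  double openFraction(double voltageGate) const {
    double f = voltageGate;           // Hodgkin-Huxley style voltage term in [0,1]
    for (const auto& c : circles) f *= c.gating();
    return f;                         // ligand and voltage control multiply
  }
};

int main() {
  Channel5 ch;
  ch.circles[0].boundEffects = {0.7};   // partial antagonist docked on circle 1
  ch.circles[3].boundEffects = {1.2};   // mild agonist style enhancement on circle 4
  std::cout << "open fraction = " << ch.openFraction(0.8) << "\n";
}
```

With a different modulator bound on each circle, the open fraction can be tuned over a wide range, which mirrors the exquisitely precise control described above.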
There is a spectrum of response which is called the agonist spectrum in which the
channel response can be altered from full open to full closed. Call the original FM
for this gate NT. Then if NT is an excitatory substance, the docking of NT to a circle
opens the channel. In this case, we call NT an agonist and denote it by AG. On
the other hand, NT might be an inhibitor and close the channel; in that case, NT
is called an antagonist or AT. We can even have agonists and antagonists that open
or close a channel partially by r %, where r is something less than 100 %. These
work by replacing a docked agonist or antagonist and hence reducing the channel
action in either case: full open goes to partially open, full closed goes to partially
open. Hence, there is a spectrum of agonist activity ranging from full open to full
closed. Since each circle can be of a different type, the resulting behavior can be quite
complicated. Further, the C4 gates can control first messengers that initiate second
messenger cascades as well as immediate ion flux alterations. Thus, the production
of the second messenger and its possible protein target can also be controlled with
great precision.
There is also allosteric modification where a first messenger FM1 binding to receptor
C4 (α) can have its activity modulated by another first messenger FM2 binding to a
second receptor C4 (β) to either amplify or block the FM1 effect. In this case, FM2 has
no action of its own; only if FM1 is present and docked is there a modulation. This type
of modification can also occur on the same receptor if C4 (α) has a second binding site
on one of its C4 regions for FM2 . Finally, there are cotransmitter pairings where two
first messengers can both operate independently but also together to either enhance or
diminish response. The ligand gated channels can often use drugs or pharmacological
inputs to mimic NT/receptor binding and hence generate modulated response. To
model cognition, it is thus clear that we eventually need to model pharmacological
effects of the types discussed above.
Suppose the pre-neuron releases a neurotransmitter $\zeta$ of the type just discussed. It binds in some fashion to a port or gate specialized for the $\zeta$ neurotransmitter. The neurotransmitter $\zeta$ then initiates a cascade of reactions:
• It passes through the gate, entering the interior of the cable. It then forms
a complex, ζ̂.
• Inside the post-dendrite, ζ̂ influences the passage of ions through the cable wall.
For example, it may increase the passage of Na+ through the membrane of the
cable thereby initiating an ESP. It could also influence the formation of a calcium
current, an increase in K+ and so forth.
• The influence via ζ̂ can be that of a second messenger trigger.
Hence, each neuron creates a brew of neurotransmitters specific to its type. A trigger
of type T0 can thus influence the production of neurotransmitters with concomitant
changes in post-neuron activity. Thus, in addition to modeling Ca++ or other triggers
as mechanisms to alter the maximum sodium and potassium conductance, we can also
model triggers that provide ways to increase or decrease neurotransmitter levels. Since the neurotransmitter $\zeta$ is a second messenger trigger, our full equations with multiplier are
$$h_p^\zeta([\zeta(t)]) = \gamma^\zeta(t)\,\frac{1}{\xi_0^\zeta}\,\frac{r^\zeta}{2}\left(1 + \tanh\left(\frac{[\zeta(t)] - [\zeta]_b}{g_p^\zeta}\right)\right)$$
$$I^\zeta(t, x) = \frac{h_p^\zeta([\zeta(t)])\,[\hat\zeta]_n}{\sqrt{t - t_0}}\,\exp\left(\frac{-(x - x_0)^2}{4D^\zeta(t - t_0)}\right)$$
$$h_e^\zeta(I^\zeta(t, x)) = \frac{s^\zeta}{2}\left(1 + \tanh\left(\frac{I^\zeta(t, x) - r^\zeta[\hat\zeta]_n}{g_e^\zeta}\right)\right)$$
$$[P^\zeta([\hat\zeta](t))] = h_e^\zeta\left(h_p^\zeta([\zeta(t)])\,[\hat\zeta]_n\right)[\hat\zeta]_n$$
$$h^\zeta([P^\zeta([\hat\zeta](t))]) = \delta_\zeta\,g_\zeta^{max}\,\frac{e^\zeta}{2}\left(1 + \tanh\left(\frac{[P^\zeta([\hat\zeta](t))]}{g_\zeta}\right)\right)$$
where x is the spatial variable for the post-dendritic cable. The final step is to interpret what function the protein $P^\zeta$ has in the cellular system. If it is a sodium or potassium gate, the modification of the conductance of those ions is handled as before. However, the protein could also increase calcium current by adding more calcium gates. We can model the calcium current alterations as
$$\delta g_\zeta^{max}(t, x) = \sum_{\zeta_s}\frac{h^\zeta([P^\zeta([\hat\zeta](t))])}{\sqrt{t - t_{\zeta_s}}}\,\exp\left(\frac{-(x - x_{\zeta_s})^2}{4D_{\zeta_s}(t - t_{\zeta_s})}\right)$$
where the index $\zeta_s$ denotes the sites where the $\zeta$ gates are located. If the neurotransmitter modifies the calcium currents, we have a different type of influence:
$$\delta(Ca^{++})_\zeta^{max}(t, x) = \sum_{\zeta_s} h^\zeta([P^\zeta([\hat\zeta](t))])$$
In summary, the dendritic inputs we model include:

• a trigger $T_0$ which enters the dendrite port $P_D$ and is a simple modulator of the voltage activated gates for sodium and potassium, so we directly modify the maximum ion conductance.
• a trigger $T_0$ which is a second messenger whose effects on sodium and potassium conductance utilize a sigmoid transition.
• a Ca++ injection current which can alter the sodium and potassium conductance
functions via second messenger pathways.
• neurotransmitters λ0 to λ4 which can modify sodium and potassium conductance
via second messenger effects or directly. In addition, they can alter calcium currents
via second messenger pathways.
In addition, there are inputs at time $t_0$ at various electrotonic distances along the soma. We sum these effects over the soma electrotonic distance in the same way. These values are then combined to give the voltage at the axon hillock that could be used in a general Hodgkin–Huxley model to generate the resulting action potential. In summary, we can model top level computations using the cellular graph given in Fig. 8.8, with the details of the computational agents coming from our abstractions
of the relevant biology.
Reference
J. Peterson, Calculus for Cognitive Scientists: Partial Differential Equation Models, Springer Series
on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte Ltd,
Singapore, 2015, in press)
Chapter 9
The Abstract Neuron Model
Let’s look at neuronal inputs and outputs more abstractly. We have learned quite a bit
about first and second messenger systems and how action potentials are generated
for input families. Now is the time to put all of this to good use!
It is clear neuron classes can have different trigger characteristics. First, consider
the case of neurons which create the monoamine neurotransmitters. Neurons of this
type in the Reticular Formation of the midbrain produce a monoamine neurotrans-
mitter packet at the synaptic junction between the axon of the pre-neuron and the
dendrite of the post-neuron. The monoamine neurotransmitter is then released into
the synaptic cleft and it induces a second messenger response as shown in Fig. 6.8.
The strength of this response is dependent on the input from pre-neurons that form
synaptic connections with the post-neuron. In addition, the size of this input deter-
mines the strength of the monoamine trigger into the post-neuron dendrite. Let the strength be given by the weighting term $c_{pre,post}^\zeta$ where, as usual, we use the value $\zeta = 0$ to denote the neurotransmitter serotonin, $\zeta = 1$ for dopamine and $\zeta = 2$ for norepinephrine. In our model, recall that time is discretized into integer values as
time ticks 1, 2 and so forth via our global simulation clock. Also, spatial values are
discretized into multiples of various electrotonic scaling distances. With that said,
the trigger at time t and dendrite location w is therefore
$$T_0(t, w) = \frac{c_{pre,post}^\zeta}{\sqrt{t - t_0}}\,\exp\left(\frac{-(w - w_0)^2}{4D_0^\zeta(t - t_0)}\right).$$
where $D_0^\zeta$ is the diffusion constant associated with the trigger. Hence, $w = j\hat L_E$ for some scaling $\hat L_E$. The trigger $T_0$ has associated with it the protein $T_1$. We let
$$T_1(t, w) = \frac{d_{pre,post}^\zeta}{\sqrt{t - t_0}}\,\exp\left(\frac{-(w - w_0)^2}{4D_1^\zeta(t - t_0)}\right).$$
where $d_{pre,post}^\zeta$ denotes the strength of the induced $T_1$ response and $D_1^\zeta$ is the diffusion constant of the $T_1$ protein. This trigger will act through the usual pathway. Also, we let $T_2$ denote the protein $P(T_1)$. $T_2$ transcribes a protein target from the genome with efficiency e.
$$h_p([T_0(t, w)]) = \frac{r}{2}\left(1 + \tanh\left(\frac{[T_0(t, w)] - [T_0]_b}{g_p}\right)\right)$$
$$I(t, w) = \frac{h_p([T_0(t, w)])\,[T_1]_n}{\sqrt{t - t_0}}\,\exp\left(\frac{-(w - w_0)^2}{4D_1^\zeta(t - t_0)}\right)$$
$$h_e(I(t, w)) = \frac{s}{2}\left(1 + \tanh\left(\frac{I(t, w) - r[T_2]_n}{g_e}\right)\right)$$
$$[P(T_1)](t, w) = h_e(I(t, w))$$
$$h_{T_2}(t, w) = \frac{e}{2}\left(1 + \tanh\left(\frac{[T_2](t, w)}{g_{T_2}}\right)\right)$$
$$[T_2](t, w) = h_{T_2}(t, w)\,[T_2]_n$$
Note [T2 ](t, w) gives the value of the protein T2 concentration at some discrete time
t and spatial location $j\hat L_E$. This response can also be modulated by feedback. In this case, let $\xi$ denote the feedback level. Then, the final response is altered to $h_{T_2}^f$, where the superscript f denotes the feedback response and the constant $\omega$ is the strength of the feedback:
$$h_{T_2}^f(t, w) = \omega\,\frac{1}{\xi}\,h_{T_2}(t, w)$$
$$[T_2](t, w) = h_{T_2}^f(t, w)\,[T_2]_n$$
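A minimal sketch of this sigmoid chain, under the reconstructed forms above, is given below. All numerical values and the helper sigmoid are illustrative assumptions, and we interpret the transcription sigmoid as acting on $[P(T_1)](t,w)$ so the chain can be evaluated in one pass.

```cpp
// Sketch of the T0 -> T1 -> P(T1) = T2 second messenger chain at one
// dendrite location w and discrete time t. Parameter values are
// illustrative; only the functional forms follow the text.
#include <cmath>
#include <iostream>

double sigmoid(double scale, double x, double thresh, double gain) {
  return 0.5 * scale * (1.0 + std::tanh((x - thresh) / gain));
}

int main() {
  double c = 1.0, D1 = 0.1;                   // trigger strength and diffusion constant
  double t = 5.0, t0 = 0.0, w = 1.0, w0 = 0.0;
  double T0 = (c / std::sqrt(t - t0)) *
              std::exp(-(w - w0) * (w - w0) / (4.0 * D1 * (t - t0)));

  double r = 0.9, gp = 0.5, T0b = 0.1;        // fraction free, trigger gain, threshold
  double T1n = 1.0, T2n = 1.0;                // thresholds for T1 and T2
  double s = 0.8, ge = 0.5, e = 0.7, gT2 = 0.5;
  double omega = 1.0, xi = 0.9;               // feedback strength and level

  double hp  = sigmoid(r, T0, T0b, gp);                        // trigger port sigmoid
  double I   = (hp * T1n / std::sqrt(t - t0)) *
               std::exp(-(w - w0) * (w - w0) / (4.0 * D1 * (t - t0)));
  double he  = sigmoid(s, I, r * T2n, ge);                     // genome gate sigmoid
  double PT1 = he;                                             // [P(T1)](t,w)
  double hT2 = sigmoid(e, PT1, 0.0, gT2);                      // transcription sigmoid
  double hT2f = omega * (1.0 / xi) * hT2;                      // feedback modulation
  double T2  = hT2f * T2n;                                     // protein concentration

  std::cout << "[T2](t,w) = " << T2 << "\n";   // used below to shift g_Na^max or g_K^max
}
```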
There are a large number of shaping parameters here. For example, for each neurotransmitter, we could alter the parameters due to calcium trigger diffusion as discussed in Sect. 7.2. These would include $D_0^\zeta$, the diffusion constant for the trigger, and $D_1^\zeta$, the diffusion constant for the gate induced protein $T_1$. In addition, transcribed proteins could alter—we know their first order quantitative effects due to our earlier analysis—$d_{pre,post}^\zeta$, the strength of the $T_1$ response; r, the fraction of $T_1$ free; $g_p$, the trigger gain; $[T_0]_b$, the trigger threshold concentration; s, the fraction of active $T_1$ reaching the genome; $g_e$, the trigger gain for the active $T_1$ transition; $[T_1]_n$, the threshold for $T_1$; $[T_2]_n$, the threshold for $P(T_1) = T_2$; $g_{T_2}$, the gain for $T_2$; $\omega$, the feedback strength; and $\xi$, the feedback amount for $T_1 = 1 - \xi$. Note $d_{pre,post}^\zeta$ could simply be $c_{pre,post}^\zeta$.
The neurotransmitter triggers can alter many parameters important to the creation of the action potential. The maximum sodium and potassium conductances can both be altered in this way. If the transcribed protein is a sodium gate with $[T_2]_n = \delta_{Na}\,g_{Na}^{max}$, the sodium conductance becomes
$$h_{T_2}^f(t, w) = \omega\,\frac{1}{\xi}\,h_{T_2}(t, w)$$
$$[T_2](t, w) = h_{T_2}^f(t, w)\,[T_2]_n$$
$$g_{Na}(t, w, V) = \left(g_{Na}^{max} + [T_2](t, w)\right)(M_{Na})^p(t, V)\,(H_{Na})^q(t, V)$$
Similarly, if the protein is a potassium gate with $[T_2]_n = \delta_K\,g_K^{max}$, we have
$$h_{T_2}^f(t, w) = \omega\,\frac{1}{\xi}\,h_{T_2}(t, w)$$
$$[T_2](t, w) = h_{T_2}^f(t, w)\,[T_2]_n$$
$$g_K(t, w, V) = \left(g_K^{max} + [T_2](t, w)\right)(M_K)^p(t, V)\,(H_K)^q(t, V)$$
Finally, neurotransmitters and other second messenger triggers have delayed effects
in general. So if the trigger $T_0$ binds with a port P at time $t_0$, the changes in protein levels $P(T_1)$ need to be delayed by a factor $\tau^\zeta$. The soma calculations are handled exactly like this except we are working on the soma cable, and so electrotonic distance is measured by the variable z instead of w.
We now understand in principle how to compute the axon hillock voltage that will be
passed to the Hodgkin–Huxley engine to calculate the resulting action potential. The
prime purpose of the incoming voltage is to provide the proper depolarization of the
excitable cell membrane. Hence, we know that we will generate an action potential if the incoming signal exceeds the neuron's threshold. For our purposes, the shape of the action potential will be primarily determined by the alteration of the $g_{Na}^{max}$ and $g_K^{max}$ conductance parameters. In our model, these values are altered by the second messenger triggers which create or destroy the potassium and sodium gates in the membrane. Other triggers alter the essential hardware of the neuron and potentially the entire neuron class N in other ways. The protein $T_2$ due to a trigger u can, for example, alter parameters such as $\rho$ that enter the ball and stick eigenvalue equation
$$\tan(\alpha L) = -\frac{\tanh(L)}{\rho\,L}\,(\alpha L),$$
Rather than performing such a numerical computation every time a global monoamine modulation request is sent, it is easier to pre-compute the eigenvalue problem's solution for a spectrum of $\rho$ values. We then keep them in storage and access them as
needed to update the neuron class when required. The determination of the action
potential for a given axon hillock input is handled as follows. Once we know the
incoming voltage is past threshold, we generate an action potential whose parame-
ters are shaped by both the size of the pulse and the changes in ion conductances.
If we want to replace the action potential generation as the solution of a compli-
cated partial differential equation with an approximation of some kind, we need to
look at how the Hodgkin–Huxley equations are solved in more detail even though
we already introduced these equations in Peterson (2015). Our replacement for the
action potential will be called the Biological Feature Vector or BFV.
The individual neural objects in our cognitive model (and in particular, in our cor-
tical columns) will be abstractions of neural ensembles, their behaviors and outputs
gleaned from both low level biological processing and high level psychopharmacol-
ogy data. The low level information must include enough detail of how inputs are
processed (spike train generation) to be useful and enough detail of second mes-
senger pathways to see clearly that the interactions between a pre and a post neural
ensemble are really communications between their respective genomes. Clearly, this
implies that an appropriate abstract second messenger and genome model is needed.
In this section, we modify the general ball and stick model to use a very simple low
dimensional representation of the action potential.
The general structure of a typical action potential is illustrated in Fig. 9.1. This wave form is idealized and we are interested in how much information can be transferred from one abstract neuron to another using a low dimensional biologically based feature vector, the Biological Feature Vector or BFV. We can achieve such an
abstraction by noting that in a typical excitable neuron response, Fig. 9.1, the action
potential exhibits a combination of cap-like shapes. We can use the following points
on this generic action potential to construct a low dimensional feature vector of
Eq. 9.1.
$$\zeta = \left\{\begin{array}{ll}
(t_0, V_0) & \text{start point}\\
(t_1, V_1) & \text{maximum point}\\
(t_2, V_2) & \text{return to reference voltage}\\
(t_3, V_3) & \text{minimum point}\\
(g, t_4, V_4) & \text{sigmoid model of tail: } V_3 + (V_4 - V_3)\tanh(g(t - t_3))
\end{array}\right\} \tag{9.1}$$
where the model of the tail of the action potential is of the form $V_m(t) = V_3 + (V_4 - V_3)\tanh(g(t - t_3))$. Note that $V_m'(t_3) = (V_4 - V_3)\,g$, and so if we were using real voltage data, we would approximate $V_m'(t_3)$ by a standard finite difference. This
wave form is idealized and actual measured action potentials will be altered by noise
and the extraneous transients endemic to the measurement process in the laboratory.
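A hypothetical C++ container for the BFV of Eq. 9.1 might look like the following; the struct name, member names and the sample numbers are ours, not the text's.

```cpp
// A low dimensional Biological Feature Vector (BFV) for one action potential,
// following Eq. 9.1: four (time, voltage) landmarks plus the tail sigmoid gain g
// and the tail asymptote point (t4, V4). Names and values are illustrative.
#include <cmath>
#include <iostream>

struct BFV {
  double t0, V0;    // start point
  double t1, V1;    // maximum point
  double t2, V2;    // return to reference voltage
  double t3, V3;    // minimum point
  double g, t4, V4; // sigmoid model of the tail

  // tail model: V3 + (V4 - V3) tanh(g (t - t3))
  double tail(double t) const { return V3 + (V4 - V3) * std::tanh(g * (t - t3)); }
};

int main() {
  BFV f{0.0, -65.9, 1.2, 40.0, 2.5, -65.9, 4.0, -72.0, 0.8, 10.0, -66.5};
  std::cout << "tail voltage at t = 6: " << f.tail(6.0) << " mV\n";
}
```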
To study the efficacy of this BFV for capturing useful information, we performed
a series of computational experiments on the recognition of toxins introduced into the input side of a cell from their effects on the cell's action potential. We know
biotoxins alter the shape of this action potential in many ways. The toxin guide of
Adams and Swanson (1996) shows quite clearly the variety of ways that a toxin can
alter ion gate function in the cell membrane, second messenger cascades and so forth.
Older material from Kaul and Daftari (1986), Wu and Narahashi (1988) and Schiavo
et al. (2000) focus on the details of specific classes of toxins. Kaul and Wu focus on pharmacologically active substances from the sea. Schiavo investigates the effects of toxins that interfere with the release of neurotransmitters by altering the exocytosis process. The effects of toxins on the presynaptic side are analyzed in Harvey (1990), while Strichartz et al. (1987) presents a summary of how toxins act on sodium channels.
Let's focus on a single action potential and examine how its characteristics change when we introduce various alterations in the parameters that affect its shape. We know that toxins cause many such effects and we can classify toxins into families based on how they alter the parameters that affect the action potential. A simulation is then
possible that collects information from a family of generated action potentials due
to the introduction of various toxin families. The BFV abstracts the characteristics
into a low dimensional vector and in Peterson and Khan (2006), we showed the
BFV was capable of determining which toxin family was used to alter the action
potential and our ability to discern the toxin family used compared nicely to other
methods of feature vector extraction such as the classic covariance technique, a total
variation method and an associated spline method that uses knots determined by the
total variation approach.
In order to demonstrate the efficacy of the BFV approach for capturing infor-
mation, we generated families of action potentials from a classic Hodgkin–Huxley
model. We also assume a toxin in a given family alters the action potential in a
specific way which we will call the toxin family signature. We studied two types of signatures: first, families that alter the maximum sodium and potassium conduc-
tance parameters by a given percentage and second, families that perturb the standard
Hodgkin–Huxley α–β values.
Now, an input into an artificial neuron generates an abstract BFV response. Larger ensembles of artificial neurons can then be created to serve as modules or neural objects in graphs.
The particular values on the components of a single artificial neuron’s or a
module’s feature vector, BFV, are then amenable to alteration via neurotransmitter
action such as serotonin, norepinephrine and dopamine coming from the midbrain
module of Fig. 5.3. The feature vector output of a neural object is thus due to the
cumulative effect of second messenger signaling to the genome of this object which
influences the action potential and thus feature vector of the object by altering its
complicated mixture of ligand and voltage activated ion gates, enzymes and so forth.
For example, the G protein-linked receptor superfamily uses receptors with 7 transmembrane regions linked to G proteins and a second messenger system activated by an enzyme (the cAMP and PI pathways); because a second messenger system is used, the response to a signal is delayed and hence these are slow response systems. Another family uses receptors with 4 transmembrane regions. In this family, the ion channel is surrounded by multiple copies of multiple different receptors and ion flow is directly controlled by a given particular mixture of neurotransmitters and receptors. Clearly, there are many control possibilities that
arise here due to the combinatorial nature of this family’s channels. It is difficult to
find appropriate high level descriptions of such events so that the algorithmic struc-
tures are evident and not obscured by the detail. Two important resources that have
been prime influences and have helped us develop generic models of neural objects
which have feature vector outputs which can be shaped by such pharmacological
inputs have been Stahl (2000, psychopharmacology) and Gerhart and Kirschner
(1997, evolution of cellular physiology). They have been out for a while now, but they are still packed with useful information for approximation schemes. The action potential is generated by an input on the dendritic side of an excitable nerve cell. We wish to analyze this signal and, from its properties, discern whether or not a toxin/ligand has been introduced into the input stream. Further, we want to be able to recognize the toxin as belonging to a certain family. Effectively, this means we can
label the output as due to a certain ligand input. For our purposes, we are interested in
a single action potential with general structure as illustrated in Fig. 9.1. We concen-
trate for the moment on single output voltage pulses produced by toxins introduced
into the input side of the nerve cell. The wave form in Fig. 9.1 is, of course, idealized
and the actual measured action potentials will be altered by noise and the extraneous
transients endemic to the measurement process in the laboratory.
In order to show that the abstract feature vector, BFV, is capable of capturing some
of the information carried by the action potential, we studied how to use this kind of
feature vector to determine whether or not an action potential has been influenced
by a toxin introduced into the dendritic system of an excitable neuron. The role of the
toxin is of course similar to the role of the monoamine neurotransmitters we wish to
use to modulate cortical output. Recall the full information processing here is quite
complex as seen in Fig. 9.2 and we are approximating portions of it. We explain
the toxin studies in some detail in the sections that follow. This provides additional
background on the reasons for our choice of low dimensional feature vector. The
basic Hodgkin–Huxley model depends on a large number of parameters and we will
be using perturbations of these as a way to model families of toxins. Of course,
more sophisticated action potential models can be used, but the standard two ion gate Hodgkin–Huxley model is sufficient for our needs here. In Sect. 9.3.1, we present the toxin recognition methodology for the first class of toxin families, which modify the maximal sodium and potassium conductances. We believe that toxins of this sort include some second messenger effects. Our general classification methodology is then discussed and applied to this collection of toxin families. We show that
we can design a reasonable recognizer engine using a low dimensional biologically
based feature vector. In Sect. 9.3.1.2, we introduce the second class of toxin families
whose effect on the action potential is more subtle. We show that the biological
feature vector performs well in this case also.
Recall, when two cells interact via a synaptic interface, the electrical signal in the
pre-synaptic cell in some circumstances triggers a release of a neurotransmitter (NT)
Fig. 9.2 A simplified path of information processing in the brain. Arrows indicate information
processing pathways
from the pre-synapse which crosses the synaptic cleft and then by docking to a port
on the post cell, initiates a post-synaptic cellular response. The general pre-synaptic
mechanism consists of several key elements: one, NT synthesis machinery so the NT
can be made locally; two, receptors for NT uptake and regulation; three, enzymes
that package the NT into vesicles in the pre-synapse membrane for delivery to the
cleft. On the post-synaptic side, we will focus on two general responses. The fast
response is triggered when the bound NT/Receptor complex initiates an immediate
change in ion flux through the gate thereby altering the electrical response of the
post cell membrane and hence, ultimately its action potential and spike train pattern.
The slow response occurs when the initiating NT triggers a second response in the
interior of the cell. In this case, the first NT is called the first messenger and the
intracellular response (quite complex in general) is the second messenger system.
Further expository details of first and second messenger systems can be found in
Stahl (2000). From the above discussion, we can infer that a toxin introduced to the
input system of an excitable cell influences the action potential produced by such a
cell in a variety of ways. For a classic Hodgkin Huxley model, there are a number of
critical parameters which influence the action potential.
The nominal values of the maximum sodium and potassium conductance $G_{Na}^0$ and $G_K^0$ can be altered. We view these values as a measure of the density of classic Hodgkin–Huxley voltage activated gates per square cm of biological membrane.
Hence, changes in this parameter require the production of new gates or the creation
of enzymes that control the creation/destruction balance of the gates. This parameter
is thus related to second messenger activity as access to the genome is required to
implement this change. We can also perturb the parameters that shape the α and β
functions which we introduced in Peterson (2015). These functions are all special
cases of the of the general mapping F(Vm , p, q) with p ∈ 4 and q ∈ 2 defined by
p0 (Vm + q0 ) + p1
F(Vm , p, q) = .
ep2 (Vm +q1 ) + p3
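A direct C++ transcription of this mapping is short; in the sketch below, the sample call uses the nominal $\alpha_m$ parameters listed later in the parameter suite, and the voltage chosen for the check is arbitrary.

```cpp
// General alpha/beta rate mapping F(Vm, p, q) = (p0 (Vm + q0) + p1) /
// (exp(p2 (Vm + q1)) + p3), with p in R^4 and q in R^2.
#include <array>
#include <cmath>
#include <iostream>

double F(double Vm, const std::array<double, 4>& p, const std::array<double, 2>& q) {
  return (p[0] * (Vm + q[0]) + p[1]) / (std::exp(p[2] * (Vm + q[1])) + p[3]);
}

int main() {
  // nominal alpha_m parameters from the text's parameter suite
  std::array<double, 4> pm = {-0.10, 0.0, -0.1, -1.0};
  std::array<double, 2> qm = {35.0, 35.0};
  std::cout << "alpha_m(-50 mV) = " << F(-50.0, pm, qm) << "\n";
}
```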
The p and q parameters control the shape of the action potential in a complex way.
From our discussions about the structure of ion gates, we could think of alterations in
the (p, q) pair associated with a given α and/or β as a way of modeling how the passage of ions through the gate is altered by the addition of various ligands. These effects
may or may not be immediate. For example, the alterations to the p and q parameters
may be due to the docking of ligands which are manufactured through calls to the
genome in the cell’s nucleus. In that case, there is a long delay between the initiation
of the second messenger signal to the genome and the migration of the ligands to
the outside of the cell membrane. In addition, proteins can be made which bind to
the inside of a gate and thereby alter the ion flow. We understand that this type of
modeling is not attempting to explain the details of such interactions. Instead, we are
exploring an approach for rapid identification and differentiation of the signals due
to various toxins.
Now assume that the standard Hodgkin–Huxley model with $g_{Na}^0 = 120$, $g_K^0 = 36.0$ and the classical $\alpha$, $\beta$ functions is labeled as the nominal case. Hence, there is a nominal vector $E^0$ given by
$$E^0 = \begin{bmatrix}
G_{Na}^0 = 120.0 & G_K^0 = 36.0\\
(p_m^\alpha)^0 = \{-0.10, 0.0, -0.1, -1.0\} & (p_m^\beta)^0 = \{0.0, 4.0, 0.0556, 0.0\}\\
(p_h^\alpha)^0 = \{0.0, 0.07, 0.05, 0.0\} & (p_h^\beta)^0 = \{0.0, 1.0, -0.1, 1.0\}\\
(p_n^\alpha)^0 = \{-0.01, 0.0, -0.1, -1.0\} & (p_n^\beta)^0 = \{0.0, 0.125, 0.0125, 0.0\}\\
(q_m^\alpha)^0 = \{35.0, 35.0\} & (q_m^\beta)^0 = \{60.0, 60.0\}\\
(q_h^\alpha)^0 = \{60.0, 60.0\} & (q_h^\beta)^0 = \{30.0, 30.0\}\\
(q_n^\alpha)^0 = \{50.0, 50.0\} & (q_n^\beta)^0 = \{60.0, 60.0\}
\end{bmatrix}$$
A toxin G thus has an associated toxin signature, E(G), which consists of deviations from the nominal classical Hodgkin–Huxley parameter suite: $E(G) = E^0 + \delta$, where $\delta$ is a vector of percentage changes from nominal that we assume the introduction of the toxin initiates. As you can see, if we model the toxin signature in this way, we
have a rich set of possibilities we can use for parametric studies.
We begin with toxins whose signatures are quite simple as they only cause a change
in the nominal sodium and potassium maximum conductance. These are therefore
second messenger toxins. This simulation will generate five toxins whose signatures
are distinct. Using C++ (not MatLab!) as our code base, we designed a TOXIN class
whose constructor generates a family of distinct signatures using a signature as a
base. Here, we will generate 20 sample toxin signatures clustered around the given
signature using a neighborhood size of 0.02. First, we generate five toxins using the
toxin signatures of Table 9.1.
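We do not reproduce the original TOXIN class here; the sketch below only illustrates the idea of clustering sample signatures around a base signature with a small neighborhood size. The struct, the uniform perturbation rule and the base signature numbers are our assumptions.

```cpp
// Illustrative sketch of generating a toxin family: 20 sample signatures
// clustered around a base signature of conductance percentage changes,
// using a uniform perturbation of half-width 0.02 (the neighborhood size).
#include <iostream>
#include <random>
#include <vector>

struct Signature { double dGNa, dGK, dGL; };   // percentage changes from nominal

std::vector<Signature> makeFamily(const Signature& base, int n, double radius,
                                  std::mt19937& rng) {
  std::uniform_real_distribution<double> U(-radius, radius);
  std::vector<Signature> fam;
  for (int i = 0; i < n; ++i)
    fam.push_back({base.dGNa + U(rng), base.dGK + U(rng), base.dGL + U(rng)});
  return fam;
}

int main() {
  std::mt19937 rng(17);
  Signature toxinA{-0.10, 0.05, 0.0};            // hypothetical base signature
  auto family = makeFamily(toxinA, 20, 0.02, rng);
  // apply one family member to the nominal conductances g_Na = 120, g_K = 36
  double gNa = 120.0 * (1.0 + family[0].dGNa);
  double gK  = 36.0  * (1.0 + family[0].dGK);
  std::cout << "sample g_Na = " << gNa << ", g_K = " << gK << "\n";
}
```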
These percentage changes are applied to the base conductance values to generate
the sodium, potassium and leakage conductances for each simulation. We use these
toxins to generate 100 simulation runs using the 20 members of each toxin family.
We believe that the data from this parametric study can be used to divide up action
potentials into disjoint classes, each of which is generated by a given toxin. Each of these simulation runs uses the synaptic current injected as in Fig. 9.3, which injects
a modest amount of current over approximately 2 s. The current is injected at four
separate times to give the gradual rise seen in the picture. In Fig. 9.4, we see all
100 generated action potentials. You can clearly see that the different toxin families
create distinct action potentials.
The Biological Feature Vector Based Recognizer One can easily observe that for
a typical response, as in Fig. 9.1, the potential exhibits a combination of cap-like
shapes. We can use the following points on this generic action potential to construct
the low dimensional feature vector given earlier as Eq. 9.1 which we reproduce for
convenience.
$$\xi = \begin{bmatrix}
(t_0, V_0) & \text{start point}\\
(t_1, V_1) & \text{maximum point}\\
(t_2, V_2) & \text{return to reference voltage}\\
(t_3, V_3) & \text{minimum point}\\
(g, t_4, V_4) & \text{sigmoid model of tail}
\end{bmatrix},$$
with the model of the tail of the action potential of the form $V_m(t) = V_3 + (V_4 - V_3)\tanh(g(t - t_3))$. Note that $V_m'(t_3) = (V_4 - V_3)\,g$. We approximate $V_m'(t_3)$ by a standard finite difference. We pick a data point $(t_5, V_5)$ that occurs after the minimum—typically we use the voltage value at the time $t_5$ that is 5 time steps downstream from the minimum—and approximate the derivative at $t_3$ by
$$V_m'(t_3) \approx \frac{V_5 - V_3}{t_5 - t_3}, \qquad g = \frac{V_5 - V_3}{(V_4 - V_3)(t_5 - t_3)},$$
which reflects the asymptotic nature of the hyperpolarization phase of the potential.
Note that $\xi$ is in $\Re^{11}$. We see the rudimentary feature vector extraction we have called the BFV is quite capable of generating a functional recognizer; i.e. distinguishing between toxin inputs. Hence, we are confident we can use the BFV in developing approximations to nodal computations. In all of the recognizers constructed in
Peterson and Khan (2006), the classification of the toxin is determined by finding
the toxin class which is minimum distance from the sample.
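The landmark extraction itself is straightforward on a simulated trace. The following sketch is our own helper, not the code of Peterson and Khan (2006); it finds the maximum, the return to reference, the minimum and the tail gain g via the finite difference above, using a toy trace in place of a Hodgkin–Huxley simulation.

```cpp
// Sketch: extract BFV landmarks from a sampled voltage trace V[i] at uniform
// times T[i]. The reference voltage is taken to be V[0]; the tail gain g uses
// the finite difference five samples past the minimum, as in the text.
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>

struct BFV { double t0, V0, t1, V1, t2, V2, t3, V3, g, t4, V4; };

BFV extractBFV(const std::vector<double>& T, const std::vector<double>& V) {
  BFV f{};
  f.t0 = T.front(); f.V0 = V.front();
  std::size_t iMax = static_cast<std::size_t>(std::max_element(V.begin(), V.end()) - V.begin());
  f.t1 = T[iMax]; f.V1 = V[iMax];
  std::size_t i2 = iMax;                       // first return to reference voltage
  while (i2 + 1 < V.size() && V[i2] > f.V0) ++i2;
  f.t2 = T[i2]; f.V2 = V[i2];
  std::size_t iMin = static_cast<std::size_t>(std::min_element(V.begin() + i2, V.end()) - V.begin());
  f.t3 = T[iMin]; f.V3 = V[iMin];
  f.t4 = T.back(); f.V4 = V.back();            // tail asymptote approximation
  std::size_t i5 = std::min(iMin + 5, V.size() - 1);
  f.g = (V[i5] - f.V3) / ((f.V4 - f.V3) * (T[i5] - f.t3));   // finite difference g
  return f;
}

int main() {
  std::vector<double> T, V;                    // toy trace standing in for a simulation
  for (int i = 0; i <= 100; ++i) { T.push_back(0.1 * i); V.push_back(-65.0); }
  for (int i = 10; i < 20; ++i) V[i] = -65.0 + 100.0 * (i - 10) / 9.0;   // rise
  for (int i = 20; i < 40; ++i) V[i] = 35.0 - 105.0 * (i - 20) / 19.0;   // fall
  for (int i = 40; i <= 100; ++i) V[i] = -70.0 + 5.0 * (1.0 - std::exp(-0.1 * (i - 40)));
  BFV f = extractBFV(T, V);
  std::cout << "V1 = " << f.V1 << ", V3 = " << f.V3 << ", g = " << f.g << "\n";
}
```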
Our earlier discussions focused on toxins that cause a change in the nominal sodium
and potassium maximum conductance, and so are effectively second messenger tox-
ins. However, the 36 additional α–β parameters that we have listed all can be altered
by toxins to profoundly affect the shape of the action potential curve. In this section,
we will focus on parameter changes that affect a small subset of the full range of
α–β possibilities.
First, we generate five toxin families whose signatures are distinct. Again using C++ as our code base, we designed a TOXIN class whose constructor generates a family of distinct signatures using a given signature as a base; here we generated twenty sample toxin signatures clustered around each given signature using a fixed neighborhood size. The types of perturbations each toxin family uses are listed in Table 9.2; in this table, we only list the parameters that are altered. Toxin A perturbs the q parameters of the α–β for the sodium activation m only; Toxin B does the same for the q parameters of the sodium inactivation h; Toxin C alters the q parameters of the potassium activation n; Toxin D changes the p[2] value of the α–β functions for the sodium inactivation h; and Toxin E does the same for the potassium activation n. This is just a sample of what could be studied. These particular toxin signatures were chosen because the differences in the generated action potentials for the various toxins from families A, B, C, D and E are subtle. For example, in Toxin A, a perturbation of the given type generates the α–β curves for $m_{Na}$ shown in Fig. 9.5: all we are doing is changing the voltage values of 35.0 slightly, yet this small change introduces a significant ripple in the α curve. Note that we only perturb two parameters at a time in a given toxin family, and in all of these toxins we leave the maximum ion conductances the same. We use these toxins to generate 100 simulation runs using the 20 members of each toxin family, and we believe the data from this parametric study can be used to divide the resulting action potentials into disjoint classes, each generated by a given toxin family.
Each of these simulation runs uses the synaptic current injection protocol as described before. The current is injected at four separate times to give the grad-
ual rise seen in the picture. In Fig. 9.6, we see all 100 generated action potentials.
You can clearly see that the different toxin families create distinct action potentials
and, as shown in Peterson and Khan (2006), the BFV is very capable as a functional recognizer which determines the toxin family that has perturbed the dendritic inputs.
Fig. 9.6 Generated voltage traces for the α–β toxin families
In the work of this chapter, we have indicated that a low dimensional feature vector based on biologically relevant information extracted from the action potential of an excitable nerve cell is capable of subserving biological information processing. We can use such a BFV to extract information about how a given toxin influences the shape of the output pulse of an excitable neuron. Hence, we expect modulatory inputs into our abstract neuron can be modeled as alterations to the components of the BFV. The biological feature vector stores many of the important features of the action potential in a low dimensional form. We note these include
• The interval [t0 , t1 ] is the duration of the rise phase. This interval can be altered
or modulated by neurotransmitter activity on the nerve cell’s membrane as well as
second messenger signaling from within the cell.
• The height of the pulse, V1 , is an important indicator of excitation.
• The time interval between the highest activation level, V1 and the lowest, V3 , is
closely related to spiking interval. This time interval, [t1 , t3 ], is also amenable to
alteration via neurotransmitter input.
• The “height” of the depolarizing pulse, V4 , helps determine how long it takes for
the neuron to reestablish its reference voltage, V0 .
• The neuron voltage takes time to reach reference voltage after a spike. This is measured by the interval $[t_3, \infty]$.
• The exponential rate of increase in the time interval [t3 , ∞] is also very important
to the regaining of nominal neuron electrophysiological characteristics.
Clearly, we can model an inhibitory pulse in essentially the same way, mutatis mutandis. We will assume all of the data points in our feature vector are potentially
mutable due to neurotransmitter activity, input pulses into the neuron’s dendritic
system and alteration of the neuron hardware via genome access with the second
messenger system.
Although it is possible to do detailed modeling of biological systems using
GENESIS and NEURON, we do not believe that they are useful tools in modeling
the kind of information flow between cortical modules that is needed for a cognitive
model. Progress in building large scale models that involve many cooperating neu-
rons will certainly involve making suitable abstractions in the information processing
that we see in the neuron. Neurons transduce and integrate information on the den-
dritic side into wave form pulses and there are many models involving filtering and
transforms which attempt to “see” into the action potential and find its informational
core so to speak. However, all of these methods are hugely computationally expensive
and even a simple cognitive model will require ensembles of neurons acting together
locally to create global effects. For reasons outlined above, we believe alterations in
the parameters of the simple biological feature vector (BFV) can serve as modulatory
agents in ensembles of abstract neurons. The kinds of changes one should use for a given neurotransmitter's modulatory effect can be estimated from the biophysical and toxin literature. For example, an increase in sodium ion flow or in Ca++ gated second messenger activity can be handled at a high level as a suitable change in one of the
11 parameters of the BFV.
In Fig. 9.7, we indicated the three major portions of the biological feature vector and
the particular data points chosen from the action potential which are used for the
model. These are the two parabolas f1 and f2 and the sigmoid f3 . The parabola f1 is
treated as the two distinct pieces $f_{11}$ and $f_{12}$ given by
$$f_{11}(t) = a_{11} + b_{11}(t - t_1)^2, \qquad f_{12}(t) = a_{12} + b_{12}(t - t_1)^2.$$
Thus, $f_1$ consists of two joined parabolas which both have a vertex at $t_1$. The functional form for $f_2$ is a parabola with vertex at $t_3$:
$$f_2(t) = a_2 + b_2(t - t_3)^2 \tag{9.4}$$
We have also simplified the BFV even further by dropping the explicit time point $t_4$ and modeling the portion of the action potential after the minimum voltage by the sigmoid $f_3$. From the data, it follows that $f_{11}(t_0) = V_0$, $f_{11}(t_1) = V_1$, $f_{12}(t_1) = V_1$ and $f_{12}(t_2) = V_2$. This implies
$$a_{11} = V_1, \qquad b_{11} = \frac{V_0 - V_1}{(t_0 - t_1)^2}, \qquad a_{12} = V_1, \qquad b_{12} = \frac{V_2 - V_1}{(t_2 - t_1)^2}.$$
For the second cap, we have
$$f_2(t_2) = V_2 = a_2 + b_2(t_2 - t_3)^2, \qquad f_2(t_3) = V_3 = a_2.$$
We conclude that
$$a_2 = V_3, \qquad b_2 = \frac{V_2 - V_3}{(t_2 - t_3)^2}.$$
Hence, the functional form of the BFV model can be given by the mapping f of Eq. 9.6:
$$f(t) = \begin{cases}
V_1 + \frac{V_0 - V_1}{(t_0 - t_1)^2}\,(t - t_1)^2, & t_0 \le t \le t_1\\[4pt]
V_1 + \frac{V_2 - V_1}{(t_2 - t_1)^2}\,(t - t_1)^2, & t_1 \le t \le t_2\\[4pt]
V_3 + \frac{V_2 - V_3}{(t_2 - t_3)^2}\,(t - t_3)^2, & t_2 \le t \le t_3\\[4pt]
V_3 + (V_4 - V_3)\tanh(g(t - t_3)), & t_3 \le t < \infty
\end{cases} \tag{9.6}$$
Recall that a parabola with vertex at $t = \alpha$ can be written in the form
$$p(t) = \pm\frac{1}{4\beta}\,(t - \alpha)^2$$
where $4\beta$ is the width of the line segment through the focus of the parabola. The models $f_{11}$ and $f_{12}$ point down and so use the "minus" sign while $f_2$ uses the "plus". By comparing our model equations with this generic parabolic equation, we find the widths of the parabolas of $f_{11}$, $f_{12}$ and $f_2$ are given by
$$4\beta_{11} = \frac{(t_0 - t_1)^2}{V_1 - V_0} = \frac{-1}{b_{11}}, \qquad 4\beta_{12} = \frac{(t_2 - t_1)^2}{V_1 - V_2} = \frac{-1}{b_{12}}, \qquad 4\beta_2 = \frac{(t_2 - t_3)^2}{V_2 - V_3} = \frac{1}{b_2}.$$
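Evaluating the piecewise map f of Eq. 9.6 is only a few lines of code; the sketch below is a direct transcription under the reconstruction above, with the hyperpolarization tail written as $V_3 + (V_4 - V_3)\tanh(g(t - t_3))$ and the sample landmark values being illustrative only.

```cpp
// Piecewise BFV waveform of Eq. 9.6: two parabolic caps joined at t1, a
// half cap down to the minimum at t3, then a tanh hyperpolarization tail.
#include <cmath>
#include <iostream>

struct BFV { double t0, V0, t1, V1, t2, V2, t3, V3, g, V4; };

double f(const BFV& b, double t) {
  if (t <= b.t1)
    return b.V1 + (b.V0 - b.V1) / ((b.t0 - b.t1) * (b.t0 - b.t1)) * (t - b.t1) * (t - b.t1);
  if (t <= b.t2)
    return b.V1 + (b.V2 - b.V1) / ((b.t2 - b.t1) * (b.t2 - b.t1)) * (t - b.t1) * (t - b.t1);
  if (t <= b.t3)
    return b.V3 + (b.V2 - b.V3) / ((b.t2 - b.t3) * (b.t2 - b.t3)) * (t - b.t3) * (t - b.t3);
  return b.V3 + (b.V4 - b.V3) * std::tanh(b.g * (t - b.t3));
}

int main() {
  BFV b{0.0, -65.9, 1.2, 40.0, 2.5, -65.9, 4.0, -72.0, 0.8, -66.5};
  for (double t = 0.0; t <= 8.0; t += 2.0)
    std::cout << "f(" << t << ") = " << f(b, t) << " mV\n";
}
```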
We want to modulate the output of our abstract neuron model by altering the BFV.
The BFV itself consists of 10 parameters, but better insight into how alterations of the BFV introduce changes in the "action potential" we are creating comes from studying changes in the mapping f given in Sect. 9.4.1. In addition to changes in the timing points $t_0$, $t_1$, $t_2$ and $t_3$, we can also consider the variations of Eq. 9.7:
$$\begin{bmatrix} a_{11}\\ b_{11}\\ a_{12}\\ b_{12}\\ a_2\\ b_2 \end{bmatrix} = \begin{bmatrix} V_1\\[2pt] \frac{V_0 - V_1}{(t_0 - t_1)^2}\\[2pt] V_1\\[2pt] \frac{V_2 - V_1}{(t_2 - t_1)^2}\\[2pt] V_3\\[2pt] \frac{V_2 - V_3}{(t_2 - t_3)^2} \end{bmatrix} = \begin{bmatrix} \text{Maximum Voltage}\\[2pt] \frac{-1}{4\beta_{11}}\\[2pt] \text{Maximum Voltage}\\[2pt] \frac{-1}{4\beta_{12}}\\[2pt] \text{Minimum Voltage}\\[2pt] \frac{1}{4\beta_2} \end{bmatrix} \tag{9.7}$$
It is clear that modulatory inputs that alter the cap shape and hyperpolarization curve of the BFV functional form can have a profound effect on the information contained in the "action potential". For example, a hypothetical neurotransmitter that alters $V_1$ will also alter the latus rectum distance across the cap $f_1$. Further, direct modifications to the latus rectum distance in either of the two caps $f_{11}$ and $f_{12}$ can induce corresponding changes in the times $t_0$, $t_1$ and $t_2$ and the voltages $V_0$, $V_1$ and $V_2$. A similar statement can be made for changes in the latus rectum of cap $f_2$. For example, if a neurotransmitter induced a change of, say, 1 % in $4\beta_{11}$, this would imply a change
$$\delta\left(\frac{(t_0 - t_1)^2}{V_1 - V_0}\right) = 0.04\,\beta_{11}^0$$
where $\beta_{11}^0$ denotes the original value of $\beta_{11}$. Thus, to first order
$$0.04\,\beta_{11}^0 = \frac{\partial\beta_{11}^*}{\partial V_0}\,\delta V_0 + \frac{\partial\beta_{11}^*}{\partial V_1}\,\delta V_1 + \frac{\partial\beta_{11}^*}{\partial t_0}\,\delta t_0 + \frac{\partial\beta_{11}^*}{\partial t_1}\,\delta t_1 \tag{9.8}$$
where the superscript $*$ on the partials indicates they are evaluated at the base point $(V_0, V_1, t_0, t_1)$. Taking partials, we find
$$\frac{\partial\beta_{11}}{\partial V_0} = 2\,\frac{(t_0 - t_1)^2}{(V_1 - V_0)^2} = \frac{2}{V_1 - V_0}\,\beta_{11}^0$$
$$\frac{\partial\beta_{11}}{\partial V_1} = -2\,\frac{(t_0 - t_1)^2}{(V_1 - V_0)^2} = -\frac{2}{V_1 - V_0}\,\beta_{11}^0$$
$$\frac{\partial\beta_{11}}{\partial t_0} = 2\,\frac{t_0 - t_1}{V_1 - V_0} = \frac{2}{t_0 - t_1}\,\beta_{11}^0$$
$$\frac{\partial\beta_{11}}{\partial t_1} = -2\,\frac{t_0 - t_1}{V_1 - V_0} = -\frac{2}{t_0 - t_1}\,\beta_{11}^0$$
so that
$$0.04\,\beta_{11}^0 = \frac{2\,\delta V_0}{V_1 - V_0}\,\beta_{11}^0 - \frac{2\,\delta V_1}{V_1 - V_0}\,\beta_{11}^0 + \frac{2\,\delta t_0}{t_0 - t_1}\,\beta_{11}^0 - \frac{2\,\delta t_1}{t_0 - t_1}\,\beta_{11}^0.$$
This simplifies to
$$0.02 = \frac{\delta V_0 - \delta V_1}{V_1 - V_0} + \frac{\delta t_0 - \delta t_1}{t_0 - t_1}.$$
Similar equations can be derived for the other two width parameters for the caps $f_{12}$ and $f_2$. These sorts of equations give us design principles for complex neurotransmitter modulations of a BFV.
The BFV model we build consists of a dendritic system and a computational core which processes a BFV input sequence to generate a BFV output.
We know that
$$C_m\,\frac{dV_m}{dt} = I_E - g_K^{max}\,(M_K)^4(V_m, t)\,(V_m - E_K) - g_{Na}^{max}\,(M_{Na})^3(V_m, t)\,(H_{Na})(V_m, t)\,(V_m - E_{Na}) - g_L\,(V_m - E_L).$$
Since the BFV is structured so that the action potential has a maximum at $t_1$ of value $V_1$ and a minimum at $t_3$ of value $V_3$, we have $V_m'(t_1) = 0$ and $V_m'(t_3) = 0$.

Fig. 9.8 The product $(M_{Na})^3 H_{Na}$: sodium conductances during a pulse 1100 at 3.0

Fig. 9.9 The product $(M_K)^4$: potassium conductances during a pulse 1100 at 3.0

From Figs. 9.8 and 9.9, we see that $m^3(V_1, t_1)h(V_1, t_1) \approx 0.35$ and $n^4(V_1, t_1) \approx 0.2$. Further, $m^3(V_3, t_3)h(V_3, t_3) \approx 0.01$ and $n^4(V_3, t_3) \approx 0.4$. Thus, setting $\frac{dV_m}{dt} = 0$ at $t_1$ and $t_3$ and reorganizing,
$$I_E(t_1) = \left(0.20\,g_K^{max} + 0.35\,g_{Na}^{max} + g_L\right)V_1 - \left(0.20\,g_K^{max}E_K + 0.35\,g_{Na}^{max}E_{Na} + g_L E_L\right)$$
$$I_E(t_3) = \left(0.40\,g_K^{max} + 0.01\,g_{Na}^{max} + g_L\right)V_3 - \left(0.40\,g_K^{max}E_K + 0.01\,g_{Na}^{max}E_{Na} + g_L E_L\right)$$
Solving the first equation for $V_1$ and differentiating with respect to $g_K^{max}$, we find
$$\frac{\partial V_1}{\partial g_K^{max}} = \frac{0.20\,E_K}{0.20\,g_K^{max} + 0.35\,g_{Na}^{max} + g_L} - \frac{I_E(t_1) + 0.20\,g_K^{max}E_K + 0.35\,g_{Na}^{max}E_{Na} + g_L E_L}{\left(0.20\,g_K^{max} + 0.35\,g_{Na}^{max} + g_L\right)^2}\,(0.20).$$
This simplifies to
$$\frac{\partial V_1}{\partial g_K^{max}} = \frac{0.20}{0.20\,g_K^{max} + 0.35\,g_{Na}^{max} + g_L}\,(E_K - V_1) \tag{9.9}$$
Similarly, we find
$$\frac{\partial V_1}{\partial g_{Na}^{max}} = \frac{0.35}{0.20\,g_K^{max} + 0.35\,g_{Na}^{max} + g_L}\,(E_{Na} - V_1) \tag{9.10}$$
$$\frac{\partial V_3}{\partial g_K^{max}} = \frac{0.40}{0.40\,g_K^{max} + 0.01\,g_{Na}^{max} + g_L}\,(E_K - V_3) \tag{9.11}$$
$$\frac{\partial V_3}{\partial g_{Na}^{max}} = \frac{0.01}{0.40\,g_K^{max} + 0.01\,g_{Na}^{max} + g_L}\,(E_{Na} - V_3) \tag{9.12}$$
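As a quick numerical check on Eqs. 9.9–9.12, these sensitivities are simple to evaluate; the battery, conductance and BFV voltage values below are Hodgkin–Huxley style numbers chosen only for illustration.

```cpp
// Evaluate the BFV voltage sensitivities of Eqs. 9.9 - 9.12 with respect to
// the maximum conductances. Parameter values are illustrative HH-style numbers.
#include <iostream>

int main() {
  double gK = 36.0, gNa = 120.0, gL = 0.3;       // mS/cm^2
  double EK = -72.7, ENa = 55.0;                 // mV
  double V1 = 40.0, V3 = -72.0;                  // BFV maximum and minimum voltages

  double denom1 = 0.20 * gK + 0.35 * gNa + gL;   // at the maximum t1
  double denom3 = 0.40 * gK + 0.01 * gNa + gL;   // at the minimum t3

  double dV1_dgK  = 0.20 / denom1 * (EK  - V1);  // Eq. 9.9
  double dV1_dgNa = 0.35 / denom1 * (ENa - V1);  // Eq. 9.10
  double dV3_dgK  = 0.40 / denom3 * (EK  - V3);  // Eq. 9.11
  double dV3_dgNa = 0.01 / denom3 * (ENa - V3);  // Eq. 9.12

  std::cout << "dV1/dgK = "  << dV1_dgK  << "  dV1/dgNa = " << dV1_dgNa << "\n"
            << "dV3/dgK = "  << dV3_dgK  << "  dV3/dgNa = " << dV3_dgNa << "\n";
}
```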
We also know that as t goes to infinity, the action potential flattens and $\frac{dV_m}{dt}$ approaches 0. Also, the applied current $I_E$ is zero, and so we must have
$$0 = -g_K^{max}\,(M_K)^4(V_\infty, \infty)\,(V_\infty - E_K) - g_{Na}^{max}\,(M_{Na})^3(V_\infty, \infty)\,(H_{Na})(V_\infty, \infty)\,(V_\infty - E_{Na}) - g_L\,(V_\infty - E_L).$$
We have $V_\infty$ is $V_4$. Thus, letting $(M_{Na})^3(V_4, \infty)(H_{Na})(V_4, \infty)$ and $(M_K)^4(V_4, \infty)$ be denoted by $((M_{Na})^3(H_{Na}))^*$ and $((M_K)^4)^*$ for simplicity of exposition, this gives
$$\left(g_K^{max}\,((M_K)^4)^* + g_{Na}^{max}\,((M_{Na})^3(H_{Na}))^* + g_L\right)V_4 = g_K^{max}\,((M_K)^4)^*\,E_K + g_{Na}^{max}\,((M_{Na})^3(H_{Na}))^*\,E_{Na} + g_L\,E_L.$$
Hence, letting $((M_{Na})^3(H_{Na}))^* \equiv (m^3h)^*$ and $((M_K)^4)^* \equiv (n^4)^*$, we have
$$V_4 = \frac{g_K^{max}\,(n^4)^*\,E_K + g_{Na}^{max}\,(m^3h)^*\,E_{Na} + g_L\,E_L}{g_K^{max}\,(n^4)^* + g_{Na}^{max}\,(m^3h)^* + g_L}.$$
We see
$$\frac{\partial V_4}{\partial g_K^{max}} = \frac{(n^4)^*}{g_K^{max}\,(n^4)^* + g_{Na}^{max}\,(m^3h)^* + g_L}\,(E_K - V_4) \tag{9.13}$$
$$\frac{\partial V_4}{\partial g_{Na}^{max}} = \frac{(m^3h)^*}{g_K^{max}\,(n^4)^* + g_{Na}^{max}\,(m^3h)^* + g_L}\,(E_{Na} - V_4) \tag{9.14}$$
We can also assume that the area under the action potential curve from the point $(t_0, V_0)$ to $(t_1, V_1)$ is proportional to the incoming current applied. If $V_{In}$ is the axon-hillock voltage, the impulse current applied to the axon-hillock is $g_{In}V_{In}$, where $g_{In}$ is the ball and stick model conductance for the soma. Thus, the approximate area under the action potential curve must match this applied current. We have
$$\frac{1}{2}\,(t_1 - t_0)\,(V_1 - V_0) \approx g_{In}\,V_{In}.$$
We conclude
$$(t_1 - t_0) = \frac{2\,g_{In}\,V_{In}}{V_1 - V_0}.$$
Thus
$$\frac{\partial(t_1 - t_0)}{\partial g_K^{max}} = -\frac{t_1 - t_0}{V_1 - V_0}\,\frac{\partial V_1}{\partial g_K^{max}} \tag{9.15}$$
$$\frac{\partial(t_1 - t_0)}{\partial g_{Na}^{max}} = -\frac{t_1 - t_0}{V_1 - V_0}\,\frac{\partial V_1}{\partial g_{Na}^{max}} \tag{9.16}$$
Also, we know that during the hyperpolarization phase, the sodium current is off and the potassium current is slowly bringing the membrane potential back to the reference voltage. Now, our BFV model does not assume that the membrane potential returns to the reference level. Instead, by using
$$Y(t) = V_3 + (V_4 - V_3)\tanh\left(g(t - t_3)\right)$$
as the model of the hyperpolarization phase, we can locate the time $t^*$ at which Y has moved halfway from $V_3$ to $V_4$; that is, where $\tanh(g(t^* - t_3)) = \frac{1}{2}$. Writing $u = g(t^* - t_3)$, the condition
$$\frac{1}{2} = \frac{e^{2u} - 1}{e^{2u} + 1}$$
gives $e^{2u} = 3$ and hence
$$t^* = t_3 + \frac{\ln(3)}{2g}.$$
During this phase the sodium current is off, so the membrane equation reduces to
$$C_m\,\frac{dV_m}{dt} = -g_K^{max}\,(M_K)^4(V_m, t)\,(V_m - E_K) - g_L\,(V_m - E_L).$$
Simplifying, we have
$$\frac{9g}{128}\,(V_4 - V_3) = 0.01\,\frac{g_K^{max}}{C_m}\,E_K + \frac{g_L}{C_m}\,E_L - \left(0.01\,\frac{g_K^{max}}{C_m} + \frac{g_L}{C_m}\right)\frac{1}{2}\,(V_4 + V_3)$$
$$\frac{9g}{64} = \left(0.01\,\frac{g_K^{max}}{C_m}\,E_K + \frac{g_L}{C_m}\,E_L\right)\frac{1}{V_4 - V_3} - \left(0.01\,\frac{g_K^{max}}{C_m} + \frac{g_L}{C_m}\right)\frac{V_4 + V_3}{V_4 - V_3}.$$
We can see clearly from the above equation that the dependence of g on $g_K^{max}$ and $g_{Na}^{max}$ is quite complicated. However, we can estimate this dependence as follows. If we approximate $V_3$ by the potassium battery voltage, $E_K = -72.7$ mV, and $V_4$ by the reference voltage, $-65.9$ mV, we find
$$\frac{V_3 + V_4}{V_4 - V_3} \approx \frac{-138.6}{6.8} = -20.38 \qquad\text{and}\qquad \frac{1}{V_4 - V_3} \approx \frac{1}{6.8} = 0.147.$$
Hence,
$$\frac{9\,C_m\,g}{64} = 0.147\left(0.01\,g_K^{max}\,E_K + g_L\,E_L\right) + 20.38\left(0.01\,g_K^{max} + g_L\right) = \left(0.0147\,E_K + 2.038\,E_L\right)g_K^{max} + g_L\left(0.0147\,E_L + 20.38\right).$$
Thus, we find
$$\frac{\partial g}{\partial g_K^{max}} = \frac{64}{9\,C_m}\left(0.0147\,E_K + 2.038\,E_L\right) \tag{9.17}$$
This gives $\frac{\partial g}{\partial g_K^{max}} \approx -710.1$. Equation 9.17 shows what our intuition tells us: if $g_K^{max}$ increases, the potassium current is stronger and the hyperpolarization phase is shortened. On the other hand, if $g_K^{max}$ decreases, the potassium current is weaker and the hyperpolarization phase is lengthened.
Consider a typical input V (t) which is determined by a BFV vector. Without loss of
generality, we will focus on excitatory inputs in our discussions. The input consists
Fig. 9.10 Two action potential inputs into the dendrite subsystem
of three distinct portions. First, a parabolic cap above the equilibrium potential determined by the values $(t_0, V_0)$, $(t_1, V_1)$, $(t_2, V_2)$. Next, the input contains half of
another parabolic cap dropping below the equilibrium potential determined by the
values (t2 , V2 ) and (t3 , V3 ). Finally, there is the hyperpolarization phase having func-
tional form $H(t) = V_3 + (V_4 - V_3)\tanh(g(t - t_3))$. Now assume two inputs arrive at the same electrotonic distance L. We label these inputs as A and B as shown in Fig. 9.10. For convenience of exposition, we also assume $t_3^A < t_3^B$, as otherwise we just reverse the roles of the variables in our arguments. In this figure, we note only the minimum points on the A and B curves. We merge these inputs into a new input $V^N$ prior to the hyperpolarization phase as follows:
$$t_0^N = \frac{t_0^A + t_0^B}{2}, \quad V_0^N = \frac{V_0^A + V_0^B}{2}, \quad t_1^N = \frac{t_1^A + t_1^B}{2}, \quad V_1^N = \frac{V_1^A + V_1^B}{2}, \quad t_2^N = \frac{t_2^A + t_2^B}{2}, \quad V_2^N = \frac{V_2^A + V_2^B}{2}$$
This constructs the two parabolic caps of the new resultant input by averaging the
caps of V A and V B . The construction of the new hyperpolarization phase is more
complicated. The shape of this portion of an action potential has a profound effect
on neural modulation, so it is very important to merge the two inputs in a reasonable
way. The hyperpolarization phases of $V^A$ and $V^B$ are given by
$$H^A(t) = V_3^A + (V_4^A - V_3^A)\tanh\left(g^A(t - t_3^A)\right)$$
$$H^B(t) = V_3^B + (V_4^B - V_3^B)\tanh\left(g^B(t - t_3^B)\right)$$
We choose the merged hyperpolarization $H(t) = V_3 + (V_4 - V_3)\tanh(g(t - t_3))$ to minimize the least squares error
$$E(V_3, V_4, g, t_3) = \int_{t_3^A}^\infty\left[\left(H(t) - H^A(t)\right)^2 + \left(H(t) - H^B(t)\right)^2\right]dt.$$
For optimality, we find the parameters where $\frac{\partial E}{\partial V_3}$, $\frac{\partial E}{\partial V_4}$, $\frac{\partial E}{\partial g}$ and $\frac{\partial E}{\partial t_3}$ are 0. Now,
are 0. Now,
$$\frac{\partial E}{\partial V_3} = \int_{t_3^A}^\infty 2\left[\left(H(t) - H^A(t)\right) + \left(H(t) - H^B(t)\right)\right]\frac{\partial H}{\partial V_3}\,dt.$$
Further,
$$\frac{\partial H}{\partial V_3} = 1 - \tanh\left(g(t - t_3)\right)$$
so we obtain
$$0 = \int_{t_3^A}^\infty 2\left[\left(H(t) - H^A(t)\right) + \left(H(t) - H^B(t)\right)\right]\left(1 - \tanh(g(t - t_3))\right)dt \tag{9.18}$$
We also find
$$\frac{\partial E}{\partial V_4} = \int_{t_3^A}^\infty 2\left[\left(H(t) - H^A(t)\right) + \left(H(t) - H^B(t)\right)\right]\tanh\left(g(t - t_3)\right)dt$$
as
$$\frac{\partial H}{\partial V_4} = \tanh\left(g(t - t_3)\right).$$
We calculate
$$\frac{\partial H}{\partial g} = (V_4 - V_3)(t - t_3)\,\mathrm{sech}^2\left(g(t - t_3)\right)$$
$$\frac{\partial H}{\partial t_3} = -(V_4 - V_3)\,g\,\mathrm{sech}^2\left(g(t - t_3)\right)$$
Thus, we find
$$0 = \int_{t_3^A}^\infty\left[\left(H(t) - H^A(t)\right) + \left(H(t) - H^B(t)\right)\right](V_4 - V_3)(t - t_3)\,\mathrm{sech}^2\left(g(t - t_3)\right)dt$$
$$0 = \int_{t_3^A}^\infty\left[\left(H(t) - H^A(t)\right) + \left(H(t) - H^B(t)\right)\right](V_4 - V_3)\,g\,\mathrm{sech}^2\left(g(t - t_3)\right)dt.$$
This implies
$$0 = \int_{t_3^A}^\infty\left[\left(H(t) - H^A(t)\right) + \left(H(t) - H^B(t)\right)\right]t\,\mathrm{sech}^2\left(g(t - t_3)\right)dt - t_3\int_{t_3^A}^\infty\left[\left(H(t) - H^A(t)\right) + \left(H(t) - H^B(t)\right)\right]\mathrm{sech}^2\left(g(t - t_3)\right)dt$$
$$0 = \int_{t_3^A}^\infty\left[\left(H(t) - H^A(t)\right) + \left(H(t) - H^B(t)\right)\right]\mathrm{sech}^2\left(g(t - t_3)\right)dt.$$
Working out these conditions leads to
$$0 = V_3 - \frac{V_3^A + V_3^B}{2}, \qquad 0 = V_4 - \frac{V_4^A + V_4^B}{2}$$
so that
$$V_3 = \frac{V_3^A + V_3^B}{2} \tag{9.25}$$
$$V_4 = \frac{V_4^A + V_4^B}{2} \tag{9.26}$$
The remaining conditions give
$$\tanh\left(g(t_3^A - t_3)\right) = \frac{w_{34}^B\tanh\left(g^B(t_3^A - t_3^B)\right)}{V_4 - V_3}, \qquad \tanh\left(g(t_3^B - t_3)\right) = \frac{w_{34}^A\tanh\left(g^A(t_3^B - t_3^A)\right)}{V_4 - V_3}.$$
Defining
$$z^A = \frac{w_{34}^B\tanh\left(g^B(t_3^A - t_3^B)\right)}{V_4 - V_3}, \qquad z^B = \frac{w_{34}^A\tanh\left(g^A(t_3^B - t_3^A)\right)}{V_4 - V_3},$$
we find that the optimality conditions have led to the two nonlinear equations for g and $t_3$ given by
$$\tanh\left(g(t_3^A - t_3)\right) = z^A \tag{9.29}$$
$$\tanh\left(g(t_3^B - t_3)\right) = z^B \tag{9.30}$$
Since $V_4 - V_3 = w_{34}^A + w_{34}^B$ and $|\tanh| < 1$, we have
$$z^B = \frac{w_{34}^A\tanh\left(g^A(t_3^B - t_3^A)\right)}{V_4 - V_3} = \frac{w_{34}^A\tanh\left(g^A(t_3^B - t_3^A)\right)}{w_{34}^A + w_{34}^B}.$$
Hence,
$$z^A > -\frac{w_{34}^B}{w_{34}^A + w_{34}^B} > -1, \qquad z^B < \frac{w_{34}^A}{w_{34}^A + w_{34}^B} < 1$$
so that $z^A < 0 < z^B$. It seems reasonable that the optimal value of $t_3$ should lie between $t_3^A$ and $t_3^B$. Note Eqs. 9.29 and 9.30 preclude the solutions $t_3 = t_3^A$ or $t_3 = t_3^B$. To solve the nonlinear system for g and $t_3$, we will approximate tanh by its first order Taylor series expansion. This seems reasonable as we don't expect $g(t_3^A - t_3)$ and $g(t_3^B - t_3)$ to be far from 0. This gives the approximate system
$$g(t_3^A - t_3) \approx z^A \tag{9.31}$$
$$g(t_3^B - t_3) \approx z^B \tag{9.32}$$
Dividing these two relations and solving for $t_3$ gives
$$\frac{t_3^A - t_3}{t_3^B - t_3} = \frac{z^A}{z^B}, \qquad (t_3^A - t_3)\,z^B = (t_3^B - t_3)\,z^A, \qquad t_3^A z^B - t_3^B z^A = t_3\,(z^B - z^A),$$
so
$$t_3 = \frac{t_3^A z^B - t_3^B z^A}{z^B - z^A} \tag{9.33}$$
Using the approximate value of $t_3$, we find the optimal value of g can be approximated as follows:
$$g = \frac{z^B}{t_3^B - \frac{t_3^A z^B - t_3^B z^A}{z^B - z^A}} = \frac{z^B(z^B - z^A)}{t_3^B(z^B - z^A) - (t_3^A z^B - t_3^B z^A)} = \frac{z^B(z^B - z^A)}{t_3^B z^B - t_3^A z^B} = \frac{z^B - z^A}{t_3^B - t_3^A},$$
so
$$g = \frac{z^B - z^A}{t_3^B - t_3^A} \tag{9.34}$$
It is easy to check that this value of $t_3$ lies in $(t_3^A, t_3^B)$, as we suspected it should, and that g is positive. We summarize our results. Given two input BFVs, the sigmoid portions of the incoming BFVs combine into the new sigmoid given by
$$H(t) = V_3 + (V_4 - V_3)\tanh\left(g(t - t_3)\right)$$
$$H(t) = \frac{V_3^A + V_3^B}{2} + \left(\frac{V_4^A - V_3^A}{2} + \frac{V_4^B - V_3^B}{2}\right)\tanh\left(\frac{z^B - z^A}{t_3^B - t_3^A}\left(t - \frac{t_3^A z^B - t_3^B z^A}{z^B - z^A}\right)\right)$$
Given an input sequence of BFVs into a port on the dendrite of an accepting neuron, $\{V_n, V_{n-1}, \ldots, V_1\}$,
the procedure discussed above computes the combined response that enters that port
at a particular time. The inputs into the dendritic system are combined pairwise; V2
and V1 combine into a Vnew which then combines with V3 and so on. We can do this
at each electrotonic location.
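The pairwise merge can be sketched directly from the formulas above: average the cap landmarks, then combine the tails through $z^A$, $z^B$ and Eqs. 9.33 and 9.34. The struct layout and the convention $w_{34} = (V_4 - V_3)/2$ used here are our reading of the text, and the sample BFVs are invented for the demonstration.

```cpp
// Merge two BFV inputs arriving at the same electrotonic location: caps are
// averaged, the hyperpolarization tails are combined through zA, zB and
// Eqs. 9.33 - 9.34. Here w34 = (V4 - V3)/2 for each input, an assumption
// consistent with V4 - V3 = w34A + w34B for the merged tail.
#include <cmath>
#include <iostream>

struct BFV { double t0, V0, t1, V1, t2, V2, t3, V3, g, V4; };

BFV merge(const BFV& A, const BFV& B) {   // assumes A.t3 < B.t3
  BFV N{};
  N.t0 = 0.5 * (A.t0 + B.t0); N.V0 = 0.5 * (A.V0 + B.V0);
  N.t1 = 0.5 * (A.t1 + B.t1); N.V1 = 0.5 * (A.V1 + B.V1);
  N.t2 = 0.5 * (A.t2 + B.t2); N.V2 = 0.5 * (A.V2 + B.V2);
  N.V3 = 0.5 * (A.V3 + B.V3); N.V4 = 0.5 * (A.V4 + B.V4);   // Eqs. 9.25, 9.26
  double w34A = 0.5 * (A.V4 - A.V3), w34B = 0.5 * (B.V4 - B.V3);
  double zA = w34B * std::tanh(B.g * (A.t3 - B.t3)) / (N.V4 - N.V3);
  double zB = w34A * std::tanh(A.g * (B.t3 - A.t3)) / (N.V4 - N.V3);
  N.t3 = (A.t3 * zB - B.t3 * zA) / (zB - zA);                // Eq. 9.33
  N.g  = (zB - zA) / (B.t3 - A.t3);                          // Eq. 9.34
  return N;
}

int main() {
  BFV A{0.0, -65.9, 1.0, 38.0, 2.2, -65.9, 3.8, -72.5, 0.9, -66.2};
  BFV B{0.3, -65.9, 1.4, 42.0, 2.7, -65.9, 4.3, -71.5, 0.7, -66.6};
  BFV N = merge(A, B);
  std::cout << "merged t3 = " << N.t3 << ", g = " << N.g << "\n";
}
```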
The area under the cap portion of the merged input is then approximately
$$A = \frac{1}{2}\,(V_2 - V_0)\,(t_2 - t_0).$$
The pre-neuron signals that come into the dendritic cable of the post-neuron are the initial conditions that determine the voltage at the axon hillock of the post-neuron. We are modeling the dendritic arbor and the soma as finite length cables. At position $\lambda$ on the cable, the voltage is given by
$$\hat v_m(\lambda, \tau) = A_0\,e^{-\tau} + \sum_{n=1}^\infty A_n\cos\left(\alpha_n(4 - \lambda)\right)e^{-(1 + \alpha_n^2)\tau}$$
as discussed in Peterson (2015). This voltage arrives at the far end, z = 7 in Fig. 9.12,
of the soma cable and the voltage we have at z = 0 is then the axon hillock volt-
age. This figure shows a general computational architecture for an artificial neuron
consisting of
• a dendrite system of electrotonic length 4. Four dendrite ports are shown, labeled as $D_i$. Each accepts ISP and ESP inputs which are generated and modulated using the mechanisms we have discussed.
• a soma of electrotonic length 7 which contains 7 pairs of simple and second messenger ports ($G_i$ and $SM_i$, respectively). The simple ports $G_i$ are where the sodium and potassium
maximum conductance can be modified by direct means. The second messenger
ports SMi accept trigger T0 and generate proteins which alter the functionality of
the cell by accessing the genome.
• Three nuclear membrane gate blocks, NGi where second messenger proteins T1
can dock to initiate the transcription of a protein used in altering cell function.
• The protein block is shown differentiating into the gate proteins that are sent to the
sites Gi . Although not shown, there could also be proteins that alter Ca++ injection
currents.
The axon hillock voltage is then input into the Hodgkin–Huxley equations to generate
an action potential. The voltage at z = 0 would have the form
$$\hat v_m(0, \tau) = B_0\,e^{-\tau} + \sum_{n=1}^\infty B_n\cos(7\beta_n)\,e^{-(1 + \beta_n^2)\tau}$$
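A truncated version of this eigenfunction series is easy to evaluate. The sketch below assumes the coefficients $B_n$ and eigenvalues $\beta_n$ have already been computed (for example, from pre-tabulated solutions of the eigenvalue problem for a spectrum of ρ values, as described earlier); the sample coefficients are illustrative only.

```cpp
// Evaluate a truncated soma cable series at z = 0 (the axon hillock):
// v(0, tau) = B0 exp(-tau) + sum_n Bn cos(7 beta_n) exp(-(1 + beta_n^2) tau).
// The coefficients Bn and eigenvalues beta_n are assumed precomputed.
#include <cmath>
#include <iostream>
#include <vector>

double hillockVoltage(double tau, double B0,
                      const std::vector<double>& B,
                      const std::vector<double>& beta) {
  double v = B0 * std::exp(-tau);
  for (std::size_t n = 0; n < B.size(); ++n)
    v += B[n] * std::cos(7.0 * beta[n]) * std::exp(-(1.0 + beta[n] * beta[n]) * tau);
  return v;
}

int main() {
  // illustrative coefficients and eigenvalues only
  std::vector<double> B    = {0.4, 0.1, 0.02};
  std::vector<double> beta = {0.5, 1.3, 2.2};
  for (double tau = 0.0; tau <= 2.0; tau += 0.5)
    std::cout << "v(0, " << tau << ") = " << hillockVoltage(tau, 1.0, B, beta) << "\n";
}
```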
References
Chapter 10
Emotional Models

We begin by looking at some key assumptions made in typical high level or top-down
emotional modeling. A representative of this approach is found in the work of
Sloman (1997a, b, 1998, 1999a, b) which can be summarized by the following two
statements. First, there are information processing architectures existing in the human
brain that mediate internal and external behavior. Secondly, there is a correspondence
between the high level functionality obtained from artificial and explicitly designed
architectures and naturally evolved architectures despite low level implementation
differences. The first assumption is part of many of today’s cognitive science pro-
grams while the second assumption is more problematic. One of the reasons we are
developing a software model of cognition is that we feel it will be helpful in both
elucidating our understanding of a software cognitive process and in validating this
second assumption.
It is clear this desired correspondence will depend on the right kind of abstrac-
tion of messy hardware detail. For example, we could look at an abstract wiring
diagram for a portion of the cerebral cortex (highly stylized, of course) as discussed
in chapter neural-structure and see any attempt to implement high level outlines
of software architectures that might subserve cognition or emotional modeling will
be inherently highly interconnected. It is also clear that the concept of hierarchical
organization is not very rigid in wetware as we have discussed at length in Chap. 14.
Thus implementing these architectures in software will require the ability to oper-
ate in asynchronous ways using modules whose computations are not parallelizable
across multiple nodes.
Instead, these computational objects communicate outputs and assimilate inputs
in some sort of a collective fashion. This has led us to propose the use of asynchronous
software tools as a means of coordinating many different such objects operating in
an asynchronous fashion on a heterogeneous computer network. The broad plan
of an approach to this kind of emotional modeling is given in Fig. 10.1. We can then design characters to move autonomously in a three dimensional virtual world and generate a real-time model of their emotions from their interactions with that world, using asynchronous
tools to allow us to efficiently code these interactions. We show some simple full brain
architectures in Chap. 20 developed with these tools to give a taste of what we can do
with these tools and we show how all of the discussion in this text culminates in an
approach to a cognitive dysfunction model in Chap. 21. However, for the moment,
we think it is important to look at the broad outlines of these other approaches.
Consider the following software architecture inspired by Sloman’s computational
emotional models. A simplistic towered approach divides cognition into three separate areas, as shown in Fig. 10.2: perception, central processing mechanisms and action methods. We don't try
to understand how these blocks can be implemented at this point. This diagram
comes from the work of Nilsson (2001). Sloman then introduced the simple layered
model (Fig. 10.3) which is organized around reasoning concepts. A hybrid model
can then be introduced that essentially uses both the towers and layers together to
give us more interacting functional subgroups. In Fig. 10.4, the indicated feedback
and feedforward paths help us to visualize how the many functional subroutines will
be combined. Finally, a full meta-management layer is added as shown in Fig. 10.5.
The meta model gives a simplistic summary of adult human information process-
ing in a modular design. Many researchers believe that such modular designs are
essential for defeating the combinatorial explosion that arises in the search for solu-
tions to complex problems. Hence, in the Sloman hybrid model, human information
processing uses three distinctly different but simultaneously active architectural lay-
ers: Reactive, Deliberative and Meta management; plus support modules for Motive
Generation, Global Alarm Mechanisms and Long Term Associative Memory Stor-
age. All of these layers can be implemented using asynchronously interacting agents.
Note how far we are from the underlying neurobiology and the approaches which
use networks of computational nodes such as neurons.
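To make the idea of concurrently active, asynchronously interacting layers concrete, here is a minimal Python sketch of three layer agents exchanging messages through queues. The layer names follow Sloman’s labels, but the class LayerAgent, the queue wiring and the message contents are illustrative choices of ours, not code from Sloman or from the software tools described later in the text.

import asyncio

class LayerAgent:
    # A layer that consumes messages from an inbox and posts results to an outbox.
    def __init__(self, name, inbox, outbox):
        self.name = name
        self.inbox = inbox
        self.outbox = outbox

    async def run(self, steps=3):
        for _ in range(steps):
            msg = await self.inbox.get()            # wait for input from another layer
            result = self.name + " saw: " + msg
            print(result)                           # each layer reports what it processed
            await self.outbox.put(result)           # pass the result on; no layer is in charge

async def run_layers():
    q1, q2, q3 = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    reactive = LayerAgent("Reactive", q1, q2)
    deliberative = LayerAgent("Deliberative", q2, q3)
    meta = LayerAgent("MetaManagement", q3, q1)     # the feedback path closes the loop
    await q1.put("stimulus")                        # seed the system with one sensory event
    await asyncio.gather(reactive.run(), deliberative.run(), meta.run())

asyncio.run(run_layers())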
Sloman then classifies emotions in the following way. The Reactive Layer and the
Global Alarm process provide outputs that can be classified as Primary Emotions.
These would be reactions such as being startled, being frozen with terror and
being sexually aroused. The Deliberative Layer provides Secondary Emotions.
These include things such as apprehension and relief. Sloman argues that such
responses require a “what-if” reasoning ability. Finally, the Meta Management Layer
allows for Tertiary Emotions. These include complicated and complex responses
such as control of thought and emotion, loss of emotion, infatuations and humiliations
and thrilled anticipation. These are crucial to the absorption of culture, mathematics
and so forth. Now the crucial point is that all layers and the alarm system oper-
ate concurrently (i.e. asynchronously) and none is in total control. Sloman offers a
compelling example: we can think of longing as a tertiary emotion. Longing for
your mother requires at least that you know you have a mother, you know she is not
present, you understand the possibility of being with her and you find her absence
unpleasant. All of these things require that you possess and manipulate information.
However, these conditions are not sufficient as these conditions could lead you to
regret her absence but not long for her. Note we can consider regret as an attitude
and longing as an emotion. So we clearly need more: one possibility is that longing
has the additional quality that you can’t easily put thoughts of her out of your mind.
This indicates a partial loss of control of attention which is an extraordinarily inter-
esting way to view this phenomenon of longing. The fact that we can lose control
of this attention implies that we can sometimes control it too. There must
be some information processing mechanism which can control which information
is processed but which is not always in control. Partly losing control of thought
processes is a perturbation of our normal state.
We can infer that one way to think about tertiary emotions is that they are
perturbant states in our system. Other examples are extreme grief (you can’t think
of anything else) and extreme infatuation (your thoughts are always drawn to that
person). Our task then is to make sure that our interacting society of computational
agents can exhibit these sorts of behaviors.
We could then build our emotional models using interacting collections of com-
putational agents on a heterogeneous computer network. Some nodes of the cluster
compute emotional attributes and some nodes will drive the movement and expres-
sions of a three dimensional character or avatar that moves autonomously in a virtual
world. Silas the dog has already been implemented in this fashion by Blumberg at
the MIT Media Lab as early as 1997 (Blumberg 1997).
However, this model is not neurobiologically based and as such is not suitable for
our needs. We want to add biological plausibility to our models.
In a sequence of seminal papers, (Lang et al. 1998; Codispotti et al. 2001; Bradley
and Lang 2000) and (Cuthbert et al. 1996), it has been shown that people respond to
emotionally tagged or affective images in a semi-quantitative manner. Human volun-
teers were shown various images and their physiological responses were recorded in
two ways. One was a galvanic skin response and the other an fMRI parameter. Typical
results are plotted in Fig. 10.6. In this database of images, extreme images always
generated a large plus or minus response while neutral images such as those of an
infant generated null or near origin results.
If we followed the original intent and spirit of the Affective Image research, we
would like to develop data for each of nine primary emotional states as indicated
in Fig. 10.6. These would be emotional states that correspond to the nine locations
on the two dimensional grid obtained by pairing a plus, null or minus value on each
of the two response axes.
Clearly, the emotional tags associated with the images in the affective image
database are not cleanly separated into primary emotions such as anger, sadness
and happiness. However, we can infer that the center (Null, Null) state is associated
with images that have no emotional tag. Also, the images do cleanly map to distinct
2D locations on the grid when the emotional contents of the images differ. Hence,
we will assume that if a database of images separated into states of anger, sadness,
happiness and neutrality were presented to human subjects, we would see a similar
separation of response. Our hypothetical response would be captured in the emotion
triangle seen in Fig. 10.7. In both the musical and painting compositional domains,
we will therefore design special matrices called Würfelspiel matrices for
the four positions marked in Figs. 10.6 and 10.7. Such matrices were used in the 18th
century so that fragments of music could be rapidly prototyped by using a matrix
of possibilities. We will develop such matrices for music and painting in Chaps. 11
and 12, respectively. Note, we can also use other types of emotional labels and use a
similar triangle to design other sorts of symbolic mapping data. We will be using the
emotionally labeled music and painting data to train cortical models in later chapters.
References
B. Blumberg, Old Tricks, New Dogs: Ethology and Interactive Creatures, PhD thesis, MIT, The
Program in Media Arts and Sciences, School of Architecture and Planning (1997)
M. Bradley, P. Lang, Affective reactions to acoustic stimuli. Psychophysiology 37, 204–215 (2000)
T. Ono, G. Matsumoto, R. Llinás, A. Berthoz, R. Norgren, H. Nishijo, R. Tamura (eds.), Cognition
and Emotion in the Brain (Elsevier, Amsterdam, 2003)
M. Codispotti, M. Bradley, P. Lang, Affective reactions to briefly presented pictures. Psychophysi-
ology 38, 474–478 (2001)
B. Cuthbert, M. Bradley, P. Lang, Probing picture perception: activation and emotion. Psychophys-
iology 33, 103–111 (1996)
A. Damasio, G. Carvalho, The nature of feelings: evolutionary and neurobiological origins. Nat.
Rev.: Neurosci. 14, 143–152 (2013)
P. Lang, M. Bradley, J. Fitzimmons, B. Cuthbert, J. Scott, B. Moulder, V. Nangia, Emotional arousal
and activation of the visual cortex: an fMRI analysis. Psychophysiology 35, 199–210 (1998)
N. Nilsson, Teleo-Reactive Programs and the Triple-Tower Architecture. Technical Report (Robotics
Laboratory, Department of Computer Science, Stanford University, Stanford, 2001)
E. Rolls, Emotion Explained (Oxford University Press, Oxford, 2005)
K. Scherer, Emotions are emergent processes: they require a dynamic computational architecture.
Philos. Trans. R. Soc. B 364, 3459–3474 (2009)
A. Sloman, Designing human-like minds (1997a), http://www.cs.bham.ac.uk/research/cogaf. Pre-
sented at the European Conference on Artificial Life
A. Sloman, What sort of control system is able to have a personality, in Creating Personalities for
Synthetic Actors: Towards Autonomous Personality Agents, ed. by R. Trappl, P. Petta (Springer,
Heidelberg, 1997b), pp. 166–208
A. Sloman, Architectures and tools for human-like agents (1998). Presented at the European Con-
ference on Cognitive Modeling
A. Sloman, Architectural Requirements for Human-like Agents Both Natural and Artificial (What
Sorts of Machines Can Love?), in Human Cognition and Social Agent Technology, vol. 19,
Advances in Consciousness Research Series, ed. by K. Dautenhahn (John Benjamin Publishing,
Amsterdam, 1999a)
A. Sloman, What Sort of Architecture Is Required for a Human-like Agent? in Foundations of
Rational Analysis, ed. by M. Wooldridge, A. Rao (Kluwer, Boston, 1999b)
Chapter 11
Generation of Music Data: J. Peterson
and L. Dzuris
In the literature, there are many attempts to model musical compositional designs.
Several theorists discuss music in terms of large chunks or sections and overall
function. One example is found in Caplin (1988), Classical Form: A Theory of Formal
Functions for the Instrumental Music of Haydn, Mozart and Beethoven, which focuses
on larger musical structures such as those found in the symphony movements of those
composers. Most of the discussion is not relevant to the design
of musical fragments of the type we desire, though there is mention of the idea of
a musical sentence. He states that the sentence is usually eight measures in length,
but that the fundamental melodic material is presented in the opening two measures.
Caplin admits that within the basic idea contained in these first measures, there
may be several distinct motives. Since motives are in and of themselves identifiable
morsels of music, we can logically compose shorter sentences than those Caplin is
analyzing.
Approaching the matter of function from the opposite direction, small units build-
ing up to larger ones, we have Davie (1953), Musical Structure and Design, who
compares musical sounds to words in terms of clauses, sentences, and paragraphs
leading to larger structures of form. He refers to musical cadences in terms of various
punctuation marks. Depending on the type used, impressions of rest, incompleteness
and surprise might be conveyed. He suggests the use of musical cadences which
are perfect and plagal for completeness (by which he means V-I with soprano
on scale degree 1, or IV-I); imperfect for incompleteness (V-I with soprano on scale
degree 3 or 5); and interrupted for surprise (deceptive cadences, V or V7-VI).
Cope, in his work on algorithmic composition, compares musical structure to the
relationships between sentence parts in language. A sentence is broken into two basic
parts, a noun phrase and a verb phrase. From there, each can be broken into smaller
pieces (article plus noun/adverb plus verb). Cope shows a parse of the C-major scale
that shows the basis for the melodic movement used in our own examples. First, he
presents the sentence as CDEFGABC. Next, he identifies noun equivalents, C (tonic)
and F (subdominant). The G (dominant) functions as a verb, leaving the articles and
their modifiers: A (Prolongation), E (Adverb), D (Prolongation) and B.
Logical motions in music are systematically described by Cope in the following
way. SPEAC stands for Statement, Preparation, Extension, Antecedent and Consequent.
The desired line of succession then looks like this: S followed by PEA; P by SAC;
E by SPAC; A by EC; and C by SPEA. Cope believes that a “generated hierarchy”
of notes is necessary to produce music that makes sense. He reasons that a random
substitution of a word type, which would produce a nonsense sentence (noun–verb–
noun could equal horse grabs sky), would have the same effect in music. We agree
with Cope that tonality itself is a hierarchy where certain notes or scale degrees
have specific functions. Additionally, a set of generalities for tonal melodies is given.
Cope based this list on examples from composers across generations. These are basic
composition rules taught in your average music theory classes: use mostly stepwise
motion; follow melodic skips with stepwise motion in the opposite direction; use one
or more notes per beat agreeing with harmony; and usually begin and end on tonic
or dominant chord members. Cope summarizes by saying
With proper selection of elemental representations (i.e., nouns are tonics) and careful coding
of appropriate local semantics, the same program that produces sentences can produce music.
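As a small illustration of the SPEAC succession rules just described, the following Python sketch generates a label sequence by choosing randomly among the allowed successors. The successor table simply transcribes the rules above; the function speac_sequence and the random selection strategy are toy choices of ours and are not Cope’s software.

import random

# Allowed successors: S -> P, E or A; P -> S, A or C; E -> S, P, A or C; A -> E or C; C -> S, P, E or A.
SUCCESSORS = {"S": "PEA", "P": "SAC", "E": "SPAC", "A": "EC", "C": "SPEA"}

def speac_sequence(length=8, start="S", seed=0):
    # Generate a SPEAC label sequence that obeys the succession table.
    random.seed(seed)
    labels = [start]
    while len(labels) < length:
        labels.append(random.choice(SUCCESSORS[labels[-1]]))
    return "".join(labels)

print(speac_sequence())    # one admissible sequence, beginning with S and never violating the table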
Kohs (1973), Musical Form: Studies in Analysis and Synthesis, stresses that the human
mind will organize music into sections based on repetition and contrast and that
distinctions exist between functional and decorative (nonharmonic) tones. Then, in
making his own connection between speech and music, Kohs brings in another factor,
emotion. We will study the connections of emotion to music more completely in
Sect. 11.4.1. Kohs connects to emotional states as follows:
In prehistory, vocal melody was probably developed as a form of emotionally inflected
speech. Melody has been associated with words since the earliest times, and wordless vocal-
ization has always been a rather rare phenomenon. Thus it is not surprising that some of the
characteristics of melody are derived from speech, and that some of the melodic forms are
related to the forms of prose and poetry.
Poetry can be analyzed in terms of meter, its pattern of accented and unaccented
syllables. Musical meter is analyzed the same way, in terms of strong and weak beats.
Kohs compares nonmetrical music to the less structured rhythm that is characteristic
of prose. Again, we have a writer stressing that the key to a sentence or musical phrase
is that both words and musical elements are put together based on the function each
will carry out. Kohs is explicit about music having
its own special kind of grammar and syntax. Successive tones may be grouped to form a
musical phrase having a sense of completion and unity similar to that found in a verbal
sentence. Some tones, like the verbal subject and predicate, are essential to the musical
structure; others are decorative. Suspensions, appogiaturas, neighboring tones and similar
decorations cannot stand alone without resolution any more than an adjective may stand
without a noun or pronoun. Musical phrases may be simple or complex; a short musical idea
may be expanded by a variety of means, such as parenthetical insertions, or extensions at
the beginning or the end.
We will start by using an 18th century historical idea called The Musicalisches
Würfelspiel. In the 1700s, fragments of music could be rapidly prototyped by using
a matrix A of possibilities. We show an abstract version of a typical Musicalisches
Würfelspiel matrix in Eq. 11.1. It consists of P rows and three columns. In the first
column are placed the opening phrases or nouns; in the third column, are placed
the closing phrases or objects; and in the second column, are placed the transitional
phrases or verbs. Each phrase consisted of L notes and the composer’s duty was to
make sure that any opening, transitional and closing (or noun, verb and object) was
both viable and pleasing for the musical style that the composer was attempting to
achieve.
A = \begin{bmatrix}
\text{Opening 0} & \text{Transition 0} & \text{Closing 0} \\
\text{Opening 1} & \text{Transition 1} & \text{Closing 1} \\
\vdots & \vdots & \vdots \\
\text{Opening P-1} & \text{Transition P-1} & \text{Closing P-1}
\end{bmatrix} \qquad (11.1)
Note that there are $P^3$ possible musical sentences that can be formed in this
manner. If each opening, transition and closing fragment is four beats long, we can
build $P^3$ different twelve beat sentences.
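To see how a P × 3 matrix yields P^3 sentences, here is a minimal Python sketch that enumerates every opening, transition and closing combination from a toy table of phrase labels. The placeholder strings stand in for actual musical fragments; nothing here is taken from the matrices designed later in this chapter.

from itertools import product

# A toy P x 3 Wurfelspiel matrix: rows are the choices, columns are the grammatical roles.
P = 4
wurfelspiel = [["Opening " + str(i), "Transition " + str(i), "Closing " + str(i)] for i in range(P)]

openings = [row[0] for row in wurfelspiel]
transitions = [row[1] for row in wurfelspiel]
closings = [row[2] for row in wurfelspiel]

# A sentence is one choice per column, so there are P**3 combinations in all.
sentences = [" | ".join(choice) for choice in product(openings, transitions, closings)]
print(len(sentences))    # 64 when P = 4
print(sentences[0])      # Opening 0 | Transition 0 | Closing 0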
It takes musical talent to create such a Musicalisches Würfelspiel array, but once
created, it can be used in the process of learning fundamental principles of the
music compositional process. We will eventually use musical fragments which are
tagged with a specific emotional color to build a model which can assemble musical
fragments which have a specific emotional content. However, we will start with a
proof of concept using a Musicalisches Würfelspiel matrix with emotionally neutral
examples.
We will start with simple compositional patterns and ideas; hence, we will be using
only quarter and half notes in any of the phrases. Further, we do not want
the musical fragments to be too long, so for now each fragment consists of four
beats in 4/4 time. We will begin opening phrases and end closing or cadence phrases
on tonic C, approaching or leaving by step or tonic chord leap. Finally, the middle
phrases are centered around a third or a fifth. Using these guidelines, we created a
Musicalisches Würfelspiel array using opening phrases as nouns, middle phrases as
verbs and closing or cadence phrases as objects.
The full alphabet is thus {H1, H2}, which has cardinality 16. Within this alphabet,
the second opening, cedf, can be written as the matrix
n_1 = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
There are similar matrices for each of the other opening phrases. Each opening phrase
is thus a 4 × 16 matrix that has a very special property: each row contains a single 1.
A given middle phrase will have a similar structure and only some of the possible
middle phrase matrices we could use will be acceptable. Our middle phrase design
choices for emotionally flat music are shown in Fig. 11.2.
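The following Python sketch shows one way to produce such one hot phrase matrices, assuming a 16 symbol alphabet laid out as two octaves of {c, d, e, f, g, a, b, r}. The symbol ordering and the helper name encode_phrase are illustrative choices only.

import numpy as np

# Assumed 16 symbol alphabet: two octaves of c d e f g a b plus a rest symbol in each octave.
ALPHABET = [n + "1" for n in "cdefgabr"] + [n + "2" for n in "cdefgabr"]
INDEX = {sym: k for k, sym in enumerate(ALPHABET)}

def encode_phrase(phrase):
    # Encode a phrase (a list of symbols) as a len(phrase) x 16 matrix with a single 1 per row.
    m = np.zeros((len(phrase), len(ALPHABET)), dtype=int)
    for row, sym in enumerate(phrase):
        m[row, INDEX[sym]] = 1
    return m

opening = ["c1", "e1", "d1", "f1"]    # the phrase cedf in the lower octave
print(encode_phrase(opening))         # a 4 x 16 matrix; the first row has its 1 in column 0, and so on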
In essence, when we design these two sets of matrices, we provide samples for a
hypothetical mapping from the set of 4 × 16 opening matrices to the set of 4 × 16
middle matrices. These samples provide enough information for us to approximate
the opening to middle transition function using a blend of neurally inspired architec-
tures and function approximation techniques. Finally, the cadence or closing phrases,
as shown in Fig. 11.3, were designed so that they would sound pleasing when attached
to any of the possible opening—middle phrase combinations. Again, our design gives
us a set of appropriate 4 × 16 matrices which encode valid closing phrases. These
closing phrases need to be coupled to the end notes of any middle phrase. The closing
data we design then gives us enough samples to approximate the middle to ending
transformation.
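To make the shape of this approximation problem concrete, here is a deliberately simple stand-in written in Python: flatten each 4 × 16 phrase matrix to a 64 component vector and fit a linear least squares map from opening samples to middle samples. The models described in this text use neurally inspired excitation/inhibition architectures rather than a linear map, and the random phrases below are made up; the sketch only illustrates the flow of the data.

import numpy as np

rng = np.random.default_rng(0)

def random_phrase(rows=4, cols=16):
    # A random one hot phrase matrix: each row selects a single symbol.
    m = np.zeros((rows, cols))
    m[np.arange(rows), rng.integers(0, cols, size=rows)] = 1
    return m

# Sample (opening, middle) pairs standing in for the Wurfelspiel design data.
openings = [random_phrase() for _ in range(4)]
middles = [random_phrase() for _ in range(4)]
X = np.stack([o.ravel() for o in openings])    # 4 samples x 64 features
Y = np.stack([m.ravel() for m in middles])     # 4 samples x 64 targets

# Least squares fit of a linear map W with X @ W approximately equal to Y.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
predicted = (X[0] @ W).reshape(4, 16)
print(np.argmax(predicted, axis=1))            # predicted symbol index for each row of the middle phrase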
In addition, our opening data gives us four examples of starting notes for neutral
musical twelve note sequences. Now there are nine possible start notes for each
opening phrase and the fact that we do not choose some of them is important. Also,
each four note sequence in any of the three phrases, opening, middle and closing,
is order dependent. Given a note in any phrase, the selection of the next note that
follows is not random. The actual note sequence that appears in each phrase also
gives sample data that constrains the phrase to phrase transformations. We can use
this information to effectively approximate our mappings using excitation/inhibition
neurally inspired architectures. Roughly speaking, if a given subset of notes are good
choices to follow another note, then the notes not selected to follow should be actively
inhibited while the acceptable notes should be actively encouraged or enhanced in
their activity. The complete Musicalisches Würfelspiel matrix, as seen in Fig. 11.4,
thus consists of four rows and three columns that will provide a total of 64 distinct
musical fragments that are intended to model neutral musical sentence design.
Currently, these fragments are rather short since they are just four beats (i.e.
L = 4) for each grammatical element. This is enough to help with the prototype
development of this stage. However, when we generate the additional Musicalisches
Würfelspiel matrices that will correspond to the other emotional shadings, we may
find it necessary to use longer note sequences (i.e. increase L) in order to capture the
desired emotional colorings. If that turns out to be true, we can easily return to this
neutral case and redesign the neutral Musicalisches Würfelspiel matrix to use longer
note sequences.
Fig. 11.5 Neutral sequences generated using the first opening phrase, all the middle phrases and
the first ending phrase. The first column of the figure provides a label of the form x yz where x
indicates the opening used; y, the middle phrase used; and z, the ending phrase. Thus, 131 is the
fragment built from the first opening, the third middle and the first ending
The design intent is that all 64 selections we so generate will be devoid of emotional content. You should
try playing these pieces on the piano and compare them to our later selections that
are purported to have happy, angry and sad overtones that are displayed as detailed
in Sects. 11.4.3, 11.4.4 and 11.4.5.
The fragments shown in Fig. 11.5 show only a few of the possibilities. For example,
the selections for opening two, verb one and all the cadence phrases are displayed
in Fig. 11.6. Again, we invite you to play the pieces.

Fig. 11.6 Neutral sequences generated using the second opening phrase, the first middle phrase and all the ending phrases

The important point is that all 64 pieces we can so generate using the Würfelspiel approach are equally valid
emotionally neutral choices. Hence, the Musicalisches Würfelspiel matrix we have
created captures the essence of the solution to emotionally neutral music composi-
tional design problems.
In Balkwill and Thompson (1999), the authors ask:

Can people identify the intended emotion in music from an unfamiliar tonal system? If
they can, is their sensitivity to intended emotion associated with perceived changes in psy-
chophysical dimensions of music [defined later as any property of sound that can be perceived
independent of musical experience, knowledge, or enculturation]?
We were very interested in the results to see if our ideas for emotionally tagged
music would have characteristics that could be readily identified without cultural
constraints. In this small case study, four specific emotions (joy, sadness, anger, and
peacefulness) were presented using ragas of India. In this Hindustani system, there
is a specific raga or collection of notes for nine individual moods. Participants were
asked to rate tempo, rhythmic complexity, melodic complexity, and pitch range in
addition to the four emotions.
Balkwill and Thompson give us some preliminary expectations based on other
studies. Tempo is most consistently associated with emotional content. A slow pace
equates to sadness and a faster one to joy. Also, simpler melodies with few variations
in melodic contour and more repetition are associated with positive and peaceful
emotions. Complex melodies with more melodic contour and less repetition are
associated with negative emotions such as anger and sadness. Timbre plays a role as well. Fear and
sadness were reported more when expressed by a violin or the human voice. Finally,
timpani was associated with anger.
An interesting note they mention is that a narrower pitch range (reduced melodic
contour) may be processed as one auditory stream, therefore easy to process which
may cause positive emotional ratings. This may be linked to Meyer’s idea of an
instinctive mental or motor response.
The conclusions were that given music that was not culture-specific to the listeners,
they were forced to rely on other, psychophysical, cues to perceive emotional content.
As predicted, tempo was a strong cue used to successfully identify the ragas intended
for joy, sadness, and anger. It did not work with peacefulness. Also, ratings made by
expert and non-expert listeners were pretty equal in identifying joy and sadness. The
only significant predictor of peacefulness in this study seemed to be timbre. A flute
was highly rated as peaceful.
What happens when two real performers are put up against a computer generated
performance of a piece is exactly the focus of Clarke and Windsor (2000), Real and
Simulated Expression: A Listening Study. There is no solid conclusion in this paper,
other than the matter will need further investigation. They do state that in this study,
the simulated performance treated tempo and dynamics as elements that were corre-
lated based on principles of energy and motion. There were minute differences in the
way human performers treated repeated notes, both rhythmically and dynamically.
Hence, each performance was perceived in different ways by the listeners.
Basic emotions are defined in Juslin (1997), Emotional Communication in
Music Performance: A Functionalist Perspective and Some Data, by the following
attributes. They have distinct functions that contribute to individual survival. They
are found in all cultures and are experienced as unique feeling states. Further, they
appear early in the course of human development and are associated with distinct
autonomic patterns of physiological cues. Further, he states that most researchers
agree on at least four basic emotions: happiness, sadness, anger and fear. In this
small study, three guitarists were asked to play the same melody five different ways.
One was to be without expression. This would correspond to our (null, null) or neutral
fragments. Two aspects examined were whether or not emotions could be communi-
cated to the listeners, and how the performers’ intentions affected expressive cues in
the performance (the psychophysical cues studied in Balkwill and Thompson 1999).
Like other studies in this commentary, they found that expressions of happiness,
sadness, and anger were readily identified by listeners. The fourth emotional state,
in this case fear, was a little elusive. Gender and training did not significantly affect
one's ability to identify intended emotion. The study suggests that each emotion has
certain characteristic expressive cues (the authors point out that in other instruments
it is typical to use staccato articulation when expressing anger, but electric guitarists
uniformly revert to legato articulation for expressing anger).
The authors acknowledge the nature of their study as preliminary and in need of fur-
ther extended study. At any rate, a few hypotheses are put forth. The first is that long
and short note durations may be played differently depending on the intended emo-
tion. They found that expressions of happiness were played in shorter note values and
patterns in the expressions of sadness were played in longer notes. Secondly, anger
and happiness were associated with staccato articulation. This is the first instance
we encountered of further distinction between the staccato articulation representing
anger and the staccato articulation representing happiness. It was found that the anger
expressions used uniform staccato patterns whereas the happiness ones were more vari-
able depending on the positions within the phrase. It is suggested that more study
of this phenomenon needs to be done, as it may be a key component to decoding
happiness.
The underlying goal in building each matrix was to remain as basic as possible.
We decided to work within a monophonic texture, meaning melody line only. Note
values were restricted to quarter notes and half notes in quadruple meter. Quarter
rests were also allowed, but used sparingly. All four matrices (neutral, happy, sad,
angry) are structurally similar. Each consists of three columns with four fragment
choices that are one measure in length.
Any fragment from column one from any of the matrices is designed to function
as an opening phrase. We define an opening phrase as one that clearly establishes a
tonal center. In western tonal music, the tonal focal point can be narrowed to a single
tone/pitch that is known as the tonic. We have made C our tonic note in all cases. In
all but one case, the opening fragment also established the mode as either major or
minor. The exception is made in the angry matrix, where ambiguity is desirable. All
fragments in column two of any of the matrices are designed to function as a transition
phrase. As the label implies, these transition phrases serve as connectors between
a choice from column one and a choice from column three. It is in these middle
phrases that movement away from the tonic is made or continued. This movement
is necessary for forward progress of a melody. Therefore, each transition phrase is
now highlighting a secondary pitch, one other than the tonic note established by
the opening phrase. To close our melodic lines, an ending phrase is chosen. Any
fragment from column three of any of the matrices will function in the same manner.
We designed each to move back to the tonic note in such a way as to produce a
quality of closure to our melodic lines. This was done by approaching tonic in the
most basic of ways. Using stepwise motion up or down to the tonic logically ends the
melodic journey by bringing you back to the home pitch. An alternative is to return
to tonic via melodic skip from the third or fifth scale degree. In the key of C, the
tonic (C) is scale degree 1, D 2, etc. So, a melodic skip from the third or fifth scale
degree means a skip from an E or a G note. Together with the tonic note, third and
fifth scale degrees make up a tonic chord (harmony). By using either the third or the
fifth to lead back to tonic, we again produce a sense of closure by reinstating tonic
as the final destination of our brief melodic journey.
We used the following guidelines to design our Würfelspiel matrices of differ-
ent emotional slants. When we produced emotion-deprived or neutral fragments,
individual characteristics documented by researchers as being contributing factors
of basic emotion in music were neutralized to the best of our abilities. Some of the
contributing factors, such as mode, are also essential in a plausible melody, and could
not be removed. The use of major mode is the default, with a tempo of 45 beats per
minute, slow to the point that the individual notes outweigh the overall sense of a melody.
The rationale comes from reading. The telltale sign of a new reader is the slow pace during
which equal emphasis is placed on every word. A beginner will produce an emotion-
deprived reading. The same result is our goal here. Further, we use even rhythms and
exact note durations with the melody played with a basic computer generated sound.
Following the outline above, we have designed the Happy musical data as shown in
Fig. 11.7.
From our 4 × 3 Musicalisches Würfelspiel matrix, we can generate 64 musical
selections of twelve beats each. In Fig. 11.8, we show the selections generated using
opening one, all the possible middle phrases and the first cadence phrase.
Then we show all the selections for opening two, verb one and all the closings
in Fig. 11.9. This still only shows eight of the sixty four possible pieces, of course.
We invite you to sit at the piano and see how they sound. You should hear that
they are distinctly happy. Our interpretation of the core meaning of a happy musical
fragment is based on ideas from two separate disciplines. First, our reading of the
relevant literature has given us guidance into the choices for notes, tempo and playing
style as outlined above; and second, the psychophysiological studies of Lang et al.
(1998), as outlined in many papers, have given us a pseudo-quantitative measure of
the affective content of an emotionally charged image. Musical studies have shown
Fig. 11.8 Happy sequences generated using opening one, all middle phrases and the first closing
Now, we move toward the design of musical data that is intended to be sad. In
Fig. 11.10, you can see the musical data that we designed to have an overall tone of
sadness.

Fig. 11.9 Happy sequences using opening two, middle phrase one and all the closings

To emotionally tag these fragments as “sad,” individual characteristics such
as the use of minor mode, a slow tempo, the use of slurs and legato, and the use of
bass clef to put us in a lower register and so forth were incorporated into the design.
From our 4 × 3 Musicalisches Würfelspiel matrix, we can generate 64 musical
selections of twelve beats each. In Fig. 11.11, we show the selections generated using
opening one, all the possible middle phrases and the first cadence phrase.
Then we show all the selections for all the openings, the first middle phrase and
the second ending in Fig. 11.12. This still only shows eight of the sixty four possible
pieces, of course. We invite you to sit at the piano and see how they sound. You
should hear a sense of sadness in each.
Fig. 11.11 Sad sequences using opening one, all the middle phrases and the first closing
Fig. 11.12 Sad sequences using all the openings, the first middle phrases and the second closing
Fig. 11.14 Angry sequences generated using the first opening, all the middle phrases and the first
closing
Fig. 11.15 Angry sequences generated using opening two, the fourth middle phrase and all the
closings
As you have seen in Sects. 11.4.3, 11.4.4 and 11.4.5, the emotionally tagged musical
data uses a richer set of notes and articulation attached to the notes to construct
grammatical objects. We can think of the added articulation as punctuation marks.
Slurs (one note and multiple note), staccato and marcato accents are attached to
various notes in our examples to add emotional quality. Our design alphabet can be
encoded as H = {c, d, e, f, g, a, b, r } where each note in this alphabet is now
thought of as a musical object with a set of defining characteristics. Here r is rest.
For our purposes, the attributes of a note are choices from a small set of possibilities
from the list A = { p, b, s, a}. The index p indicates what pitch we are using for
the note: −1 denotes the first octave of pitches below middle C; 0, the pitches of
the middle C octave; and 1, the first octave of pitches above middle C. The letter
b tells us how many beats the note is held. The length of the slur is given by the
value of s and a denotes the type of articulation used on the note. We choose to treat
slurs as entities which are separate from the other accent markings for clarity. For
these examples, we have slurs that range from zero to three in length, so permissible
values of s are taken from the set {0, 1, 2, 3}. This could easily be extended to longer
slurs. The beat value b is either one or two as only quarter and half notes are used.
There are many possible articulations. An expanded list, for marks either above or
below a note for effect, might include neutral, no punctuation (a = 0); pizzicato, a
dot (a = 1); marcato or sforzando, a > (a = 2); staccato or portato, a −, (a = 3);
strong pizzicato, an apostrophe (a = 4); and sforzato, a ˆ (a = 5). A given note n is
thus a collection which can be denoted by $n_{p,b,s,a}$ where the attributes take on any of
their allowable values. A few examples will help sort this out. The symbol $d_{1,2,2,1}$
is the half note d in the octave above middle C, with a pizzicato articulation, which is
the start of a slur of length two that ends on the second note that follows this d. The rest
does not have pitch, articulation or slurring. Thus, we set the value of pitch, slurring
and articulation to 0 and use the notation $r_{0,1,0,0}$ or $r_{0,2,0,0}$ to indicate a quarter or
half rest, respectively. Our alphabet is thus {H} which has cardinality 8. Each letter
has a finite set of associated attributes and each opening, middle or closing phrase is
thus a sequence of 4 musical entities.
Within this alphabet, an angry middle phrase such as shown in Fig. 11.16a can
be encoded as $\{e_{0,1,0,2}, d_{0,1,0,2}, c_{0,2,0,0}\}$. This would then be written as the matrix in
Eq. 11.2:

n_1 = \begin{bmatrix}
0 & 0 & \{0,1,0,2\} & 0 & 0 & 0 & 0 & 0 \\
0 & \{0,1,0,2\} & 0 & 0 & 0 & 0 & 0 & 0 \\
\{0,2,0,0\} & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix} \qquad (11.2)
If we had a fragment with a slur such as shown in Fig. 11.16b, this would be encoded
as $\{c_{0,1,1,0}, d_{0,1,0,0}, d_{0,1,0,2}, c_{0,1,0,0}\}$. In matrix form, we have Eq. 11.3:

n_1 = \begin{bmatrix}
\{0,1,1,0\} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \{0,1,0,0\} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \{0,1,0,2\} & 0 & 0 & 0 & 0 & 0 & 0 \\
\{0,1,0,0\} & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix} \qquad (11.3)
These matrices indicate which musical object is used in a sequence. The four opening
phrases in a Würfelspiel music matrix can thus be encoded into matrices that are 2 × 8
(both notes in the phrase are half notes) to 4 × 8 (all notes are quarter notes). Each of
these matrices has the special property that a row can only have one nonzero entry. A
given middle phrase will have a similar structure, making only some of the possible
middle phrase matrices acceptable.
Note that encoding music in this way generates a compact data representation.
However, we need to model the data so that each possible musical entry is encoded
in a unique way. The first seven entities in H come in a total of 144 distinct states:
three pitches, two beats, four slur lengths and six articulations. The rest comes in
only two states. Hence, a distinct alphabet here has cardinality 7 × 144 + 2 or 1010.
The size of this alphabet precludes showing an example as we did with the compact
representation, but the matrices that encode these musical samples still possess the
property that each row has a single 1. Of course, with an alphabet this large in size, we
typically do not use a standard matrix representation; instead, we use sparse matrix
or linked list techniques. The data representation we used for the musically neutral
data has a much lower cardinality because the neutral data is substantially simpler.
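The following Python sketch shows one way to realize this expanded encoding, assuming exactly the attribute ranges listed above: three pitches, two beats, four slur lengths and six articulations for each of the seven note letters, plus two rest states. The class Note and the function state_index are illustrative names of ours, and the particular indexing scheme is just one of many that yield the 1010 state alphabet.

from dataclasses import dataclass

LETTERS = ["c", "d", "e", "f", "g", "a", "b"]    # the rest symbol r is handled separately

@dataclass
class Note:
    letter: str    # one of c..b, or "r" for a rest
    p: int = 0     # pitch octave: -1, 0 or 1 relative to middle C
    b: int = 1     # beats held: 1 (quarter) or 2 (half)
    s: int = 0     # slur length: 0..3
    a: int = 0     # articulation: 0..5

def state_index(n):
    # Map a note to a unique index in the 7 * 144 + 2 = 1010 state alphabet.
    if n.letter == "r":
        return 7 * 144 + (n.b - 1)                # two rest states: quarter or half rest
    attrs = ((n.p + 1) * 2 + (n.b - 1)) * 24 + n.s * 6 + n.a
    return LETTERS.index(n.letter) * 144 + attrs  # 144 = 3 pitches x 2 beats x 4 slurs x 6 marks

print(state_index(Note("d", p=1, b=2, s=2, a=1)))   # the d_{1,2,2,1} example from the text
print(state_index(Note("r", b=2)))                  # a half rest, the last index, 1009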
References
W. Thompson, B. Robitaille, Can composers express emotions through music? Exp. Stud. Arts
10, 79–89 (1992)
L. Wedin, Multidimensional scaling of emotional expression in music. Swed. J. Musicol. 54, 1–17
(1972)
Chapter 12
Generation of Painting Data: J. Peterson,
L. Dzuris and Q. Peterson
Our simple painting model will be based on a Würfelspiel matrix similar to what
we used in the above emotionally neutral music compositions. Since our ultimate
goal is to create cognitive models, it is instructive to look at the notion of creativity
from the cortical point of view. The advent of portions of cortex specialized to fusing
sensory information is probably linked in profound ways to the beginnings of the
types of creativity we label as art. Hence, models of creativity, at their base, will
involve models of visual cortex data which can then be fed into associative learning
modules. We have developed data models for the auditory cortex which are music
based in Chap. 11. We need to look at visual cortex data now. We therefore focus on
the creation of data sets that arise from paintings as we feel that this will give us the
data to train the visual input to the area 37 sensor fusion pathway.
Consider the two paintings, Fig. 12.1a, b, which have fairly standard compositional
designs. Each was painted starting with the background and then successive layers
of detail were added one at a time. As usual, the design elements farthest from the
viewer’s eye are painted first. The other layers are then assembled in a farthest to
nearest order. We note that in the generation of computer graphics for games, a similar
technique is used. A complex scene could in principle be parsed into individual
polygonal shapes each with its own color. Then, we could draw each such polygon
into the image plane which we see on the computer screen. However, this is very
inefficient. There are potentially hundreds of thousands of such unique polygons to
find and finding them uses computationally expensive intersection algorithms. It is
much easier to take advantage of the fact that if we draw a polygon on top of a
previous image, the polygon paints over, or occludes, the portion of the image that
lies underneath it. Hence, we organize our drawing elements in a tree format with
the root node corresponding to distances farthest from the viewer. Then, we draw a
scene by simply traversing the tree from the root to the various leaves.
Fig. 12.1 Two simple paintings. a This painting uses many separate layers for the kelp. The
human figure is in an intermediate layer between kelp layers and the seadragons essentially occupy
foreground layers. The background varies from a dark blue at the bottom to very light blue, almost
white, at the top. b This painting uses a complicated background layer with many trees. The dragon
and the large tree are midground images, while the butterflies and moths are primarily foreground
in nature. The girl is midground
The painting seen in Fig. 12.1a started with the background. This used a gradient
of blue, ranging from very dark, almost black, at the bottom, to very light, almost
white, at the top. There are, of course, many different shades and hues of blue as
brushes are used to create interesting blending effects with the various blues that
are used. However, we could abstract the background to a simple blue background
and capture the basic compositional design element. The many kelp plants are all
painted in different planes. The kelp farthest from the viewer are very dark to indicate
distance, while the plants closest to the viewer use brighter greens with variegated
hues. We note that we could abstract the full detail of the kelp into several intermediate
midground layers: perhaps, the farthest midground layer might be one kelp plant that
is colored in dark green with the second, closest midground layer, a bright green
kelp plant. The human figure is placed between kelp layers, so we can capture this
compositional design element by placing a third midground layer between the two
midground kelp plant layers. Finally, there are many seadragons in foreground layers
at various distances from the viewer. We could simplify this to a single foreground
layer with one seadragon painted in a bright red. Hence, the abstract compositional
design of the painting in Fig. 12.1a is as follows:
In a similar fashion, we can analyze Fig. 12.1b. The background in this painting is
a large collection of softly defined trees. These are deliberately not sharply defined
so that their distance from the viewer is emphasized. The midground image is the
very large tree that runs from the bottom to the top of the painting. There are then
two more midground images: the whimsical dragon figure on the tree branch and
the human figure positioned in front of the tree. Finally, there are a large number
of Baltimore butterflies and Luna moths which are essentially foreground images.
We can abstract this compositional design as follows:
The paintings shown in Fig. 12.1a, b are much more complicated than the simple
abstract designs. However, we can capture the essence of the compositional design
in these tables. We note that, in principle, a simpler description in terms of one
background, one midground and one foreground is also possible. For example, we
could redo the abstract designs of Fig. 12.1a, b as follows:
Fig. 12.2 The background and foreground of a simple painting. a Background of a mellow image.
Note this image plane is quite complicated. Clearly, it could be broken up further into midground
and background images. b Foreground of a mellow image which is also very complicated. A simpler
painting using only one of the foreground elements would work nicely also. c A mellow painting
These new designs do not capture as much of the full complexity of the paintings
as before, but we believe they do still provide the essential details. Also, we do not
believe a two layer approach is as useful. Consider for example a two plane painting
as shown in Fig. 12.2a, b. These two planes can then be assembled into a painting as
we have described to give what is shown in Fig. 12.2c.
In this design, the foreground is too complex and should be simplified. There is
also too much in the background. Some of that should move into a midground image
so that the design is more clearly delineated.
We will therefore limit our attention to paintings that can be assembled using a
background, midground and foreground plane. The background image is laid down
first—just as an artist applies the paint for the background portion first. Then, the
midground image is placed on top of the background thereby occluding a portion
of the background. Finally, the foreground image is added, occluding even more of
the previous layers. We recognize, of course, that a real artist would use more layers
and move back and forth between each in a non-hierarchical manner. However, we
feel, for the reasons discussed above, that the three layer abstraction of the composi-
tional design process is a reasonable trade off between too few and too many layers.
Also, we eventually will be encoding painting images into mathematical forms for
computational purposes and hence, there is a great burden on us to design something
pleasing using this formalism which at the same time is sufficiently simple to use as
data in our cognitive model building process.
Our painting model thus will use a compositional scheme in which a valid painting
is constructed by three layers: background (BG), midground (MG) and foreground
(FG). A painting is assembled by first displaying the BG, then overlaying the MG
which occludes some portions of the BG image and finally adding the FG image.
The final FG layer hides any portions of the previous layers that lie underneath it.
This simplistic scheme captures in broad detail the physical process of painting. When
we start a painting, we know that if we paint the foreground images first, it will be
technically difficult and aesthetically displeasing to paint midground and background
images after the foreground. A classical example is painting a detailed tree in the
foreground and then realizing that we still have to paint the sky. The brush strokes
in the paint medium will inevitably show wrong directions if we do this, because we
cannot perform graceful, long, side to side brush strokes since the foreground image
is already there. Hence, a painter organizes the compositional design into abstract
physical layers—roughly speaking, organized with the background to foreground
layers corresponding to how far these elements are away from the viewer’s eye.
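A minimal Python sketch of this back to front compositing rule, using small integer label images in which 0 marks an empty (transparent) pixel. The array sizes, the color labels and the function name compose are illustrative assumptions, not the encoding developed later in this chapter.

import numpy as np

def compose(bg, mg, fg):
    # Overlay the midground and then the foreground onto the background; nonzero pixels occlude.
    out = bg.copy()
    out[mg != 0] = mg[mg != 0]    # the midground hides part of the background
    out[fg != 0] = fg[fg != 0]    # the foreground hides part of everything below it
    return out

bg = np.full((6, 6), 1)                               # a flat background color, label 1
mg = np.zeros((6, 6), dtype=int); mg[2:5, 1:4] = 2    # a midground shape, label 2
fg = np.zeros((6, 6), dtype=int); fg[3:6, 3:6] = 3    # a foreground shape, label 3
print(compose(bg, mg, fg))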
Recall the Würfelspiel matrix used in musical composition had the general form
below.
A = \begin{bmatrix}
\text{Opening 0} & \text{Transition 0} & \text{Closing 0} \\
\text{Opening 1} & \text{Transition 1} & \text{Closing 1} \\
\vdots & \vdots & \vdots \\
\text{Opening P-1} & \text{Transition P-1} & \text{Closing P-1}
\end{bmatrix}
The building blocks of a typical musical phrase here are an Opening followed
by a Transition and then completed by a Closing. We can use this technique to
generate painting compositions by letting the Opening phrase be the Background
of a painting; the Transition, the Midground and the Closing, the Foreground.
Hence, the painting matrix would be organized as in the matrix shown in Eq. 12.1
A = \begin{bmatrix}
\text{Background 0} & \text{Midground 0} & \text{Foreground 0} \\
\text{Background 1} & \text{Midground 1} & \text{Foreground 1} \\
\vdots & \vdots & \vdots \\
\text{Background P-1} & \text{Midground P-1} & \text{Foreground P-1}
\end{bmatrix} \qquad (12.1)
This painting matrix would then allow us to rapidly assemble $P^3$ different paintings.
Both of these data sets are examples of Würfelspiel data matrices that enable
us to efficiently generate large amounts of emotionally labeled data.
With this said, we can begin to design a series of paintings that are assembled
from three pieces: the background, midground and foreground thereby generating a
Künsterisches Würfelspiel matrix approach, an artistic toss of the dice. Our first
Künsterisches Würfelspiel matrix will consist of four backgrounds, midgrounds and
foregrounds assembled into the usual 4 × 3 matrix and we will specifically attempt
to create painting compositions that have a neutral emotional tone. We have thought
carefully about what emotionally neutral paintings assembled in this primitive fash-
ion from backgrounds, midgrounds and foregrounds should look like both as drawn
and as colored elements. We have decided on flat and monochromatic colors with
sparse lines. A typical painting composition design can be seen in Fig. 12.3a, b and c.
We can then assemble the background, middle ground and foreground elements into
a painting as shown in Fig. 12.3d. Note that this painting is quite bland and elicits no
emotional tag. In Fig. 12.4, we see a matrix of four backgrounds, four midgrounds
and four foregrounds which are assembled in a Würfelspiel fashion.
We can use this matrix to assemble 64 individual paintings. We are displaying them
in thumbnail form in the Figs. 12.5, 12.6, 12.7 and 12.8.
Our abstract painting compositions are encoded as the triple $\{b, m, f\}$ where $b$ denotes
the background; $m$, the midground; and $f$, the foreground layer. Each of these layers
is modeled with a collection of graphical objects. Each object in such a list has the
following attributes: inside color, $c_i$; boundary color, $c_b$; and a boundary curve, $\partial$,
described as an ordered array $\{(x_i, y_i)\}$ of position coordinates.
Fig. 12.9 Encoding a neutral painting. a The second background image. b The background edges.
Note there are two edge curves. c The first midground image. d The midground edges. e The first
foreground image. f The foreground edges
Consider the neutral painting constructed from the second background, first
midground and first foreground image. Compute the edges of each image
layer using the Linux tool convert via the command line convert -edge 2
-negate <file in> <file out>. The -edge 2 command option extracts
the edges, giving us an image that is all black with the edges in white. The -negate
option then swaps black and white in the image to give us the edges as black
curves.
For expositional convenience, we will assume all of the paintings are 100 × 100
pixels in size. We can divide this rectangle into a 10 × 10 grid which we will
call the coarse grid. Note that in Fig. 12.9b, the edge is a curve which can be
Fig. 12.10 Approximating the edge curve. a A 10 × 10 portion of a typical background image that
contains part of an edge curve. The grid from horizontal pixels 40–49 and vertical pixels 70–79 is
shown. The 10 × 10 block of pixels defines a node in the coarse grid that must be labeled as being
part of the edge curve. b A typical coarse scale edge curve in which each filled in circle represents
a 10 × 10 block in the original image which contains part of the fine scale edge curve
Fig. 12.11 Assembling a happy painting. a Happy background. b Happy midground. c Happy
foreground. d A happy painting
approximated by a collection of positions $\{(x_i, y_i)\}$, where the index $i$ ranges from 0
to some positive integer $N$, by drawing the line segments between successive points,
$p_i(t) = t\,(x_i, y_i) + (1 - t)\,(x_{i+1}, y_{i+1})$ for $t$ in $[0, 1]$, for all appropriate indices $i$. The
smaller 10 × 10 grid allows us to approximate these edge positions as shown in
Fig. 12.10a. Each of the 100 large scale boxes comprising the 10 × 10 grid that
contains a portion of an edge curve is assigned a filled in circle in Fig. 12.10b. We see
that the fine scale original edge has been redrawn using a coarser scale. Although
there is inevitable loss of definition, the basic shape of the edge is retained.
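A Python sketch of this coarse graining step, assuming the 100 × 100 edge image has already been loaded as a 0/1 array (for example, from the output of the convert command above). The function name coarse_grid and the toy diagonal edge are illustrative only.

import numpy as np

def coarse_grid(edge_image, block=10):
    # Mark each block x block cell of the coarse grid that contains at least one edge pixel.
    h, w = edge_image.shape
    grid = np.zeros((h // block, w // block), dtype=int)
    for i in range(0, h, block):
        for j in range(0, w, block):
            if edge_image[i:i + block, j:j + block].any():
                grid[i // block, j // block] = 1    # this coarse node lies on the edge curve
    return grid

# A toy 100 x 100 edge image whose edge pixels lie along the diagonal.
edge = np.zeros((100, 100), dtype=int)
for k in range(100):
    edge[k, k] = 1
print(coarse_grid(edge))    # a 10 x 10 array with 1s along the diagonal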
In Chap. 11, we discussed the problem of generating neutral musical data alpha-
bets. Let’s revisit that now.
We can extend those ideas by realizing that in the context of music, part of our
alphabet must represent notes of a variety of pitches. The music alphabet included
letters which represented the standard seven notes of the octave below middle C,
middle C octave and the octave above middle C. These notes could be listed as the
matrix of Eq. 12.2 where the subscript −1, 0 and 1 on each note indicate the pitches
below middle C, of middle C and above middle C, respectively.
M = \begin{bmatrix}
c_{-1} & d_{-1} & e_{-1} & f_{-1} & g_{-1} & a_{-1} & b_{-1} \\
c_{0} & d_{0} & e_{0} & f_{0} & g_{0} & a_{0} & b_{0} \\
c_{1} & d_{1} & e_{1} & f_{1} & g_{1} & a_{1} & b_{1}
\end{bmatrix} \qquad (12.2)
We note that there are more musical octaves that could be used and hence the
matrix M, now 3 × 7, could be as large as 7 × 7, if we wished to encode all of
the notes on a piano keyboard. If we equate the last row of M with vertical pixel
position 0 and the first row with 2, we see there is a nice correspondence between the
matrix of positional information we need for the paintings and the matrix of pitch
information we use for music. Hence, the manner in which we choose an alphabet
for the painting data is actually closely aligned with the way in which we choose the
alphabet for musical data despite the seeming differences between them.
We can also approximate the edge from the foreground, Fig. 12.9d, in a similar
fashion. These edges define closed figures in the plane under the following conditions:
first, if the first and last point of the curve are the same; and second, if the first and
last point of the curve are on the boundary of the box. For example in Fig. 12.9c,
the top edge hits the boundary of the image box on both the right and left side.
Hence, we will interpret this as two simple closed figures in the plane: first, the
object formed by including all the portions of the image box below the edge; and
second, the complement of the first object formed by the portions of the image box
above the edge. In addition, there is a second edge which is not a closed figure which
therefore does not have an inside portion.
It is clear the edges will always have an edge color and will possibly have an inside
color. We will let $E^{pj}$ denote an edge set. The first superscript, $p$, takes on the values
0, 1 or 2, and indicates whether the edge set is part of a background, midground
or foreground image. The second superscript tells us which edge set in a particular
image we are focusing on. For example, Fig. 12.9c has two edge sets. Therefore, this
background image can be encoded as the set $E^0$ defined by

E^0 = \left\{ \{E^{00}, N^{00}, c_i^{00}, c_e^{00}\}, \; \{E^{01}, N^{01}, c_e^{01}\} \right\}

since the second edge set does not require an inside color. To complete the description
of this background image, we note that the entire image is drawn against a chosen
base color. Hence, the background image can be described fully by the set $D^0$,

D^0 = \{\beta^0, E^0\}
where $\beta^0$ represents the base color of the background image. Note that we do not
need to add a base color to the midground and foreground images. In general, we
have a finite number of edges in each image. Let $M^0$, $M^1$ and $M^2$ denote the number
of edges the background, midground and foreground images contain. Further, the
$i$th ordered pair of the $j$th edge in an image is labeled as $(x_{ij}^{p}, y_{ij}^{p})$, where the value
of $p$ indicates whether we are in a background, midground or foreground image.
We will label the cardinality of each edge set by the integers $N^{0j}$, $N^{1j}$ and $N^{2j}$, with
the first superscript having the same meaning as before and the second labeling which
edge set we are considering. Thus, we can let $E_i^{1j}$ represent the $i$th ordered pair in
the $j$th edge in the midground. Further, we let inside and edge colors be denoted by
$c_i^{1j}$ and $c_e^{1j}$, respectively. Also, we will assume that the paintings use a small number
of colors, here 8. We can encode a painting into the matrix shown in Eq. 12.3.
\begin{bmatrix}
\text{bg} & \{E^{00}, N^{00}, \beta^0, c_i^{00}, c_e^{00}\} & \cdots & \{E^{0M^0}, N^{0M^0}, c_i^{0M^0}, c_e^{0M^0}\} \\
\text{mg} & \{E^{10}, N^{10}, c_i^{10}, c_e^{10}\} & \cdots & \{E^{1M^1}, N^{1M^1}, c_i^{1M^1}, c_e^{1M^1}\} \\
\text{fg} & \{E^{20}, N^{20}, c_i^{20}, c_e^{20}\} & \cdots & \{E^{2M^2}, N^{2M^2}, c_i^{2M^2}, c_e^{2M^2}\}
\end{bmatrix} \qquad (12.3)
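A Python sketch of a data structure that mirrors this encoding, under the conventions above: a layer index p in {0, 1, 2}, a base color only for the background, and a small palette of at most eight colors. The class names EdgeSet, LayerEncoding and PaintingEncoding are illustrative choices of ours.

from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class EdgeSet:
    points: List[Tuple[int, int]]         # the ordered coarse grid positions along the edge curve
    edge_color: int                       # index into a small palette, here 0..7
    inside_color: Optional[int] = None    # only closed edges carry an inside color

@dataclass
class LayerEncoding:
    edges: List[EdgeSet] = field(default_factory=list)

@dataclass
class PaintingEncoding:
    base_color: int           # the base color beta^0 of the background layer
    bg: LayerEncoding
    mg: LayerEncoding
    fg: LayerEncoding

# A background with one closed edge (inside and edge colors) and one open edge (edge color only),
# drawn against a base color, in the spirit of the example discussed above.
bg = LayerEncoding([EdgeSet([(0, 5), (1, 5), (2, 6)], edge_color=2, inside_color=3),
                    EdgeSet([(7, 0), (7, 1), (8, 2)], edge_color=4)])
painting = PaintingEncoding(base_color=1, bg=bg, mg=LayerEncoding(), fg=LayerEncoding())
print(len(painting.bg.edges))    # 2 edge sets in the background layer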
The artist Schneider (2001), in Capturing Emotion, develops a philosophy about paint-
ing. Believing he “cannot paint feelings, but can apply paint in a manner that brings
about an idea of those feelings,” he consciously decides on particular elements to
incorporate that are going to best portray the emotion he is seeking. The first impor-
tant choice to make is the initial tone (color) laid down on a blank canvas. This would
be our background layer. Schneider says this choice immediately establishes mood
based on whether you choose a cool or warm tone. Next, he discusses edges and
values. He assigns pure black a value of nine and pure white a value of one.
The degree of sharpness in the edges and the range of value used create varying
degrees of contrast. Sharp edges and full use of the value range give a harsh effect that
would not be a good choice for paintings that are meant to evoke gentler emotions.
Light also carries emotional associations: flashes of light will warn; glowing light
will comfort; shadows from a spotlight
may frighten; morning light may bring hope. Colors can be subjective, with much
meaning linked to culture and association. Consider the colors blue and red. Blue may
be associated with peace and calmness, but also sadness. Red might evoke feelings
of anger and violence, but also love. More generalized statements can be made about
colors and combinations. Complementary colors, those opposite one another on a
color wheel, cause a visual vibration when placed together. This creates a more
dynamic and lively feel. Also, greater intensity of color tends to imply emotional
excitement. Use of colors near each other on the color wheel will quiet the energy
of a painting. Hogbin divides marks into three types, non-objective, abstract or representational, depending on the intent of the artist. Emotion enters as the viewer attaches meaning to a mark and does one of three things: empathizes, recoils, or is curious. Finally, a line can suggest activity or passivity as it moves from wiggly, to curved, to straight.
Useful sources of information on how artists address the issue of emotion in their
work are included in the collection of writings by artists about art Goldwater (1974),
Artists on Art. Though the turn toward expressing emotion in art is often considered a
19th century change, this collection includes source references that show that artists
far earlier than that were considering how to evoke emotion within their paintings.
Leon Battista Alberti’s Kleinere Künsttheoretische Schriften, Vienna, 1877, passim.
(Collated with De pictura, Basel, 1540.) (Sch. p. 111): this 15th century artist, 1404–
1472, wrote, “A narrative picture will move the feelings of the beholder when the
men painted therein manifest clearly their own emotions....emotions are revealed
by the movements of the body”. Giovanni Paolo Lomazzo’s Trattata dell’arte della
pittura, scultura, ed architettura, Milan, 1585, (Sch. p. 359): within a 16th century
artist’s (1538–1600) definition of painting we read, “Painting is an art which...even
shows visibly to our eyes many feelings and emotions of the mind”. Nicolas Poussin’s
Correspondance de Nicolas Poussin, ed. Ch. Jouanny, Paris, 1911, Letters 147, 156.-
G.P. Bellori, Le vite de’pittori, scultori, ed architetti moderni, Rome, 1672, pp. 460–
462: here, from a 17th century (1594–1665) artist, we find mention of emotions
via discussion on form and color. “The form of each thing is distinguished by the
thing’s function or purpose. Some things produce laughter, others terror; these are
their forms,” and “Colors in painting are as allurements for persuading the eyes, as
the sweetness of meter is to poetry”.
Charles Le Brun’s Conference Upon Expression, tr. John Smith, London, 1701,
passim. (Paris, 1667) (Sch. p. 555): the artist, whose dates are 1619–1690, gives us
some specific instruction about how to portray different expressions with the face and
body language within paintings after stressing the importance of expression overall.
It is a necessary Ingredient in all the parts of Painting, and without it no picture can be
perfect...Expression is also a part which marks the Motions of the Soul, and renders visible
the Effects of Passion. Horror can be portrayed by the following suggestions, “the Eyebrow
will be still more frowning; the Eye-ball instead of being in the middle of the Eye, will
be drawn down to the under lid; the Mouth will be open, but closer in the middle than at
the corners, which ought to be drawn back, and by this action, wrinkles in the Cheeks; the
Colour of the Visage will be pale; and the Lips and Eyes something livid; this Action has
some resemblance of Terror”, and If to Joy succeed Laughter, this Motion is expressed by
the Eyebrow raised about the middle, and drawn down next the Nose, the Eyes almost shut;
the Mouth shall appear open, and shew the Teeth; the corners of the Mouth being drawn
back and raised up, will make a wrinkle in the Cheeks, which will appear puffed up, and
almost hiding the Eyes; the face will be Red, the Nostrils open; the Eyes may seem Wet, or
drop some Tears, which being very different from those of Sorrow, make no alteration in the
Face; but very much when excited by Grief.
In the Pierre-Paul Prud’hon quote from Charles Clément’s Prud’hon, sa vie, ses
oeuvres, et sa correspondance, Paris, 1872, pp. 127, 178–80, written in 18th century
Rome (1787), Prud’hon states,
...in general, there is too much concern with how a picture is made, and not enough with
what puts life and soul into the subject represented...no one remembers the principal aim of
those sublime masters who wished to make an impression on the soul, who marked strongly
each figure's character and, by combining it with the proper emotion, produce an effect of
life and truth that strikes and moves the spectator....
From Gustave Coquiot’s Seurat, Paris, ca. 1924, pp. 232–33, we find that on August
28, 1890, Seurat made the following comments about aesthetics. “Gaiety of tone
is given by the dominance of light; of color, by the dominance of warm colors; of
line by the dominance of lines above the horizontal. Calm of tone is given by the
equivalence of light and dark; of color, by an equivalence of warm and cold; and of
line, by horizontals. Sadness of tone is given by the dominance of dark; of color, by
the dominance of cold colors; and by line, by downward directions”.
Expression beyond that portrayed by a human subject in a painting is referenced
in letters Matisse wrote in 1908, from The Museum of Modern Art, Henri Matisse,
ed. Alfred H. Barr, New York, 1931, pp. 29–36. (La Grand Revue, Paris, Dec. 25,
1908). Matisse says
What I am after, above all, is expression...Expression, to my way of thinking, does not consist
of the passion mirrored upon a human face or betrayed by a violent gesture. The whole
arrangement of my picture is expressive. The place occupied by figures or objects, the empty
spaces around them, the arranging in a decorative manner the various elements at the painter’s
disposal for the expression of his feelings. In a picture every part will be visible and will play
the role conferred upon it, be it principal or secondary. All that is not useful in the picture is
detrimental...
Now our task is to take these ideas and use them to create useful emotionally
labeled painting data.
As usual, we will think of paintings as assembled from three pieces: the background,
midground and foreground. We will develop Künstlerisches Würfelspiel matrices that
correspond to sad and happy emotional states.
A typical happy painting (BG, MG, FG) would be constructed to give the
overall impression of a happy emotional state. An example is shown in
Fig. 12.11a, b and c. We can assemble the background, middle ground and fore-
ground elements into a painting as shown in Fig. 12.11d.
Now consider the array we get from four kinds of happy background, midground
and foreground images. In Fig. 12.12, we see the resulting matrix. The first column
of images are the backgrounds, the middle column, the midgrounds and finally, the
last column, the foregrounds. Note that this matrix has a definite emotional state
associated with it as all the images are somewhat happy—definitely not emotionally
neutral images.
We can use this matrix to assemble 64 individual paintings. We display them in thumbnail form in Figs. 12.13, 12.14, 12.15 and 12.16.
Fig. 12.17 Assembling a sad painting. a Sad background. b Sad midground. c Sad foreground. d A sad painting
A typical sad painting (BG, MG, FG) would be constructed to give the overall impression of a sad emotional state. An example can be seen in Fig. 12.17a, b and c. We can assemble the background, middle ground and foreground elements into a painting as shown in Fig. 12.17d.
Now consider the array we get from four kinds of sad background, midground
and foreground images. In Fig. 12.18, we see the resulting matrix. The first column
of images are the backgrounds, the middle column, the midgrounds and finally, the
last column, the foregrounds. Note that this matrix has a definite emotional state associated with it as all the images are somewhat sad; they are definitely not emotionally neutral images!
We can use this matrix to assemble 64 individual paintings. We display them in thumbnail form in Figs. 12.19, 12.20, 12.21 and 12.22.
We now have enough data from both music and paintings to begin the process of
training cortex in a full brain model.
References
R. Goldwater, M. Treves (eds.), Artists on Art (Pantheon Books, New York, 1974)
S. Hogbin, Appearance and Reality (Cambium Press, Bethel, 2000)
R. Pickford, Psychology and Visual Aesthetics (Hutchinson Educational LTD, London, 1972)
W. Schneider, Capturing emotion. Am. Artist 65(703), 30–35 (2001)
Chapter 13
Modeling Compositional Design
We believe that one of the hardest problems to overcome in our attempt to develop
software models of cognition is that of validation. How do we know that our the-
oretical abstracts of cellular signaling are giving rise to network architectures of
interacting objects that are reasonable? Since we are attempting to create outputs
that are good approximations of real cognitive states such as emotional attributes,
it is not clear at all how to measure our success. Traditional graphs of ordinal data,
tables of generated events versus measured events couched in the language of math-
ematics and the data structures of computer science, while valid, do not help us to
see that the models are correct. To address this problem, we decided that in addition
to developing cognitive models that are measured in traditional ways, we would also
develop models that can be assessed by experts in other fields for validity. There are
two such models we are attempting to build: one is a model of music composition
in which short stanzas of music are generated autonomously that are emotionally
colored or tagged. The second is a model of painting composition in which primi-
tive scenes comprised of background, foreground and primary visual elements are
generated autonomously with emotional attributes.
The cognitive models are based on simplified versions of biological information
processing whose salient elements are sensory and associative cortex, the limbic
system and pathways for the three key neurotransmitters serotonin, norepinephrine
and dopamine. We will discuss some of the details of these computational modules
in the sections to come. To validate these models, we have begun by constructing
musical and painting data that will serve as samples of the cognitive output states we
wish to see from Area 37 of the temporal cortex. The associative cortex devoted to
assigning higher level meaning to musical phrases is trained with musical data built
using an 18th century approach called a Würfelspiel matrix and a simple grammar
based code. We have constructed examples of neutral, sad, happy and angry twelve
beat musical sentences in Chap. 11 which are to be the desired output of this portion of area 37, which functions as a polymodal fusion device. In effect, we are using this data to constrain the area 41 to area 37 pathways. Similarly, the associative cortex associated with assigning higher level meaning to a class of painting compositions
built from assembling the three layers, background, midground and foreground, is
trained from similar Würfelspiel matrices devoted to emotionally labeled painting
compositions which were developed in Chap. 12. The resulting cognitive model then
develops outputs in four emotional qualia for the separate types of inputs: auditory
(music) and visual (painting).
The first step that we need to take in building our models is to use the musical and
painting data to constrain or train our model of the associative cortex. In general, our
model takes this specialized sensory input and generates a high level output as shown
in Fig. 13.1. In order to perform this training, we need to develop an abstract model
of information processing in the brain. We have chosen the abstraction presented
in Fig. 5.3. There is much biological detail missing, of course. We are focusing on
the associative cortex, the limbic system and a subset of neurotransmitter generation
pathways that begin in various portions of the midbrain. The architecture of the
limbic system that we will use is shown in Fig. 5.2 and the information processing
pathways we will focus on are shown in Fig. 13.2. Our cortex model itself is shown
in Fig. 5.1. The musical data provides the kind of associated output that might come
from area 37 of the temporal cortex. The low level inputs that start the creation of a
music phrase correspond to the auditory sensory inputs into area 41 of the parietal
cortex which are then processed through areas 5, 7 and 42 before being sent to the
further associative level processing in the temporal cortex. The painting data then
provides a similar kind of associated input into area 37 from the occipital cortex.
Inputs that create the paintings correspond to the visual sensory inputs into area 17
of the occipital cortex which are then further processed by area 18 and 19 before
being sent to the temporal cortex for additional higher level processing. We fully
understand how simplified this view of information processing actually is, but we
believe it captures some of the principal features. In Fig. 13.3, we show how we will
use the musical and painting data to constrain the outputs of the associative cortex
and limbic system. The data we discuss here will allow us to build the model shown
in Fig. 13.4. In this model, we are essentially testing to see if we generate the
kinds of high level meta outputs we expect. Hence, this is part of our initial validation
phase. However, a much more interesting model is obtained by inversion as shown
in Fig. 13.5. Here, we use fMRI and skin conductance inputs and starting elements
for music and painting data that have not been used in the training phase to generate
new compositional designs in these various modalities.
In Fig. 13.3, we indicate that the outputs of the limbic system of our model are
sent to the meta level pathways we use for music, painting and so forth and also to
the cerebellum for eventual output to motor command pathways.
The emotionally tagged data sets contain examples of equally valid solutions to compositional design processes. We have discussed musical data that is neutral and emotionally tagged (Chap. 11) and painting data that is neutral and emotionally tagged (Chap. 12). In each of
these chapters, we discuss the generation of 64 examples of solutions to compo-
sitional design tasks in different emotional modalities that are equally acceptable
according to some measure. So, can we learn from this kind of data how the experts
that designed these data samples did their job? Essentially, buried in these data sets
are important clues about what makes a great design. How do we begin to understand
the underlying compositional design principles?
Fig. 13.3 The musical and painting data is used to constrain the outputs of the associative cortex
and limbic system
Each data set that is encoded into a Würfelspiel matrix, whether using music or
art, therefore contains crucial information about equally valid examples of data in
different emotional modalities. From this data, we can build mappings that tell us
which sequences of choices are valid and which are not from the perspective of the
expert who has designed the examples. In a very real sense, a cognitive model built
from this data is automatically partially validated from a psychological point of view.
13.2 Connectionist Based Compositional Design
Fig. 13.4 Music or painting initialization generates fMRI and skin conductance outputs and a full
musical or painting design
Let’s now look at how we might develop a standard connectionist approach to building
models that understand our data. We will move to an explicit neurobiological model
later. Each data set that is encoded into a Würfelspiel matrix, whether using music, art
or something else, contains crucial information about equally valid design solutions
in different modalities. From this data, we want to build mappings that tell us which
sequences of choices are valid, and which are not, from the perspective of the expert
who has designed the examples. In a very real sense, a cognitive model built from
this data is automatically partially validated from a psychological point of view. For
example, the neutral and emotionally tagged music and painting data sets contain
examples of equally valid solutions to a compositional design process in both neutral
and emotionally labeled flavors. Each set of data contains 64 examples of solutions to
compositional design tasks that are equally acceptable according to some measure.
13.2.1 Preprocessing
Each data set has an associated alphabet with which we can express noun, verb and
object units. For our purposes, let’s say that each of the nouns, verbs or objects is
a finite list of actions from an alphabet of R symbols, where the meaning of the
symbols is, of course, highly dependent on the context of our example. In a simple
Fig. 13.5 fMRI and skin conductance inputs plus music or painting initialization generate a full
musical or painting design in a given emotional modality
music example, the list of actions might be a sequence of four beats and the alphabet
could be the choices from the C major scale. Thus, we will assume each of the P
nouns consists of a list of length L from an alphabet of R symbols. The raw inputs we
see as the noun vector n are thus normally processed into a specialized vector for use
in algorithms that model the compositional process. For example, in the generation of
a painting, we use our painting medium to create the background which is a process
using pigment on a physical canvas. This is the raw input n and each element of a
noun n is a letter from the alphabet. Let the letters in the alphabet be denoted by a_0 through a_{R−1}. Then we can associate the noun with a vector of L components, n[0], . . . , n[L − 1], where component n[j] is a letter we will represent by a_j^n. The letter a_j^n is then encoded into a vector of length R all of whose entries are 0 except for the entry in the slot corresponding to that alphabet symbol. In general, we can choose to encode n into a form more amenable
to computation and data processing in many ways. In our work, we have encoded
the raw data into an abstract grammar, thereby generating the feature vector N. In
general, the input nouns n are preprocessed to create output noun states denoted by
N. In a similar fashion, we would preprocess verb and object inputs to create verb and
object output states denoted by V and O, respectively. The preprocessing is carried
out by mappings fn , fv and fo , respectively. For example, the mapping from raw data
to the feature vector form for nouns is represented by

\[
n \;=\; \begin{bmatrix} a_0^n \\ \vdots \\ a_{L-1}^n \end{bmatrix}
\;\xrightarrow{\;f_n\;}\;
N \;=\; \begin{bmatrix} N_0 \\ N_1 \\ \vdots \\ N_{L-1} \end{bmatrix}
\]
Our association of a noun n with the feature vector N is thus but one example of the
mapping fn . In a similar fashion, we would preprocess verb and object inputs to create
verb and object output states denoted by V and O, respectively. The preprocessing
is carried out by mappings fv and fo , respectively.
Now, a primitive object in our purported compositional grammar has length L. We
can denote such an input noun object as ni and a corresponding output noun object as
Ni . Our mapping problem is thus to determine the rule behind the mapping from the
noun feature vectors N to the verb feature vectors V, g_{NV}, and from the verb feature vectors V to the object feature vectors O, g_{VO}. We can express this mathematically
by Eq. 13.1.
\[
\begin{bmatrix} N_0 \\ N_1 \\ \vdots \\ N_{L-1} \end{bmatrix}
\;\xrightarrow{\;g_{NV}\;}\;
\begin{bmatrix} V_0 \\ V_1 \\ \vdots \\ V_{L-1} \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} V_0 \\ V_1 \\ \vdots \\ V_{L-1} \end{bmatrix}
\;\xrightarrow{\;g_{VO}\;}\;
\begin{bmatrix} O_0 \\ O_1 \\ \vdots \\ O_{L-1} \end{bmatrix}
\tag{13.1}
\]
We can combine these processing steps into the diagram shown in Fig. 13.6.
The data we are given is order dependent. For example, if we are given a noun n_i of the form {a_{i0}^n, . . . , a_{i,L−1}^n}, then we attend to the letters of this noun sequentially as a_{i0}^n → a_{i1}^n → ··· → a_{i,L−1}^n. We are given that each noun n_i is associated with a set of possible verbs {v_j} equal to {a_{j0}^v, . . . , a_{j,L−1}^v} for 0 ≤ j < P, and one task is
thus to understand the noun to verb mapping. However, another task is to understand
how to generate the original noun sequence. Why are some noun sequences useful or
pleasing in this context and others are not? To generate a noun sequence n equal to {a_0^n, . . . , a_{L−1}^n} means we choose a random start letter a_0^n and then from that preferred sequences are generated while non-interesting words are biased against. Hence, we think of a mapping, the Noun Generating Map or NGM, as accepting an input a_0^n and generating a preferred second letter a_1^n. Then a_1^n is used as an input to generate a preferred third letter a_2^n, and so on until the full string of letters is finished. To model this mapping, we start by using the information about useful noun strings we have. Given letter a_{i0}^n, we know that a_{i1}^n is preferred.
We embed the original data into an analog vector by converting each entry of the binary encoded letter a_{i0}^n of the noun, which is a 0 or a 1 in our initial encoding, into a real number, giving the analog letter ξ_{i0}^n. To set these analog values, we choose a small tolerance ε. The entry in the binary encoded letter that corresponds to a 1 will be randomly chosen from [1 − 1.5ε, 1 − 0.5ε]. Hence, for ε = 0.2, an entry with a 1 will be assigned a real number in the interval [0.7, 0.9]. Consequently, the raw binary noun, verb and object data is mapped into a new analog representation in which each entry is a real number chosen as above.
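A sketch of this analog embedding follows; the text only specifies the interval for entries equal to 1, so the handling of the zero entries is our assumption.

```python
import numpy as np

def analog_embed(one_hot, eps=0.2, rng=None):
    """Map a binary one-hot letter to its analog form.

    Entries equal to 1 are drawn from [1 - 1.5*eps, 1 - 0.5*eps]; for eps = 0.2
    this is the interval [0.7, 0.9] used in the text.  The text does not say how
    the zero entries are treated, so here we simply jitter them into [0, 0.5*eps]."""
    rng = rng or np.random.default_rng(0)
    one_hot = np.asarray(one_hot, dtype=float)
    highs = rng.uniform(1 - 1.5 * eps, 1 - 0.5 * eps, size=one_hot.shape)
    lows = rng.uniform(0.0, 0.5 * eps, size=one_hot.shape)
    return np.where(one_hot > 0.5, highs, lows)

xi = analog_embed([0.0, 1.0, 0.0])   # the middle entry lands in [0.7, 0.9]
```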
Since we know that only certain letters should follow a_{i0}^n, only certain analog states ξ_{i1}^n are permissible given a start state of ξ_{i0}^n. We infer from this that there is an unknown mapping h_{01} which maps the analog encoding of letter 0 to the analog encoding of letter 1, h_{01}(ξ_{i0}^n) = ξ_{i1}^n. This mapping has special characteristics: our data tells us that only certain letters can follow letter 0. Each acceptable second letter is a vector in R dimensional space whose components are analog zero except the one that corresponds to the second letter. That component is an analog one. We have at most P examples of acceptable second letters. This means we have at least R − P second letters that are not acceptable. In other words, the preferred output for a given first letter is an R row matrix, formed from the acceptable second letters, that has at most P columns. We can do this for all of the nouns in our data set. Hence, we will have at most P first letter choices and each of these will have at least R − P unacceptable second letters. Let T and T̄ denote the set of all acceptable and unacceptable outputs, respectively.
Following a traditional machine learning approach, we could model the mapping
h01 as a chained feed forward network, with feedforward and feedback connections
between artificial neurons. This mapping takes the first letter of a noun and outputs a
set of acceptable second letters. Training is done by matching input to output using
excitation and inhibition (i.e., using a Hebbian approach). We know which elements
in the analog output vector should be close to one and which should be close to zero
for the analog input.
We initialize each tunable parameter to a small positive value if the connection from component k in the input to component j in the output is between two analog ones. All other connections are initialized to small negative numbers. For each first
letter we have data for, we do the following: pick the initial first letter in our data set
and compute the relevant output for the first associated second letter. Increase the
connection weights on any path between a high input and a high output and decrease
the connection weight on any other paths. Cycle to the next second letter and redo
until all possibilities are exhausted. We thus continue this process until every input
generates an output with a high component value in the location that corresponds to
the index for the second letter. At this point, we say we have trained our nonlinear
mapping h01 so that first letters in our data noun sequences are biased to connect
to their corresponding second letters. A second letter is then chosen randomly from
the set of acceptable second letters via an additional input line which in a sense is a coarse model of creativity.
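A minimal sketch of this Hebbian style training, under our own simplifications (a single R × R weight matrix for h_{01} rather than the R × PR matrix described in the next paragraph, a fixed learning rate, and the training data passed in as paired analog vectors), is:

```python
import numpy as np

def train_h01(pairs, R, lr=0.1, epochs=50):
    """Hebbian style training of the first-to-second letter map h01.

    pairs: list of (first_letter, second_letter) analog vectors of length R.
    Connections between two analog ones start small positive; all others start
    small negative.  Each pass strengthens paths between a high input and a high
    output and weakens every other path."""
    W = -0.01 * np.ones((R, R))
    for x, y in pairs:
        W[np.outer(y > 0.5, x > 0.5)] = 0.01           # positive initialization
    for _ in range(epochs):
        for x, y in pairs:
            hebb = np.outer(y, x)                      # large only where both are high
            W += lr * np.where(hebb > 0.25, hebb, -0.05)
    return W

def apply_h01(W, x):
    """Apply the trained map and squash the output into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-W @ x))
```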
If we let the set of all the generated weights be the matrix W^{01}, we note this is an R × PR size matrix. We can develop a similar mapping for the second to third letter, h_{12} with weights W^{12}, the third to fourth letter, h_{23} with weights W^{23}, and finally, the mapping from letter L − 2 to letter L − 1, h_{L−2,L−1} with weights W^{L−2,L−1}.
The procedure for creating a valid noun sequence can now be given. Choose a valid starting letter for a noun, a_{i0}^n, and map it to its analog form ξ_{i0}^n. Then, applying the first to second letter map, we find an acceptable second letter by the computation h_{01}(ξ_{i0}^n) = {ξ_{i1}^n}. This second letter can be used as the input into the next map, generating an acceptable third letter. Hence, the composite map h_{12} h_{01} takes a valid first letter and creates the three letter analog sequence defined by Eq. 13.2:
\[
\begin{bmatrix} \xi_{i0}^n \\ \xi_{i1}^n \\ \xi_{i2}^n \end{bmatrix}
\;\in\;
\begin{bmatrix} \xi_{i0}^n \\ h_{01}(\xi_{i0}^n) \\ h_{12}(h_{01}(\xi_{i0}^n)) \end{bmatrix}
\tag{13.2}
\]
The analog sequences are then mapped into three letter sequences by assigning an
analog value to either a one or zero using a threshold tolerance τ . This means we
map a component whose value is above τ to 1 and one whose value is below τ to
0. This can of course generate invalid sequences as we are only supposed to have
a single 1 assigned from any analog sequence. We do have to make sure that our
developed map does not allow this. For example, for τ = 0.6, the vector

\[
\begin{bmatrix} 0.83 \\ 0.55 \\ 0.35 \end{bmatrix}
\;\xrightarrow{\;\tau = 0.6\;}\;
\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}
\]

decodes to a valid single 1 letter, while an analog vector with two components above τ would decode to a vector with two 1s, which we would not know how to interpret as part of a noun. Nevertheless, despite
these obvious caveats, the procedure above learns how to generate all acceptable three
letter nouns given an initial start letter. There will be at most P^2 possible second and third letter combinations in this set. Since this set of possibilities will grow rapidly, after generating the letter two set of possibilities, we randomly choose one of the columns of the letter two matrix as the second letter choice and apply the h_{12} mapping to that letter. We then randomly choose one of the columns of the letter three matrix as the third letter choice.
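A sketch of the thresholding step just described, with a check for the single 1 requirement, might read as follows (the function name is ours):

```python
import numpy as np

def threshold_decode(xi, tau=0.6):
    """Map an analog letter to a binary one and flag invalid decodings.

    Components above tau become 1 and the rest become 0; the decoding is valid
    only if exactly one component survives the threshold."""
    b = (np.asarray(xi) > tau).astype(int)
    return b, int(b.sum()) == 1

print(threshold_decode([0.83, 0.55, 0.35]))   # (array([1, 0, 0]), True)
print(threshold_decode([0.83, 0.65, 0.35]))   # (array([1, 1, 0]), False)
```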
We then extend this procedure to the generation of all L letters with the composition h_{L−2,L−1} ··· h_{12} h_{01}, which we will denote by the symbol H^n, where the superscript indicates this is the mapping we will use for noun sequences. The mapping H^n is the Noun Generating Map or NGM that we seek. It generates a set of at most P^{L−1} letter two to letter L sequences. By making a random column choice at each letter, we generate one random L letter noun sequence for each initial letter we use. We can do something similar for the verb and object data generating the Verb and Object Generating Maps H^v and H^o, respectively. These three mappings are the noun, verb
and object generator mappings we were seeking. Then we need to connect nouns to
verbs and verbs to objects. The mapping from N to V is where the real processing lies.
The Würfelspiel matrix training data approach tells us that it is permissible for certain
nouns to be linked to certain verbs. While we could memorize a look-up table based
on this data, that is not what we wish to do. We want to determine underlying rules
behind these associations as emergent behavior in a complex system of interacting
agents. Thus, each output noun N_i in the collection of P nouns {N_i : 0 ≤ i < P} should activate any of the P verbs {V_j : 0 ≤ j < P} via the map g_{NV}. Further, each verb V_j in {V_j : 0 ≤ j < P} should activate the output objects {O_i : 0 ≤ i < P} by the action of the map g_{VO}. To build the mappings g_{NV} and g_{VO}, we will eventually use a more sophisticated model which is graph based with feedback, but that is another discussion entirely.
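A sketch of the chained generation, with the maps h_{01}, h_{12}, . . . represented abstractly as functions that return a matrix of acceptable next letter columns, might look like this; the names and the toy map are ours.

```python
import numpy as np

def generate_noun(first_letter, maps, rng=None):
    """Chain the letter-to-letter maps, choosing one acceptable column each step.

    first_letter: analog vector for the chosen starting letter.
    maps: [h01, h12, ...]; each takes an analog letter and returns an R x k array
    whose columns are acceptable next letters.  The random column choice plays
    the role of the 'coarse model of creativity' mentioned in the text."""
    rng = rng or np.random.default_rng(0)
    letters = [np.asarray(first_letter, dtype=float)]
    for h in maps:
        candidates = h(letters[-1])
        pick = rng.integers(candidates.shape[1])
        letters.append(candidates[:, pick])
    return letters

# toy demo with R = 3: every map offers the same two acceptable next letters
toy_map = lambda prev: np.array([[0.8, 0.1],
                                 [0.1, 0.8],
                                 [0.1, 0.1]])
noun = generate_noun([0.8, 0.1, 0.1], [toy_map, toy_map])   # a three letter noun
```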
There are then two ways to create a valid emotionally labeled music or painting composition. The first does not use the mappings g_{NV} and g_{VO}. For a painting, a random background letter is chosen to begin the sentence selection process. This input generates a valid background image. Then, a randomly chosen starting letter for the midground is used to generate a valid midground image. Finally, a random start letter for the foreground generates the last foreground image from which the painting is constructed. The output of the cognitive module is thus a short sentence, that is, a painting, of the type we have discussed. The second method is more interesting. The randomly generated noun N generates a valid verb g_{NV}(N) and the valid object is generated by the composition g_{VO}(g_{NV}(N)). Thus, the composite map g_{VO} g_{NV} provides a sentence generator.
Once we can generate sentences, we note that we can move to the generation
of streams consisting of sentences concatenated to other sentences after we create
an Object to Noun mapping. This is done in a way that is similar to what we have
done before using a Würfelspiel array approach. Other possibilities then come to
mind; for example, in this painting context, such transitions can create arbitrarily
long visual streams we can call animations. The analog of key changes in music is
then perhaps visual scene changes. Since key changes are logical transitions, we can
create arbitrarily long musical streams punctuated by appropriate key changes by
using the Würfelspiel array approach to model which key changes between given
key signatures are pleasing between two musical streams. Note, this construction
process has all the problems of a traditional connectionist approach. The g_{NV}, g_{VO} and g_{NO} maps depend on our data and would have to be rebuilt when new data is
used. We believe it will be much better to replace this connectionist approach with a
biologically based simple brain model.
We can see that in principle, there is a lot that can be accomplished by building
the maps we have discussed above using any architectures we desire. However, we
would like our connectionist architecture choices to be based on what we know of
neurobiology in either humans or other creatures with interesting neural systems
such as spiders, honeybees and cephalopods.
Recall that a music Würfelspiel matrix consists of P openings, P transitions and P closings, and the composer's duty was to make sure that any opening, transition and closing
(or noun, verb and object) was both viable and pleasing for the musical style that the composer was attempting to achieve. Thus, a musical stream could be formed by concatenating these fragments together: picking the ith Opening (O_i), the jth Transition (T_j) and the kth Closing (C_k) phrases would form a musical sentence. Note that there are P^3 possible musical sentences that can be formed in this manner. If each opening, transition and closing fragment is four beats long, we can build P^3 different twelve beat sentences.
\[
M = \begin{bmatrix}
O_0 & T_0 & C_0 \\
O_1 & T_1 & C_1 \\
\vdots & \vdots & \vdots \\
O_{P-1} & T_{P-1} & C_{P-1}
\end{bmatrix}
\qquad
P = \begin{bmatrix}
B_0 & M_0 & F_0 \\
B_1 & M_1 & F_1 \\
\vdots & \vdots & \vdots \\
B_{P-1} & M_{P-1} & F_{P-1}
\end{bmatrix}
\tag{13.3}
\]
Further, a simple painting can also be considered as a different sort of triple; it consists
of a background (B), a midground (M) painted on top of the background and finally,
a foreground (F) layer which further occludes additional portions of the combined
background and midground layers. Hence, paintings could be organized as a matrix
P as shown in Eq. 13.3 also. Thus, a painting could be formed by concatenating these images together: picking the ith background (B_i), the jth midground (M_j) and the kth foreground (F_k) images would form a painting B_i M_j F_k. Note that there are again P^3
and a painting matrix in Fig. 12.12.
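As a sketch (the function and variable names are ours), assembling one of the P^3 sentences from such a matrix is just a triple of column picks:

```python
import random

def assemble(wurfelspiel, i=None, j=None, k=None, rng=random):
    """Pick one entry from each column of a P x 3 Würfelspiel matrix.

    For music the columns are openings, transitions and closings; for paintings
    they are backgrounds, midgrounds and foregrounds.  Any index left as None is
    chosen at random, giving one of the P**3 possible sentences."""
    P = len(wurfelspiel)
    i = rng.randrange(P) if i is None else i
    j = rng.randrange(P) if j is None else j
    k = rng.randrange(P) if k is None else k
    return (wurfelspiel[i][0], wurfelspiel[j][1], wurfelspiel[k][2])

M = [("O%d" % r, "T%d" % r, "C%d" % r) for r in range(4)]   # a toy 4 x 3 music matrix
print(assemble(M))        # e.g. ('O2', 'T0', 'C3')
```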
Clearly, each data set encoded into a Würfelspiel matrix therefore contains crucial
information about equally valid examples of data that are solutions to a certain design
problem. The existence of this data means that mappings exist that tell us which
sequences of choices are valid and which are not from the perspective of the expert
who has designed the examples. Each data set therefore has an associated alphabet
with which we can express noun, verb and object units. For our purposes, let’s say
that each of the nouns, verbs or objects is a finite list of actions from an alphabet
of R symbols, where the meaning of the symbols is of course highly dependent on
the context of our example. In a simple music example, the list of actions might be
a sequence of four beats and the alphabet could be the choices from the C major
scale. Thus, we will assume each of the P nouns consists of a list of length L from
an alphabet of R symbols. Since the general alphabet has R symbols, each element
of a noun can be thought of as a vector of size R whose components are all 0 except
for a single 1. Let the letters in the alphabet be denoted by a_0 through a_{R−1}. Then a noun has components 0 to L − 1 where component n[j] is the letter a_j^n. The letter a_j^n is then encoded into a vector of length R all of whose entries are 0 except for the entry in the slot corresponding to that alphabet symbol.
13.3 Neurobiologically Based Compositional Design
Consider a typical multiple column portion of our cortex model, say Area 17, 18 and
19. We could think of it as shown abstractly in Fig. 13.7. In this picture, we show a
stack of three cortical columns. At the top of the figure, we show the neurotransmitters
input from specialized midbrain areas. In our model, we are limiting our attention
to serotonin from the Raphe nuclei, norepinephrine from the Locus Coeruleus and
dopamine from the dopamine producing neurons. At the bottom, we indicate that information flow through the cortical columns is gated by the thalamic control circuitry.
The situation is of course much more complicated and you must remember that
the arrows designated as OCOS and FFP actually refer to abstractions of specialized
groupings of neurons as shown in the earlier figures of Sect. 5.4.1. Let's assume
that we have obtained auditory and visual cortex data (for example, the musical and
painting data matrices M and P previously discussed) which are encoded into feature
vectors of our design using an abstract grammar. In our own physiology, raw sensory
input is also coded into progressively more abstract versions of the actual signals.
Hence, we view our chosen algorithms for computing the feature vector of the given
data as the analog of that process. These feature vectors are then used as the inputs
into layer four of the bottom cortical column corresponding to each sensory modality.
The powerful OCOS and FFP circuits then serve as devices that do feature grouping.
The outputs of associative cortex areas such as area 20, 21, 22 and 37 of the temporal
cortex would then be constrained to match the known outputs due to our data.
To build the information processing models we are discussing thus requires that we
understand the input to output maps that are based on the Würfelspiel data generated
for our given sensory modalities. As discussed above, this data will be used to train
the auditory and visual cortex in preparation for the training of the associative cortex.
The trained model of the associative cortex provides us with a neurobiologically
motivated model of sensor fusion. To understand this process better, we consider the
details of how a model of sentence construction for a given abstract grammar works.
This shows quite clearly the perceptual grouping tasks that must be accomplished with
the OCOS and FFP training algorithms applied to the cortical columns of Fig. 5.24.
Recall, the raw data n, v and o is encoded into feature vectors N, V and O, respectively.
The preprocessing is carried out by mappings fn , fv and fo , respectively. Now, a
primitive object in our compositional grammar has length L. Our mapping problem
is thus to determine the rule behind the mapping from the noun feature vectors N to
the verb feature vectors V, g_{NV}, and from the verb feature vectors V to the object feature vectors O, g_{VO}. We can express this mathematically by Eq. 13.1. To generate a noun sequence n_i equal to {a_{i0}^n, . . . , a_{i,L−1}^n} means we choose a random start letter and then from that preferred sequences are generated while non-interesting words are biased against. Hence, we think of a mapping, the Noun Generating Map or NGM, as accepting a letter input, a_0^n, and generating a preferred noun. Note if we start with a_0^n, we then need to generate a preferred second letter, a_1^n. Then a_1^n is used as an input to generate a preferred third letter, a_2^n, and so on until the full string of letters
is finished. The verb and object data provide us with samples of a verb and object
generating map, VGM and OGM also. To model this mapping, we start by using the
information about useful noun strings we have.
Since we know which letters are preferred as the next letter given a starting choice,
we know that we have at most P first letter choices and each of these will have at least R − P unacceptable second letters. Let T and T̄ denote the set of all acceptable and unacceptable outputs, respectively. As we noted earlier, we could model this mapping as a chained feed forward network, with feedforward and feedback connections between artificial neurons using standard training techniques. This mapping would
take the first letter of a noun and output a set of acceptable second letters. However,
we would need to build such a mapping for each of the letter to letter transitions and
this is certainly computationally messy and even if it is workable for the nouns, it
is still not at all like the circuitry we see in Fig. 5.24. Hence, we will instead choose
to think of this mapping as a cortical column circuit whose outputs are produced
by OCOS and FFP interactions as described in Grossberg (2003). That is, the noun
inputs a_{ij}^n go into layer 4 of the bottom cortical column of Fig. 5.24 and the OCOS
and FFP circuits then allow us to adjust the strengths of connections between the
neurons in the column so that we obtain the groupings of letters that are preferred in
the nouns of our sample. For convenience of exposition, we note the cortical column for nouns can be labeled using the three six layer cortical groups G^{n0}, G^{n1} and G^{n2}. Further, let G^{ni}_ℓ denote the ℓth layer of group G^{ni}. Then, we have the training sequence

G^{n0}_4 → G^{n0}_2 → G^{n1}_4 → G^{n1}_2 → G^{n2}_4 → G^{n2}_2

where each arrow denotes perceptual grouping training using OCOS and FFP laminar circuits. The output from layer 2 of the top cortical group of the noun column thus
consists of the preferred letter orders for nouns. We handle the verb and object data for groups G^{vi} and G^{oj} in a similar way using the training sequences

G^{v0}_4 → G^{v0}_2 → G^{v1}_4 → G^{v1}_2 → G^{v2}_4 → G^{v2}_2
G^{o0}_4 → G^{o0}_2 → G^{o1}_4 → G^{o1}_2 → G^{o2}_4 → G^{o2}_2
To generate the preferred Noun to Verb and Verb to Object mappings, we use two more cortical columns with the groupings G^{NVi} and G^{VOj}. We then use the training sequences

G^{n2}_2, G^{v2}_2 → G^{NV0}_4 → G^{NV0}_2 → G^{NV1}_4 → G^{NV1}_2 → G^{NV2}_4 → G^{NV2}_2
G^{v2}_2, G^{o2}_2 → G^{VO0}_4 → G^{VO0}_2 → G^{VO1}_4 → G^{VO1}_2 → G^{VO2}_4 → G^{VO2}_2
The output from Noun–Verb cortical column is a preferred verb for a given noun and
the output from the Verb–Object cortical column is a preferred object for a given
verb. We thus have five cortical column stacks whose bottom column handles input
sentence parts and outputs a grammatically correct following part of the growing
sentence. At this point, we have the noun, verb and object generating maps, NGM,
VGM and OGM, that we have sought, as well as instantiations of Noun to Verb and
Verb to Object mappings.
A given set of sensory data encoded into a Würfelspiel matrix using a rudimentary
grammar based on an abstraction of the signal can be used to imprint a five cortical
column model of isocortex. Let’s assume we have data from two sensory modalities,
A and B. We posit an information processing architecture which consists of the four
mutually interacting agents: isocortex imprinted by sensory data of type A, CA;
isocortex imprinted by sensory data of type B, CB; an isocortex Area 37 module,
A37, to be imprinted by the expected outcome due to the fusing of information
from CA and CB; and an isocortex limbic module, LM, whose purpose is to provide
modulatory influence to the inputs of the other three modules. Using notation similar
to what we have used previously, the training sequence, suppressing internal details,
is shown in Eq. (13.5).
We can thus construct mappings that encode the compositional designs from our
painting and music data. This suggests some basic principles: if we use the superscripts α and β to denote emotional modality and data type, respectively, we can label the mappings in the form {g^{βα}_{NV}, g^{βα}_{VO}}, writing the data type first. We let α = 0, 1, 2, 3 denote neutral, happy, sad and angry and β = 0, 1 indicate the choice of music and painting designs, respectively. We thus have a collection of mappings

{g^{00}_{NV}, g^{00}_{VO}}   {g^{01}_{NV}, g^{01}_{VO}}   {g^{02}_{NV}, g^{02}_{VO}}   {g^{03}_{NV}, g^{03}_{VO}}   (Music)
{g^{10}_{NV}, g^{10}_{VO}}   {g^{11}_{NV}, g^{11}_{VO}}   {g^{12}_{NV}, g^{12}_{VO}}   {g^{13}_{NV}, g^{13}_{VO}}   (Painting)
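As a bookkeeping sketch only (the dictionary layout and placeholder maps are our own), this collection can be held as a table indexed by data type and emotional modality:

```python
# alpha: 0 = neutral, 1 = happy, 2 = sad, 3 = angry; beta: 0 = music, 1 = painting
EMOTIONS = (0, 1, 2, 3)
DATA_TYPES = (0, 1)

# g_maps[(beta, alpha)] holds the (g_NV, g_VO) pair for one modality; identity
# placeholders stand in until the pairs are actually trained
g_maps = {(beta, alpha): (lambda n: n, lambda v: v)
          for beta in DATA_TYPES for alpha in EMOTIONS}

def compose(beta, alpha, noun):
    """Noun -> verb -> object using the trained map pair for one data type/emotion."""
    g_NV, g_VO = g_maps[(beta, alpha)]
    verb = g_NV(noun)
    return noun, verb, g_VO(verb)

print(compose(0, 2, "N"))   # with the placeholders this returns ('N', 'N', 'N')
```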
The high level cortex processing is shown in Figs. 13.8 and 13.9. A simplified version
of the auditory and visual cortex to associative cortex processing is illustrated. Only
some of the interconnections are shown. Each cortex type is modeled as four layers
which are associated with the specific time frames of milliseconds (10^{-3} s), seconds, days (10^{3} s) and weeks (10^{6} s). Each time frame connects feedforward and feedback
to all time frames with longer time constants (i.e. the millisecond time frame outputs
project to the second, day and week time frames). The associative cortex processing
is assumed to be output via the four time frames of the temporal cortex. We show
very few of these connections as the diagram quickly becomes difficult to read.
However, the frontal, parieto-occipital and temporal cortex modules are all fully
interconnected. The portions of these diagrams labeled as frames of specific time
Fig. 13.8 A simplified version of the auditory cortex to associative cortex processing. Only some
of the interconnections are shown. Each cortex type is modeled as four layers which are associated
with specific time frames. The different processing time frames are labeled −3, 0, 3, and 6 for
millisecond, seconds, days and weeks. The temporal cortex time frames shown connect to limbic
system represented by the cingulate gyrus
Fig. 13.9 A simplified version of the visual cortex to associative cortex processing. Only some of
the interconnections are shown. Each cortex type is modeled as four layers which are associated
with specific time frames. The different processing time frames are labeled −3, 0, 3, and 6 for
millisecond, seconds, days and weeks. The temporal cortex time frames shown connect to limbic
system represented by the cingulate gyrus
abstract processing is performed. Hence, our data provides a validating pathway for
two types of primary sensory cortex as well as a primitive model of higher level
associative cortex.
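The time frame wiring rule described above (each frame projects forward to every frame with a longer time constant, with feedback along the reverse direction) can be sketched as follows; the integer labels follow the figure captions.

```python
# time frames labeled by exponent as in Figs. 13.8 and 13.9:
# -3 = milliseconds, 0 = seconds, 3 = days, 6 = weeks
frames = [-3, 0, 3, 6]

# each frame projects feedforward to every frame with a longer time constant,
# and each such edge also carries a feedback connection in the reverse direction
feedforward = [(a, b) for a in frames for b in frames if b > a]
feedback = [(b, a) for (a, b) in feedforward]

print(feedforward)   # [(-3, 0), (-3, 3), (-3, 6), (0, 3), (0, 6), (3, 6)]
```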
We can use the developed models to model forms of depression following the monoamine, neurokinin and BDNF hypotheses of dysfunction. In the monoamine hypothesis, we assume that depression is due to a deficiency of the monoamine neurotransmitters norepinephrine (NE) and serotonin (5HT). Since we are dealing with
monoamine NTs, all of the machinery needed to construct new NT on demand, break
it down into components for reuse and transport it back from the cleft into the pre-
synapse via re-uptake pumps is available in the pre-synaptic cell. The enzyme used
to break down the monoamine is called monoamine oxidase or MAO. In a malfunc-
tioning system, there may be too little monoamine NT, too few receptors for it on the
post-synapse side, improper breakdown of the NT and/or improper re-uptake. If MAO
inhibitors are given, the breakdown is prevented, effectively increasing monoamine
concentration. If tricyclic compounds are given, the re-uptake pump is blocked, also
effectively increasing the monoamine concentration.
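As a toy illustration only, and not the author's model, the qualitative effect of MAO inhibitors and tricyclics on cleft monoamine concentration can be sketched with first order kinetics; the rate constants below are arbitrary assumptions.

```python
def monoamine_level(steps=500, dt=0.05, release=1.0,
                    mao_rate=0.5, reuptake_rate=0.5,
                    mao_inhibitor=0.0, tricyclic=0.0):
    """Toy first order model of monoamine concentration in the synaptic cleft.

    An MAO inhibitor scales down the breakdown rate; a tricyclic scales down the
    re-uptake rate.  Either change raises the steady state concentration."""
    c = 0.0
    for _ in range(steps):
        breakdown = mao_rate * (1.0 - mao_inhibitor) * c
        reuptake = reuptake_rate * (1.0 - tricyclic) * c
        c += dt * (release - breakdown - reuptake)
    return c

print(monoamine_level())                     # baseline steady state, about 1.0
print(monoamine_level(mao_inhibitor=0.8))    # breakdown blocked: higher level
print(monoamine_level(tricyclic=0.8))        # re-uptake blocked: higher level
```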
Another hypothesis for the etiology of depression is that there is a malfunction
in one or more second messenger systems which initiate intracellular transcription
factors that control critical gene regulation. A candidate for this type of malfunction is the set of pathways that initiate activation of brain derived neurotrophic factor or BDNF.
As discussed in Stahl (2000), BDNF is critical to the normal health of brain neurons
but under stress, the transcription of the protein from the gene for BDNF is somehow
decreased, leading to possible cell death of neurons. Administering MAO inhibitors and tricyclics then reverses this by effectively stimulating the production of BDNF.
Finally, we know that serotonin and a peptide called substance P, a member of a class of peptides known as neurokinins, are co-transmitters. There is some evidence that if
an antagonist to substance P is administered, it functions as an antidepressant. The
amygdala has both monoamine and substance P receptors and hence modulation of
the monoamine pre-synaptic terminal by substance P is a real possibility.
Our simple pharmacological model will allow us to model the critical portions of
these depression hypotheses in our abstract cells which can then be linked together
into larger computational ensembles.
The full cognitive model is built as an interacting collection of intelligent software agents encompassing auditory, visual, frontal and limbic cortex agents, serotonin and noradrenergic pathway agents, environmental input agents for music, painting and generic compositional design agents and an emotional amygdala agent. The connections between the agents can be modified by pharmacological agents at the critical time scales of 1–10 ms (immediate), 10–1000 ms (fast onset) and
The final step is to integrate the model into a three dimensional (3D) virtual world.
This is done by using the motor and thalamus outputs of our model to influence the
movement and texture mappings of a 3D character. Characters or avatars are gener-
ated in such worlds using computer code which constructs their body as a 3D mesh of
polygons given a 3D position. Then 2D matrices of a texture are applied to portions
of the 3D mesh by warping the texture to fit the 3D geometry of the character. This
is called the texture mapping phase and it is only then that the character takes on
a realistic appearance. In our model, the fMRI and skin conductance scalars output
from the thalamus serve to locate a position in 2D emotional space as determined
by psychophysiological experiments (see Chap. 10) which may be a more complex
emotional state than that of sadness, happiness or anger. We can assign different
texture mappings to different emotional vectors and thereby influence the physical
appearance of the avatar based on emotional input. In addition, we can assign dif-
ferent categories of motor movements to emotional vectors in order to have distinct
motor patterns that are associated with emotional states. There is much work remaining, of course, to fill in all the details of these mappings. Some of this detail
is shown in Fig. 13.10. Finally, we note that this simple model will be used as the
foundation upon which we will assemble our models of cognitive dysfunction. The
diagram of Fig. 13.3 clearly shows that there are neurotransmitter inputs into the lim-
bic system and cortex which can influence the output that reaches the avatar. Indeed,
they influence the outputs we call meta level in music and painting compositional
design. There are established connections between cortical neurotransmitter levels and cognitive dysfunctions. Also, there is a body of literature on the
musical and painting outputs of cognitively disturbed patients which can therefore
be associated with the neurotransmitter levels. This enables us to develop another
set of training data that is similar in spirit to what we have already done with the
musical and painting data. Once our models are constrained by this new data, we
have models that generate outputs of music and painting in a variety of emotional
modalities. The alterations in neurotransmitter levels can be obtained by specific
lesions in our software models, thereby allowing us to do controlled lesion studies
which we believe will be of therapeutic use in the study and treatment of cognitive
dysfunction. The resulting cognitive models are then attached to the software that
generates the attributes of a character that moves autonomously in a 3D virtual world.
Our high level graphical interfaces thus realize our models as 3D artificial avatars
in virtual worlds and hence can interact with both other avatars and the user of the
software. We will use these tools to develop models of cognitive dysfunction such
as depression for possible use in diagnostic/therapy efforts in collaboration with
cognitive scientists. We can envision that in the future, we can develop interfaces
allowing signals obtained from multielectrode electrophysiological probes to drive
the avatars as well.
Using our proposed tools, we can thus build autonomous art, music and general
design agents that are coupled to avatars that live in appropriate 3D worlds that
are not pastiche based but instead are based on emerging structure. This approach
thus has significant broad scientific impact. In addition, the utilization of music and
design professionals as our first round of clinicians to help shape the data sets used to
shape our cognitive models is innovative and will provide an infrastructure for future
involvement of such personnel in the even more important planned lesion studies.
If our software model of human cognition is capable of producing valid emotional
responses to ordinary stimuli from music and art, then it will provide a powerful tool
to study cognitive dysfunction. If such a model could be built, it would be possible to
introduce errors in many ways into the architecture. The output from the model could
then be assessed and we would in principle have an experimentally viable mechanism
for lesion studies of human cognition which is repeatable and fine grained. A few
such experimental protocols could be
• Alter how messages are sent or received between abstract neurons by changes in
the pharmacological inputs they receive (enhancement or blocking of neurotrans-
mitters via drugs at some stage in the agonist spectrum),
• Alter the physical connection strategies between the fibers in the ensembles to
model physiological wiring issues between cognitive submodules,
• Alter how messages are sent or received between ensembles to model high level
effects of neurotransmitter or other signaling molecule problems,
• Since each software agent could itself be a collection of interacting finer-grained
agents, we could introduce intracellular or extracellular signaling errors of many
types inside a given agent to study communication breakdown issues.
We believe that the capability for lesion studies that are software based could be
of great value. If we can build reasonable software models of various aspects of
human cognition, then we can also alter those models in ways dictated to us by experimental evidence and thereby gain insight into aspects of cognitive dysfunction. For example, we can couple our models to mid level data (functional MRI and gross anatomical structure) and low level data (neurochemical and neurotransmitter changes and microstructure deviations). The resulting models will therefore be faith-
ful to data encompassing multiple scales and will be an important first step toward
abstract models that are useful in artificial (and hence controllable) lesion studies of
human cognition. Further, by altering the cognitive model in a particular lesion study,
the functioning autonomous music and painting composer can generate musical
The model we have generated so far creates valid emotionally labeled musical com-
positions and paintings. Some design principles about this construction process can
now be gleaned from this work for the generation of musical and painting composi-
tional designs. In general, our model takes specialized sensory input and generates
a high level output as shown in Fig. 13.1. We can design algorithms to train our cor-
tical tissue models using laminar cortical processing via on-center excitation/off-surround inhibition and Hebbian based connection strengthening. Our discussion so
far shows us that the full computational model can be described by the diagram of
Fig. 13.4. As discussed for emotional data in the context of music, given a random
starting note say from column one of a Würfelspiel music matrix of given emotional
modality, our model will generate an entire valid musical composition. Note that this
output should actually be interpreted as two separate pieces of information: one, as a
musical composition and two, as an emotional state. Our data thus provides training
for the correct output of a model of emotions as well as a total of 128 happy, sad
and angry input/output training samples. We can thus generate valid compositional
designs that have a specific emotional tag for the auditory, visual and associative cortex pathways.
We will start with an emotional model which outputs two parameters (loosely
based on the psychophysiological data experiments). The first is actually a skin con-
ductance parameter and the second is a complicated computed value that arises from
certain fMRI measurements. The interesting thing about these values is that in exper-
iments with human subjects, when people saw pictures with emotional content such as "sad", "angry" and so forth, the two parameters mentioned above determined an x-y coordinate system in which different emotional attributes were placed experimentally in very different parts of this plane. For example, "sad" images might go to quadrant 2 and "angry" images might be mapped to quadrant 3. This is an oversimplification
of course, but the idea that images of different emotional attributes would be separate
in the plane is powerful. Our 128 examples of each emotional attribute thus give us
128 data points which should all be mapped to the same decision region of this two
dimensional plane. Thus, we have data that unambiguously gives us a desired two
dimensional output for our emotional model. At this point, we have a model that
given an abstract auditory and visual input stream from the data will generate given
emotional attributes. We can then turn the system around if you like, by noting that
each data set corresponds to a certain decision region in “emotional space”. Fur-
ther, recognize that we have a coupled model of sensory processing and emotional
computation. We have an auditory and visual agent which, given input from a known
emotion decision region, generates a musical and artistic stream of a given emotional
attribute. We model this as a three software agent construction: auditory, visual and
emotional. Each of these agents accepts inputs from the others. We have enough data
to develop a first pass at a model which, given a two dimensional emotional input and a visual and auditory random start, will generate an emotionally tagged auditory
and visual stream. This model is shown in Fig. 13.5 for music and painting. We can
then validate this model easily by just letting anyone listen or look at our output and
tell us if it is good. Hence, we are validating the whole model instead of just the
equivalent of a local patch of hippocampal tissue in a slice. Indeed, it should also be
possible to include validation in the learning algorithm.
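A sketch of such a decision region readout is below; which parameter corresponds to which axis, the placement of the origin, and the quadrant labels other than the "sad" and "angry" examples from the text are our assumptions.

```python
def emotion_region(skin_conductance, fmri_value):
    """Assign the two model outputs to a quadrant of the 2D emotional plane.

    Only the 'sad' (quadrant 2) and 'angry' (quadrant 3) placements come from the
    text; the other labels, the axis assignment and the origin are assumptions
    made just so the function is total."""
    x, y = skin_conductance, fmri_value
    if x >= 0 and y >= 0:
        return "quadrant 1 (e.g. happy)"
    if x < 0 and y >= 0:
        return "quadrant 2 (e.g. sad)"
    if x < 0 and y < 0:
        return "quadrant 3 (e.g. angry)"
    return "quadrant 4 (e.g. neutral)"

print(emotion_region(-0.4, 0.7))   # quadrant 2 (e.g. sad)
```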
The painting data has then provided a similar kind of associated input
into area 37 from the occipital cortex. Inputs that create the paintings correspond to
the visual sensory inputs into area 17 of the occipital cortex which are then further
processed by area 18 and 19 before being sent to the temporal cortex for additional
higher level processing. The musical and painting data inputs correspond to specific
fMRI and skin conductance outputs in addition to an encoded compositional design.
Our data thus allows us to build the model shown in Fig. 13.4.
At this point we have a simple model of the thalamic outputs as shown in Fig. 5.3.
These thalamic outputs can then be integrated into a larger model which can be
part of the character behavior modules in a three dimensional (3D) virtual world. In
effect, this is done by using the motor and thalamus outputs of the model to influence
the movement and texture mappings of a 3D character. Characters or avatars are
generated in such worlds using computer code which constructs their body as a 3D
mesh of polygons given a 3D position. Then 2D matrices of a texture are applied
to portions of the 3D mesh by warping the texture to fit the 3D geometry of the
character. In our model, the fMRI and skin conductance scalars output from the
thalamus thus serve to locate a position in 2D emotional space as determined by
psychophysiological experiments which may be a more complex emotional state
than that of sadness, happiness or anger. We can assign different texture mappings
to different emotional vectors and thereby influence the physical appearance of the
avatar based on emotional input. In addition, we can assign different categories of
motor movements to emotional vectors in order to have distinct motor patterns that
are associated with emotional states. Some of this detail is shown in Fig. 13.10.
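As a minimal sketch of this last idea, assume the thalamic output is simply a two component vector e in the emotional plane; the avatar code can then select a texture set and a motor pattern from the decision region the vector falls in. The quadrant tests, names and categories below are illustrative assumptions, not part of the model.

% Minimal sketch: map a 2D emotional output (skin conductance, fMRI-derived value)
% to an avatar texture set and motor pattern.  Decision regions and category
% names are illustrative assumptions.
function [textureSet, motorPattern] = emotionToAvatar(e)
  % e = [e1, e2] is the thalamic output in the 2D emotional plane
  if e(1) < 0 && e(2) > 0          % e.g. quadrant 2
    textureSet   = 'sad_textures';
    motorPattern = 'slow_slumped_gait';
  elseif e(1) < 0 && e(2) < 0      % e.g. quadrant 3
    textureSet   = 'angry_textures';
    motorPattern = 'fast_abrupt_gestures';
  else
    textureSet   = 'happy_textures';
    motorPattern = 'light_bouncy_gait';
  end
end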
We will return to these ideas in Chap. 21 where we outline a much better method
for the training of graph based neural models. But first, we need to discuss in much
detail how we would build mappings from inputs to outputs for the complicated
neural models we have been talking about. Before we get into these details, we will
talk a bit about network models of neurons in general in Chap. 14 and start going
through some training ideas in Chap. 15.
We know that inputs into the dendritic tree of an excitable nerve cell can be separated
into first and second messenger classes. The first messenger group consists of
Hodgkin–Huxley voltage dependent ion gates and the second messenger group includes
molecules which bind to the dendrite through some sort of interface and then trigger
a series of secondary events inside the cytoplasm of the cell body. We will primarily
be interested in second messengers which are neurotransmitters. We need to have a
more detailed discussion of the full dendrite—soma—axon system so we can know
how to connect one excitable nerve cell to another and build networks. We will have a
more abstract look at second messenger systems later, but for now we will be focused
on a class of neurotransmitters that include dopamine.
[Structural diagrams: the catechol molecule, the catechol molecule with an ethyl chain on C1, and dopamine (Fig. 14.3).]
The catechol molecule can carry an ethyl side chain on C1 of the benzene ring. We can then add an ammonia molecule NH3 (amine group) to the
outermost carbon in the ethyl side chain, the amine losing one hydrogen in order to
bond and thus appearing on the chain as NH2. This is the 3,4-dihydroxy derivative
of the phenylethylamine molecule, which is the neurotransmitter Dopamine (DA)
(see Fig. 14.3). Further derivatives of this then give us Norepinephrine (NE) and
Epinephrine (E). If the beta carbon on the side chain replaces one hydrogen with
a hydroxyl group by hydroxylation, we obtain NE (see Fig. 14.4). If the hydrogen on
the second carbon of the ethyl side chain is replaced by a methyl group, we get E (see Fig. 14.5).
Note that all of these molecules are monoamines ("one amine" constructions) and all are derivatives of the catechol group.
[Structural diagrams: norepinephrine (Fig. 14.4) and epinephrine (Fig. 14.5).]
Without going into detail, DA, NE and E are
constructed from the nutrient soup inside our cells via the following pathway (see
Fig. 14.6):
1. Tyrosine hydroxylase (adds an OH group) converts Tyrosine into L-DOPA. This
is the rate limiting step in CA synthesis because if the enzyme is not present,
the reaction will not continue. So there must be a mechanism to request from our
genome appropriate levels of Tyrosine synthesis when needed.
2. L-DOPA, formed by hydroxylation of tyrosine, is then converted into DA by
DOPA decarboxylase (takes away a carboxyl group).
Tyrosine --(Tyrosine Hydroxylase)--> L-DOPA --(DOPA Decarboxylase)--> DA --(DBH)--> NE --(PnM)--> E
Hence, in our search for a core or parent object whose instantiation is useful
in constructing software architectures capable of subserving learning and memory
function, the evidence presented above suggests we focus on prototypical neuro-
transmitters. It is clear we need a model of dendritic–axonal interaction where dendrite and
axon objects interact via some sort of intermediary agent. These agents need to look
at both dendritic and axonal specific information. Both the dendrite and the axon
should have some dependency on neurotransmitter objects whose construction and
subsequent reabsorption for further reuse should follow modulatory pathways that
mimic in principle the realities of CA transmitter synthesis. Further, there is a need
for Object Recognition Sites, as dendritic objects need receptor objects that interact
with the neurotransmitter objects. There should also be a finite number of types of
neurotransmitter objects and a possibly different finite number of receptor object types.
The number and variety of these sites grow or shrink in accordance with learning
laws.
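A minimal sketch of this object vocabulary, using simple MatLab structs in the style of the code later in the book; all field names and values here are illustrative assumptions rather than a fixed design.

% Minimal sketch of neurotransmitter, receptor, dendrite and axon objects.
nt.name      = 'DA';     % neurotransmitter type
nt.efficacy  = 0.6;      % lumped efficacy p
nt.reabsorb  = 0.2;      % reabsorption rate q

receptor.ntName = 'DA';  % which transmitter this recognition site accepts
receptor.count  = 5;     % number of sites; learning laws can grow or shrink this

dendrite.value     = 0.0;
dendrite.receptors = receptor;   % could be an array of receptor types
axon.value         = 0.0;
axon.transmitters  = nt;         % could be an array of transmitter types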
How should we model CA release, CA termination, CA identification via CA
specific receptors and synaptic plasticity? CA is released through a large variety of
mechanisms that can be altered via second messenger triggers, which gives rise to synaptic plasticity. The appropriate software agent connecting dendritic and axonal objects is
therefore an object of, say, type PSD: a bridging object whose function and structure
are alterable via objects of type hormone, neurotransmitter and perhaps others
through some sort of software mechanism.
We now discuss some of the issues that are involved in the implementation of a
software environment that will facilitate rapid prototyping of biologically motivated
models of cognition. Using ideas from a language such as C++, python or others
would lead us to neural architectures as derived classes from a virtual class of neural
objects. The computational strategies that support functions such as training would
also be classes derived from a core primitive class of computational engines. We wish
to allow classes of architectures which are general enough to allow each computa-
tional element in an architecture to itself be a neural network of some type, a statistical
model and so forth. This also implies that we must develop more sophisticated mod-
els of connections between computational elements (synaptic links between classical
lumped sum neuronal models are one such type of connection). In addition, we wish
to have the capability in our modeling environment to adapt abstract principles of
neuronal processing that are more closely inspired by current neurobiological knowl-
edge. Consider this comment from Black (1991, pp. xii–xiii):
Briefly, certain types of molecules in the nervous system occupy a unique functional
niche. These molecules subserve multiple functions simultaneously. It is well recognized that
biochemical transformations are the substance of cellular function and mediate cell interac-
tions in all tissues and organ systems. Particular subsets of molecules simultaneously serve
as intermediates in cellular biochemistry, function as intercellular signals, and respond in
characteristic modes to environmental stimuli. These critical molecules actually incorporate
environmental information into the cell and nervous system. Consequently these molecules
simultaneously function as biochemical intermediates and as symbols representing specific
environmental conditions....
The principle of multiple function implies that there is no clear distinction among the
processes of cellular metabolism, intercellular communication, and symbolic function in
the nervous system. Representation of information and communication are part of the func-
tioning fabric of the nervous system. This view also serves to collapse a number of other
traditional categorical distinctions. For example, the brain can no longer be regarded as the
hardware underlying the separate software of the mind. Scrutiny will indicate that these
categories are ill framed and that hardware and software are one in the nervous system.
From this first quote of Black, you can see the enormous challenge we face. We can
infer from the above that to be able to have the properties of plasticity and response
to environmental change, certain software objects must be allowed to subserve the
dual roles of communication and architecture modification. Black refers to this as
the “principle of polyfunction” (Black 1991, p. 3).
Shorn of all detail, the software–hardware dichotomy is artificial. As we shall see, software
and hardware are one and the same in the nervous system. To the degree that these terms
have any meaning in the nervous system, software changes the hardware upon which the
software is based. For example, experience changes the structure of neurons, changes the
signals that neurons send, changes the circuitry of the brain, and thereby changes output and
any analogue of neural software.
It seems clear from the above that our underlying software architecture must be
event-based in some fashion. If we think of architectures consisting of loosely coupled
computational modules, each module is a kind of input–output mapping which we can
call an object of type IOMAP. Models with computational nodes linked by edges are
called connectionist models. Hence, classical connectionist modeling asserts that the
information content of the architecture is stored in the weight values that are attached
to the links and these weight values are altered due to environmental influence via
some sort of updating mechanism. It is now clear the architecture itself must be
self-modifying. Now it is fairly easy to write add and delete type functions into
connectionist schemes (just think of the links and weights implemented as doubly
linked lists), but what is clear from the above is that much more is needed. We need
mutable synaptic efficacy, the ability to add synapses and alter the function of neuron
bodies themselves. Consider the dual roles played by neural processing elements
(Black 1991, pp. 13–14):
[Weak reductionism] involves identification of scientifically tractable structural elements
that process information in the nervous system....Any set of elements is relevant only insofar
as it processes information and simultaneously participates in ongoing neural function; these
dual roles require the neural context.
What structural elements may be usefully examined? It may be helpful to outline appropriate
general characteristics... First, neural elements of interest must change with environment.
That is, environmental stimuli must, in some sense, regulate the function of these particular
units such that the units actually serve to represent conditions of the real world. The potential
units, or elements of interest, thereby function as symbols representing external or
internal reality. The symbols, then, are actual physical structures that constitute neural
language representing the real world. Second, the symbols must govern the function of
the nervous system such that the representation itself constitutes a change in neural state.
Consequently symbols do not serve as indifferent repositories of information but govern
ongoing function of the nervous system. Symbols in the nervous system simultaneously
dictate the rules of operation of the system. In other words, the symbols are central to the
architecture of the system; architecture confers the properties that determine behavior of the
system. The syntax of symbol operation is the syntax of neural function.
We now have a potential blueprint for our software architectural design: we need
software objects that
1. Change with environmental conditions.
2. Their state is either a direct or indirect representation of external environmental
conditions.
3. Their change in state corresponds to a change in the underlying architecture.
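A minimal sketch of the third requirement, assuming links are stored as rows of a matrix so that the architecture itself can be edited at run time; the helper names and weights are illustrative assumptions.

% Minimal sketch of a mutable connectionist architecture.  Edges are rows
% [pre, post, weight] so links can be added or deleted in response to input.
Edges = [1 2 0.3; 2 3 -0.1];          % initial architecture

addEdge    = @(E, pre, post, w) [E; pre post w];
deleteEdge = @(E, pre, post) E(~(E(:,1)==pre & E(:,2)==post), :);

Edges = addEdge(Edges, 1, 3, 0.05);   % architecture change driven by experience
Edges = deleteEdge(Edges, 2, 3);      % prune a link whose efficacy has decayed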
Information processing also involves combinatorial strategies, as Ira Black notes (Black
1991, p. 17):
Two related strategies are employed by the nervous system in the manipulation of the
transmitter molecular symbols. First, individual neurons use multiple transmitter signal types
at any given time. Second, each transmitter type may respond independently of others to
environmental stimuli.... The neuron appears to use a combinatorial strategy, a simple and
elegant process used repeatedly in nature in a variety of guises. A series of distinct elements,
of relatively restricted number, are used in a wide variety of combinations and permutations.
For example, if a given neuron uses four different transmitters, and each can exist in only three
different concentrations (based on environmental influences) independently, the neuron can
exist in 3^4 discrete transmitter states.... However, this example appears to vastly underestimate the
combinatorial potential of the neuron in reality.
Let’s put together all of the insight we have gathered from our discussions. We know
that
1. The principle of multiple function implies that there is no clear distinction among
the processes of cellular metabolism, intercellular communication and symbolic
function in the nervous system. Hence our software infrastructure should pos-
sess the same blurring of responsibility.
2. As mentioned before, software and hardware are the same in the nervous system.
This implies that the basic building blocks of our software system should be able
to alter the software architecture itself. We could implement part of this capability
using the notion of dynamically bound computational objects and strategies. using
the C++ language. We could also do prototyping in Octave, python and others. If
we use the compiled C++ language, we will be forced to deal with the inadequacies
of the C++ language’s implementation of objects whose type is bound at run-time.
As we mentioned above, we will not use C++ at this time and instead focus on
Erlang.
3. Another basic principle from Black (1991) in the context of ongoing function
is that the very fact of communication alters nervous system structure. This
implies that the structure of our software architecture be mutable in response to
the external input environment.
The evidence for these organizing principles can be summarized as follows:
1. We know that dendritic-axon coincident electrical activation strengthens synap-
tic efficacy through what are called Hebbian mechanisms. There is substantial
evidence of this: in the rat hippocampus, long term potentiation (LTP) leads to
structural modification of the synapse via the glutamate neurotransmitter. Clearly,
neurotransmitters which are known to convey millisecond to millisecond
excitatory information also regulate circuit architecture via synaptic struc-
tural change.
2. Macroscopic behavioral change is mirrored by distinct structural change at the
synaptic level. These points have been verified experimentally by the many excel-
lent investigations into the nervous system of the sea snail Aplysia californica.
In this animal, long term habituation and long term sensitization behaviors are
associated with definite structural changes in the number of synaptic vesicles, the
number of synaptic sites and so forth. In addition, these microscopic changes in
structure have the same temporal signature that the behavioral changes exhibit;
i.e. these changes last as long or as short as the corresponding behavior.
Neurotransmitters give the nervous system a capability to effect change on an extra-
ordinary range of time scales; consider the following simplified version of neuro-
transmitter interaction. We abstract out of the wealth of low-level biological detail
an abstract version of neurotransmitter pathways: start with an initial change in the
external environment E0 coming into a neuronal ensemble. This elicits a corre-
sponding change in impulse I0 from the ensemble, which in turn triggers a change
in neurotransmitter level NT0; the transmitter is then produced and reabsorbed over
subsequent time steps until the process terminates at some step p because NTp − R ≈ 0. We see that the
interplay between creation of neurotransmitter and its reabsorption rate allows for
the effect of an environmental signal to have a temporal signature that lasts beyond
one time tick.
Our model must then clearly allow for nonlocal interaction and long term temporal
response to a given signal. From the above discussion, we also see that the basic
response, that a change in environment implies a change in neurotransmitter level (E →
NT), does not depend on what the signal type is or which neurotransmitter pathway
is activated. Evidence for this includes the following neurotransmitter-signal pairs
which all modulate response to a change in signal via the change in transmitter level:
1. Dopamine releasing neurons regulating motor function.
2. Norepinephrine releasing neurons regulating attention and arousal.
3. Adrenaline releasing neurons regulating stress response.
Here, we see clearly that cellular biochemical organization, not the type of environ-
mental signal or behavior, is the key to how external stimuli are converted into neural
language. We seek to construct a core group of software objects that have similar
function. Their organization and their methods of interaction will determine how
external stimuli are converted into our neural object language.
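A minimal sketch of this temporal signature idea, assuming a single lumped transmitter level that is raised by an environmental event and then reabsorbed tick by tick; the values of p and q below are illustrative.

% Minimal sketch: one environmental event at tick 0 raises the transmitter
% level, which then persists over several ticks as reabsorption acts.
p = 0.8;  q = 0.3;  T = 10;
NT = zeros(1, T+1);
NT(1) = p;                       % initial change triggered by the signal
for t = 1:T
  NT(t+1) = max(NT(t) - q, 0);   % reabsorption shrinks the effect each tick
end
disp(NT)                         % nonzero for several ticks: a temporal signature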
[Fig. 14.7: a dendrite and an axon interacting through receptor, neurotransmitter and post synaptic density objects attached to a soma and an output axon.]
A dendrite–axon object pair interacts via the PSD object, which plays a role that is
similar to its biological function; however, we will also see that we will be able to
build extremely rich mathematical numerical processing objects as well using this
paradigm.
The output of the PSD object is sent to the computational body of the neuronal
object, the soma object. The details of how the soma will process the inputs it receives
will be discussed later. For now, we just note that some sort of information process-
ing takes place within this object. The output of the soma is collected into an axon
object containing the previously mentioned neurotransmitter objects. In this simpli-
fied illustration shown in Fig. 14.7, we do not try to indicate that there could be many
different types of neurotransmitters. We indicate the interaction pathways in Fig. 14.7
on the dendritic and axonal side of the soma. We provide further detail of the PSD
in the close up view given by Fig. 14.8. Here, we assume that the axon object
contains five types of neurotransmitters and the dendrite uses the associated receptor
objects for these neurotransmitters. The result of the dendritic–axonal interaction
is then computed by the PSD object via some as yet unspecified means. Now, how
should we handle the intricacies of the neurotransmitter–receptor interactions? We
do not want to model all of the relevant details. For our purposes, we will concentrate
on a few salient characteristics:
1. The probability of neurotransmitter efficacy will be denoted by the scalar para-
meter p. The variable p models neurotransmitter efficacy in a lumped parameter
manner. The amount of transmitter produced via the pathways that access the
genome, the probability of interaction with receptors and so forth are combined
into one number. More neurotransmitter is modeled by increasing p; increased
$$ p = \rho\, N\, \xi(R) $$
Let’s denote the PSD computation by the symbol •. We need to model how to obtain
the value of a prototypical interaction, I. The simplest case is that of a One Axon–One
Dendrite Interaction. We will let the term Aij denote a collection of things organized
using standard vector notation. We have
$$
A^i_j =
\begin{cases}
A^i_j[0] = v, & \text{the value attached to the axon,}\\
A^i_j[1] = p_{\sigma}, & \text{the efficacy for an associated neurotransmitter } \sigma,\\
A^i_j[2] = q_{\sigma}, & \text{the reabsorption rate for an associated neurotransmitter } \sigma.
\end{cases}
$$
We use a similar notation for the dendrite, $D^j_m$. We allow for more fields in the dendrite
than the value field, so we show this as a dotted entry:
$$
D^j_m =
\begin{cases}
D^j_m[0] = v, & \text{the value attached to the dendrite,}\\
\;\;\vdots &
\end{cases}
$$
In this simplest case, the interaction value is
$$ I = \bigl( A^i_j[1] - A^i_j[2] \bigr)\, A^i_j[0] \,\bullet\, D^j_m[0], $$
where $A^i_j$ is the $j$th axon connection from neuron $i$; in this case, it connects to the
$m$th dendrite of neuron $j$, $D^j_m$. Another situation occurs when the total efficacy and
reabsorption rate for a given axon are determined by efficacies and reabsorption
rates from a pool of surrounding axons also. However, there is still just one dendrite
involved. We can call this a One Axon (Local Domain)–One Dendrite interaction
whose interaction value is now given by
$$ I = \sum_{k \in L_i(\sigma)} \bigl( A^i_k[1] - A^i_k[2] \bigr)\, A^i_j[0] \,\bullet\, D^j_m[0] \qquad (14.2) $$
where the new symbol Li (σ) denotes the axonal locality of the neurotransmitter σ
in the ith neuron’s axon. It is easy to extend to a situation where in addition to the
pool of local axons that contribute to the efficacy and reabsorption rate of the axon,
we now add the possibility of local effects on the dendritic values. This is the One
Axon (Local Domain)–K Dendrites case and is the most complicated situation. The
interaction value is now
$$ I = \sum_{m \in M_j(\sigma)} \sum_{k \in L_i(\sigma)} \bigl( A^i_k[1] - A^i_k[2] \bigr)\, A^i_j[0] \,\bullet\, D^j_m[0] \qquad (14.3) $$
where the new symbol $M_j(\sigma)$ denotes the dendritic locality of the neurotransmitter
$\sigma$ in the $j$th neuron's dendritic tree. Even this simple model thus allows for long term
temporal effects of a given neurotransmitter by using appropriate values of p and q.
Indeed, this model exhibits the capability for imbuing our architectures with great
plasticity. For example, we can model the effects of $T+1$ different transmitters
$\{\sigma_0, \sigma_1, \ldots, \sigma_T\}$ at one PSD junction by using the notation $(A^i_k)_t$ and $(D^j_m)_t$ to
indicate which neurotransmitter we are looking at and then summing the interactions
as follows:
$$ I = \sum_{t=0}^{T} \sum_{m \in M_j(\sigma_t)} \sum_{k \in L_i(\sigma_t)} \bigl( (A^i_k)_t[1] - (A^i_k)_t[2] \bigr)\, (A^i_j)_t[0] \,\bullet\, (D^j_m)_t[0] \qquad (14.4) $$
Let’s stop and reflect on all of this. These complicated chains of calculation can be
avoided by moving to graphs of computational nodes which communicate with each
other by passing messages. The local and more global portions of these computations
are simply folded into a given node acting on its own input queue of messages using
protocols we set up for the processing. This is what we will be doing with chains of
neural objects.
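A minimal sketch of this message passing point of view, assuming each node simply holds a cell array as its input queue and a list of forward links; the protocol and the sigmoidal processing used here are illustrative choices, not the final design.

% Minimal sketch: nodes compute on their own input queues and pass messages
% forward along their links.
N = 3;
for i = 1:N, node(i).queue = {}; node(i).out = 0; end
node(1).fwd = [2 3];  node(2).fwd = 3;  node(3).fwd = [];

node(1).queue{end+1} = 0.7;              % external input message to node 1
for i = 1:N
  s = sum([node(i).queue{:}]);           % act on whatever is in the queue
  node(i).queue = {};
  node(i).out = 0.5*(1 + tanh(s));       % simple sigmoidal processing
  for j = node(i).fwd
    node(j).queue{end+1} = node(i).out;  % pass a message forward
  end
end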
We will discuss what is called a chained network in Chap. 17, but it is easy enough to
jump in a bit early to see how our neural computations could be handled. We can connect
a standard feedforward architecture of neurons as a chain as shown in Fig. 14.9: Note
we can also easily add self feedback loops and general feedback pathways as shown
in Fig. 14.10. Note that in Figs. 14.9 and 14.10, the interaction between dendritic and
axonal objects has not been made explicit. The usual simplistic approximation to
this interaction transforms a summed collection of weighted inputs via a saturating
transfer function with bounded output range. The dendritic and axonal interaction is
modeled by a scalar Wij as described above. It measures the intensity of the interaction
between the input neuron i and the output neuron j as a simple scalar weight. Since
we have developed the ballstick neural processing model, we will eventually try to
do better than this.
Let’s look at the Chained Feed Forward Network (CFFN) in more detail. In this
network model, all the information is propagated forward. Hence, Fig. 14.10 is
not of this type because it has a feedback connection as well as a self loop and
so is not strictly feedforward. The CFFN is quite well-known and our first expo-
sure to it occurred in the classic paper of Werbos (1987). Consider the function
Fig. 14.9 A standard feedforward architecture of neurons connected as a chain
Fig. 14.10 A chained neural architecture with self feedback loops and general feedback. A feedback and feedforward architecture drawn as a chain with two inputs and two outputs; note the self feedback interaction of W22 and the feedback strength of W42
$H : \mathbb{R}^{n_I} \rightarrow \mathbb{R}^{n_O}$ that has a very special nonlinear structure which consists of a chain
of computational elements, generally referred to as neurons since a very simple model
of neural processing models the action potential spike as a sigmoid function which
transitions rapidly from a binary 0 (no spike) to 1 (spike). This sigmoid is called
a transfer function and since it can not exceed 1 as 1 is a horizontal asymptote, it
is called a saturating transfer function also. This model is known as a lumped sum
model of post-synaptic potential; now that we know about the ball and stick model
of neural processing, we can see the lumped sum model is indeed simplistic. Each
neuron thus processes a summed collection of weighted inputs via a saturating trans-
fer function with bounded output range (i.e. [0, 1]). The neurons whose outputs
connect to a given target or postsynaptic neuron are called presynaptic neurons.
Each presynaptic neuron has an output Y which is modified by the synaptic weight
Wpre,post connecting the presynaptic neuron to the postsynaptic neuron. This gives
a contribution Wpre,post Y to the input of the postsynaptic neuron. A typical satu-
rating transfer function model is shown in the Fig. 14.11a, b. Figure 14.11a shows
a postsynaptic neuron with four weighted inputs which are summed and fed into
the transfer function which then processes the input into a bounded scalar output.
Figure 14.11b illustrates more details of
the processing.
A typical transfer function could be modeled as
$$ \sigma(x, o, g) = 0.5\left(1.0 + \tanh\!\left(\frac{x-o}{\phi(g)}\right)\right) $$
with the usual transfer function derivative given by
$$ \frac{\partial \sigma}{\partial x}(x, o, g) = \frac{1}{2\,\phi(g)}\,\operatorname{sech}^2\!\left(\frac{x-o}{\phi(g)}\right) $$
where $o$ denotes the offset indicated in the drawing and $\phi(g)$ is a function controlling
slope. The slope controlling parameter is usually called the gain of the transfer
function as it represents amplification of the incoming signal. Note that at the offset
point, $\frac{\partial \sigma}{\partial x} = \frac{1}{2\,\phi(g)}$; hence, $g$ effectively controls the slope of the most sensitive region
of the transfer function's domain. The function $\phi(g)$ is for convenience only; it is
awkward to allow the denominator of the transfer function model to be zero or to
change sign. A typical function to control the range of the gain parameter might be
$\phi(g) = g_m + \frac{g_M - g_m}{2}\,(1.0 + \tanh(g))$ where $g_m$ and $g_M$ denote the lower and upper
saturation values of the gain parameter's range. The chained model then consists of
a string of N neurons, labeled from 0 to N − 1. Some of these neurons can accept
external input and some have their outputs compared to external targets. We let U
denote the index set of neurons that accept external input and V denote the index set of
neurons whose outputs are compared to external targets. We will let $n_I$ and $n_O$ denote the size of U and V respectively. The remaining neurons
in the chain which have no external role are sometimes called hidden neurons with
dimension nH . Note in a chain, it is also possible for an input neuron to be an
output neuron; hence U and V need not be disjoint sets. The chain is thus divided
by function into three possibly overlapping types of processing elements: nI input
neurons, nO output neurons and nH internal or hidden neurons. In Fig. 14.12a, we see
a prototypical chain of eleven neurons. For clarity only a few synaptic links from pre
to post neurons are shown. We see three input neurons (neurons 0, 1 and 4) and four
output neurons (neurons 3, 7, 9 and 10). Note input neuron 0 feeds its output forward
to input neuron 1 in addition to feeding forward to other postsynaptic neurons. The
set of postsynaptic neurons for neuron 0 can be denoted by the symbol F(0) which
here is the set F(0) = {1, 2, 4}. Similarly, we see F(4) = {5, 6, 8, 9}.
We will let the set of postsynaptic neurons for neuron i be denoted by F(i), the
set of forward links for neuron i. Note also that each neuron can be viewed as a
postsynaptic neuron with a set of presynaptic neurons feeding into it: thus, each
neuron i has associated with it a set of backward links which will be denoted by
B(i). In our example, B(0) = {} and B(4) = {0}, where in general, the backward
link sets will be much richer in connections than these simple examples indicate.
For example, the chained architecture could easily instantiate what is called a local
expert model as shown in Fig. 14.12b. The weight of the synaptic link connecting
the presynaptic neuron i to the postsynaptic neuron j is denoted by Wi→j . For a
feedforward architecture, we will have j > i, however, as you can see in Fig. 14.10,
this is not true in more general chain architectures. The input of a typical postsynaptic
neuron therefore requires summing over the backward link set of the postsynaptic
neuron in the following way:
$$ y^{\text{post}} = x + \sum_{\text{pre} \in B(\text{post})} W_{\text{pre}\rightarrow\text{post}}\, Y^{\text{pre}} $$
where the term $x$ is the external input term which is only used if the post neuron is
an input neuron. This is illustrated in Fig. 14.13. We will use the following notation
(some of which has been previously defined) to describe the various elements of the
chained architecture.
The chain FFN then processes an arbitrary input vector $x \in \mathbb{R}^{n_I}$ via an iterative
process as shown below.
for (i = 0; i < N; i++) {
    if (i ∈ U)
        y^i = x_i + Σ_{j ∈ B(i)} T_{j→i} Y^j
    else
        y^i = Σ_{j ∈ B(i)} T_{j→i} Y^j
    Y^i = σ^i(y^i, o^i, g^i)
}
The output of the CFFN is therefore a vector in $\mathbb{R}^{n_O}$ defined by $H(x) = \{\, Y^i \mid i \in V \,\}$; that is,
$$ H(x) = \begin{bmatrix} Y^{v_0} \\ \vdots \\ Y^{v_{n_O - 1}} \end{bmatrix} $$
and we see that $H : \mathbb{R}^{n_I} \rightarrow \mathbb{R}^{n_O}$ is a highly nonlinear function that is built out of
chains of nonlinearities. The parameters that control the value of H(x) are the link
values, the offsets and the gains for each neuron. Note computations are performed
on each node's input queue!
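A minimal sketch of this chain evaluation in the MatLab style used later in the book; the backward link sets, weights, offsets, gains and inputs below are illustrative assumptions.

% Minimal sketch of the chain evaluation using backward link sets B{i}.
N = 6;  U = [1 2];                            % neurons receiving external input
B = { [], 1, [1 2], 2, [3 4], [4 5] };        % illustrative backward link sets
T = 0.1*ones(N, N);                           % illustrative link weights T(j,i)
o = zeros(1, N);  g = ones(1, N);             % offsets and gains
x = [0.5 -0.2 0 0 0 0];                       % external inputs
Y = zeros(1, N);
sigma = @(y, o, g) 0.5*(1 + tanh((y - o)./g));
for i = 1:N
  y = sum(T(B{i}, i).' .* Y(B{i}));           % sum over backward links
  if ismember(i, U), y = y + x(i); end
  Y(i) = sigma(y, o(i), g(i));
end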
The CFFN model consists of a string of N neurons. Since each neuron is a general
input/ output processing module, we can think of these neurons as type IOMAP.
The neurons are labeled from 0 to N − 1. A dendrite of any neuron can accept
external input and an axon of any neuron can have an external tap for comparison to
target information. In Fig. 14.10, we see a chain of six neurons. We see two neurons
that accept external input (neurons 0 and 1) and two neurons having external taps
(neurons 4 and 5). The set of postsynaptic neurons for a given neuron i is denoted
by the symbols F(i); the letter F denotes the feedforward links, of course. Note the
self and feedback connection in the set F(2). Also each neuron can be viewed as
a postsynaptic neuron with a set of presynaptic neurons feeding into it: thus, each
neuron i has associated with it a set of backward links which will be denoted by B(i).
In our example, these are the sets given below
where in general, the backward link sets can be much richer in connections than this
simple example indicates. This model thus uses the computational scheme
$$ Y^{\text{post}} = \sigma\bigl( y^{\text{post}}, o, g \bigr) = \sigma\Bigl( x + \sum_{\text{pre} \in B(\text{post})} W_{\text{pre}\rightarrow\text{post}}\, Y^{\text{pre}},\; o,\; g \Bigr) $$
where a indicates the value of a given axon, d, the value of a given dendrite and S,
the soma computational engine, which depends on a vector of parameters, p. In the
simple sigmoidal transfer function used in this chain, the parameters are the gain,
offset and the minimum and maximum shaping parameters for the sigmoid; hence, the
parameter vector p would have values p[0] = o, p[1] = g, p[2] = gm and p[3] = gM .
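A minimal sketch of this soma engine for the sigmoidal chain; the p slots follow the text but are shifted to MatLab's 1-based indexing, and the numerical values are illustrative.

% Minimal sketch of the soma computation S(d, p) with p = [o, g, gm, gM].
phi = @(g, gm, gM) gm + 0.5*(gM - gm)*(1 + tanh(g));
S   = @(d, p) 0.5*(1 + tanh((d - p(1))/phi(p(2), p(3), p(4))));
a = S(0.3, [0, 1, 0.1, 5]);   % axon value computed from a dendrite value of 0.3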
[Fig. 14.14: the chain architecture redrawn with explicit PSD elements (e.g. p13, p14, p35, p45) linking the neurons to the outputs O0 and O1.]
Finally, we let the symbol $a^{\text{pre}} \bullet d^{\text{post}}$ indicate the computation performed by the PSD
between the pre and post neuronal elements. Note that the PSD $\bullet$ operation in the
chain model therefore corresponds to a simple multiplication of the pre axon value
$a^{\text{pre}}$ and the post dendrite value $d^{\text{post}}$.
The use of neurotransmitters in our axon models and receptors in our dendrite models
requires more discussion. The original Figs. 14.9 and 14.10 did not show the PSD
computational structures, while the larger illustration Fig. 14.14 does. This gives
some insight into our new modeling process, but it will be much more illuminating
to look at careful representations of this six neuron system in the case of two receptors
on all dendrites and two neurotransmitters on all possible axons. Note that this is just
for convenience and the number of neurotransmitters and receptors does not need to
be constant in this fashion. Further, the number of neurotransmitters and receptors
at each dendrite and axon need not match. Nevertheless, the illustrations presented
in Fig. 14.15 give a taste of the plasticity of this approach. We will use a common
notation for these illustrations.
Fig. 14.15 The dendritic tree details for Neuron N0 showing two neurotransmitters per pre axon connection to the dendrite (receptors R0 and R1 sit on the dendrite D0; the pre axons a1, a2 and a3 each carry neurotransmitters NT0 and NT1)
With all this said, the extended pictures of neuron 0 give a very detailed look
at the network previously shown as Fig. 14.14. We could also specify axonal and
dendritic locality sets (although we do not illustrate such things in these pictures for
reasons of excessive clutter), but we don't really have to as long as we think of all
of this calculation as taking place in a given node's input queue asynchronously. But
we could set up this sort of careful linking as follows. We might specify the axonal
locality set for axon 2 of neuron 1 for neurotransmitter $t$ to consist of two axons:
axon 1 of neuron 1 and axon 2 of neuron 1 (itself). This information would then be
stored in an appropriate linking description, but for convenience of exposition, we
list it here in set form. The name $AL(A^i_j)_t$ denotes the axon link set for axon $j$ of neuron $i$ for
neurotransmitter $t$. So, this example would give $AL(A^1_2)_t = \{A^1_1, A^1_2\}$. In a similar
way, we could define the dendritic locality set, $DL$, for dendrite 1 of neuron 2 for
receptor $r$ in set form (using the standard $A^i_j$ and $D^p_q$ notation): $DL(D^2_1)_r = \{D^2_0, D^2_1\}$.
Look carefully at neuron 0. Figure 14.15 shows its full structure. It is the first neu-
ron in the chain and the computational capability of the neuron and its dendritic and
axonal tree is highly nonlinear due to the dependence on the two neurotransmitters
and two receptors present at each of the five PSD calculations. There are also time
delay possibilities that are controlled by gate opening and closings for the receptors.
The other neurons would have similar diagrams and we will not show them here.
The discussion above shows us how to define various types of neurons that all have
a similar structure. A neuron type is typically called a class. We will go back to our
ball stick model to organize our thoughts. A neuron class must compute the axon
hillock voltage, of course. This computation explicitly depends on
• L, the length of the cable,
• λc , the electrotonic length,
• The value of $\rho = \frac{G_D}{G_S}$, the ratio of the dendritic conductance to the soma conductance,
• The solution to the eigenvalue equation
$$ \tan(\alpha L) = -\,\frac{\tanh(L)}{\rho L}\,(\alpha L) $$
on the soma cable. In the figure, the dendritic and soma cables of the presynaptic
neurons are not shown in detail and, for convenience, the cable length is shown as
LD = 4. We can now write down in organized form how computations will take place
in this network.
• At τ = 0, calculate the voltage, v, at the axon hillock via
$$ v_D(0) = \sum_{n=0}^{Q_D} \sum_{j=0}^{L_D} \sum_{i=0}^{N_j^{D,0}} V^{j}_{0,D,i}\, \hat{A}^j_n \cos[\alpha_n (L_D - j)] $$
$$ v_S(0) = \sum_{n=0}^{Q_S} \sum_{k=0}^{L_S} \sum_{i=0}^{N_k^{S,0}} V^{k}_{0,S,i}\, \hat{B}^k_n \cos[\beta_n (L_S - k)] $$
$$ v(0) = v_D(0) + v_S(0) $$
where $N_j^{D,0}$ and $N_k^{S,0}$ denote the number of pulses arriving at node $j$ and node $k$
of the dendrite and soma, respectively, at time $\tau = 0$. We note the scalars such
as $V^{j}_{0,D,i}$ are actually the attenuated values $V^{j}_{0,D,i}\, e^{-\epsilon(1+\alpha_n^2)}$, where $\epsilon$ is a small
amount of time, say 0.05 time constants. The voltage at the axon hillock is then
used to initialize the Hodgkin–Huxley action potential response as we discussed in
Peterson (2015). If an action potential is generated, it can be sent back as a negative
impulse to the dendrites to model the refractory period of the neuron.
• Advance the time clock one unit.
• At τ = 1, at the axon hillock, the voltages applied at τ = 0 have now decayed to
$$ v_D(0,1) = \sum_{n=0}^{Q_D} \sum_{j=0}^{L_D} \sum_{i=0}^{N_j^{D,0}} V^{j}_{0,D,i}\, \hat{A}^j_n \cos[\alpha_n (L_D - j)]\; e^{-(1+\alpha_n^2)} $$
$$ v_S(0,1) = \sum_{n=0}^{Q_S} \sum_{k=0}^{L_S} \sum_{i=0}^{N_k^{S,0}} V^{k}_{0,S,i}\, \hat{B}^k_n \cos[\beta_n (L_S - k)]\; e^{-(1+\beta_n^2)} $$
The new inputs arriving at $\tau = 1$ contribute
$$ v_D(1) = \sum_{n=0}^{Q_D} \sum_{j=0}^{L_D} \sum_{i=0}^{N_j^{D,1}} V^{j}_{1,D,i}\, \hat{A}^j_n \cos[\alpha_n (L_D - j)] $$
$$ v_S(1) = \sum_{n=0}^{Q_S} \sum_{k=0}^{L_S} \sum_{i=0}^{N_k^{S,1}} V^{k}_{1,S,i}\, \hat{B}^k_n \cos[\beta_n (L_S - k)] $$
$$ z(0,1) = v_D(1) + v_S(1) $$
and the total synaptic potential arriving at the axon hillock at time $\tau = 1$ is thus
$$ v(1) = \bigl( v_D(1) + v_D(0,1) \bigr) + \bigl( v_S(1) + v_S(0,1) \bigr) \qquad (14.7) $$
Again, the voltage at the axon hillock is then used to initialize the Hodgkin–Huxley
action potential response.
• At time $\tau = 2$, we would find
$$ v_D(0,2) = \sum_{n=0}^{Q_D} \sum_{j=0}^{L_D} \sum_{i=0}^{N_j^{D,0}} V^{j}_{0,D,i}\, \hat{A}^j_n \cos[\alpha_n (L_D - j)]\; e^{-2(1+\alpha_n^2)} $$
$$ v_S(0,2) = \sum_{n=0}^{Q_S} \sum_{k=0}^{L_S} \sum_{i=0}^{N_k^{S,0}} V^{k}_{0,S,i}\, \hat{B}^k_n \cos[\beta_n (L_S - k)]\; e^{-2(1+\beta_n^2)} $$
$$ v_D(1,1) = \sum_{n=0}^{Q_D} \sum_{j=0}^{L_D} \sum_{i=0}^{N_j^{D,1}} V^{j}_{1,D,i}\, \hat{A}^j_n \cos[\alpha_n (L_D - j)]\; e^{-(1+\alpha_n^2)} $$
$$ v_S(1,1) = \sum_{n=0}^{Q_S} \sum_{k=0}^{L_S} \sum_{i=0}^{N_k^{S,1}} V^{k}_{1,S,i}\, \hat{B}^k_n \cos[\beta_n (L_S - k)]\; e^{-(1+\beta_n^2)} $$
so that
$$ v(2) = \bigl( v_D(2) + v_D(1,1) + v_D(0,2) \bigr) + \bigl( v_S(2) + v_S(1,1) + v_S(0,2) \bigr) \qquad (14.8) $$
where the notation $v_D(0,i)$ means the input voltage at time 0 after $i$ time units of
attenuation. Hence, $v_S(1,1)$ means the voltage that came in at time 1 after one time
unit of attenuation.
• Continuing in this fashion, at time $\tau = T$, the total synaptic potential seen at the
axon hillock is
$$ v(T) = \Bigl( v_D(T) + \sum_{j=1}^{T} v_D(T-j,\,j) \Bigr) + \Bigl( v_S(T) + \sum_{j=1}^{T} v_S(T-j,\,j) \Bigr) \qquad (14.9) $$
For these calculations, we need to decide on an active time period. For example, this
active period here could be the five time ticks t0 to t4. Hence, any incoming signals
into the cell outside this window are ignored. Once the time period is over, the cell can
possibly generate a new BFV.
Now step back again. All of this intense timing, with its need to pay attention to
locality sets and so forth, can be removed by simply focusing on the chain architecture
and the backward and forward link sets of the graph of nodes. It is easy to implement
in a truly asynchronous way and, since computation is on whatever is in a node's input
queue, much of this messy notation is unnecessary! We will explore this approach
in later chapters.
Let’s redo the OCOS graph as we did in Chap. 18. In Fig. 5.22a, there are 8 neural
nodes and 9 edges. We relabel the node numbers seen in the figure to match our
usual feedforward orientation. The OCOS DG is a graph G which is made up of the
vertices V and edges E given by
V = {N1 , N2 , N3 , N4 , N5 , N6 , N7 , N8 }
E = {E1 , E2 , E3 , E4 , E5 , E6 , E7 , E8 , E9 }
where you can see the nodes (we identify N1 in the graph model with N8 in the figure
and so forth). We will use the identifications
E1 = edge from node N1 to N2, E2 = edge from node N2 to N3, E3 = edge from node N2 to N4,
E4 = edge from node N2 to N5, E5 = edge from node N3 to N6, E6 = edge from node N4 to N7,
E7 = edge from node N5 to N8, E8 = edge from node N2 to N7, E9 = edge from node N1 to N7.
We then denote the OCOS DG by $G(V, E)$. Recall, its incidence matrix is then the
matrix $K$ whose $K_{ne}$ entry is defined to be $+1$ if edge $e$ goes out of node
$n$, $-1$ if edge $e$ goes into node $n$ and $0$ otherwise. For the OCOS DG, $K$ is
the $8 \times 9$ matrix given by
$$
K = \begin{bmatrix}
 1 &  0 &  0 &  0 &  0 &  0 &  0 &  0 &  1 \\
-1 &  1 &  1 &  1 &  0 &  0 &  0 &  1 &  0 \\
 0 & -1 &  0 &  0 &  1 &  0 &  0 &  0 &  0 \\
 0 &  0 & -1 &  0 &  0 &  1 &  0 &  0 &  0 \\
 0 &  0 &  0 & -1 &  0 &  0 &  1 &  0 &  0 \\
 0 &  0 &  0 &  0 & -1 &  0 &  0 &  0 &  0 \\
 0 &  0 &  0 &  0 &  0 & -1 &  0 & -1 & -1 \\
 0 &  0 &  0 &  0 &  0 &  0 & -1 &  0 &  0
\end{bmatrix}
$$
Now assume that there is a vector function $f$ which assigns a scalar value to each
node. Then, the gradient of $f$ on the DG is defined to be $\nabla f = K^T f$, given by Eq. 15.1.
$$
\nabla f = \begin{bmatrix}
f(N1) - f(N2) \\ f(N2) - f(N3) \\ f(N2) - f(N4) \\ f(N2) - f(N5) \\ f(N3) - f(N6) \\ f(N4) - f(N7) \\ f(N5) - f(N8) \\ f(N2) - f(N7) \\ f(N1) - f(N7)
\end{bmatrix}
= \begin{bmatrix}
f_1 - f_2 \\ f_2 - f_3 \\ f_2 - f_4 \\ f_2 - f_5 \\ f_3 - f_6 \\ f_4 - f_7 \\ f_5 - f_8 \\ f_2 - f_7 \\ f_1 - f_7
\end{bmatrix} \qquad (15.1)
$$
where we identify $f(N1) \equiv f_1$ and so forth. It is easy to see that this is a measure of
flow through the graph. We then define the Laplacian $\nabla^2 f = K K^T f$ for this DG as
shown in Eq. 15.2.
$$
\nabla^2 f = \begin{bmatrix}
2f_1 - f_2 - f_7 \\
-f_1 + 5f_2 - f_3 - f_4 - f_5 - f_7 \\
-f_2 + 2f_3 - f_6 \\
-f_2 + 2f_4 - f_7 \\
-f_2 + 2f_5 - f_8 \\
-f_3 + f_6 \\
-f_1 - f_2 - f_4 + 3f_7 \\
-f_5 + f_8
\end{bmatrix} \qquad (15.2)
$$
Since $K K^T$ is symmetric and positive semi-definite, the eigenvalues of this Laplacian are
nonnegative and there is a nice orthonormal eigenvector structure. The standard cable equation for the voltage $v$ across a
membrane is
$$ \lambda^2\, \nabla^2 v - \tau\, \frac{\partial v}{\partial t} - v = -r\, \lambda^2\, k $$
where the constant λ is the space constant, τ is the time constant and r is a geometry
independent constant. The variable k is an input source. We will assume the flow of
information through our OCOS DG is given by the graph based partial differential
equation
$$ \nabla^2_{OCOS}\, f - \frac{\tau_{OCOS}}{\lambda^2_{OCOS}}\, \frac{\partial f}{\partial t} - \frac{1}{\lambda^2_{OCOS}}\, f = -r\, k. $$
Now, we relabel the fraction $\tau_{OCOS}/\lambda^2_{OCOS}$ by $\mu_1$ and the constant $1/\lambda^2_{OCOS}$ by $\mu_2$,
where we drop the OCOS label as it is understood from context. Each computational
DG will have constants μ1 and μ2 associated with it and if it is important to distinguish
them, we can always add the labellings at that time. This equation gives us the
Laplacian graph based information flow model. This gives, using a finite difference
for the $\partial f/\partial t$ term, Eq. 15.3, where we define the finite difference $\triangle f_n(t)$ as $f_n(t+1) - f_n(t)$.
$$
\begin{bmatrix}
2f_1 - f_2 - f_7 \\
-f_1 + 5f_2 - f_3 - f_4 - f_5 - f_7 \\
-f_2 + 2f_3 - f_6 \\
-f_2 + 2f_4 - f_7 \\
-f_2 + 2f_5 - f_8 \\
-f_3 + f_6 \\
-f_1 - f_2 - f_4 + 3f_7 \\
-f_5 + f_8
\end{bmatrix}
- \mu_1 \begin{bmatrix}
\triangle f_1(t) \\ \triangle f_2(t) \\ \triangle f_3(t) \\ \triangle f_4(t) \\ \triangle f_5(t) \\ \triangle f_6(t) \\ \triangle f_7(t) \\ \triangle f_8(t)
\end{bmatrix}
= \mu_2 \begin{bmatrix}
f_1(t) \\ f_2(t) \\ f_3(t) \\ f_4(t) \\ f_5(t) \\ f_6(t) \\ f_7(t) \\ f_8(t)
\end{bmatrix}
- r \begin{bmatrix}
k_1(t) \\ k_2(t) \\ k_3(t) \\ k_4(t) \\ k_5(t) \\ k_6(t) \\ k_7(t) \\ k_8(t)
\end{bmatrix} \qquad (15.3)
$$
Then, if we assume that we want the node values $f(N1)$, $f(N2)$ and $f(N3)$ to be
clamped to $A$, $B$ and $C$ respectively for a given input $k$, then after substitution into
Eq. 15.3 we obtain Eq. 15.4.
$$
\begin{bmatrix}
2A - B - f_7 \\
-A + 5B - C - f_4 - f_5 - f_7 \\
-B + 2C - f_6 \\
-B + 2f_4 - f_7 \\
-B + 2f_5 - f_8 \\
-C + f_6 \\
-A - B - f_4 + 3f_7 \\
-f_5 + f_8
\end{bmatrix}
- \mu_1 \begin{bmatrix}
0 \\ 0 \\ 0 \\ \triangle f_4(t) \\ \triangle f_5(t) \\ \triangle f_6(t) \\ \triangle f_7(t) \\ \triangle f_8(t)
\end{bmatrix}
= \mu_2 \begin{bmatrix}
A \\ B \\ C \\ f_4(t) \\ f_5(t) \\ f_6(t) \\ f_7(t) \\ f_8(t)
\end{bmatrix}
- r \begin{bmatrix}
k_1(t) \\ k_2(t) \\ k_3(t) \\ k_4(t) \\ k_5(t) \\ k_6(t) \\ k_7(t) \\ k_8(t)
\end{bmatrix} \qquad (15.4)
$$
This gives us an iterative equation to solve for f4 through f8 for each input k. We adjust
parameters in the OCOS DG using Hebbian ideas. We can perform a similar Lapla-
cian flow and Hebbian update strategy on any DG. In particular, you can work out
the details of this process applied to the DG of Fig. 5.22b and the Layer Six–Four
Connections to Layer Two–Three in Fig. 5.23a and the Combined OCOS/ FFP model
of Fig. 5.24.
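A minimal sketch of the resulting iteration in MatLab, using the incidence matrix K built above with nodes 1 through 3 clamped to A, B and C; the values of mu1, mu2, r and k are illustrative assumptions, chosen here so that the fixed point iteration settles.

% Minimal sketch of the Laplacian information flow iteration (Eq. 15.4 style).
K = [ 1  0  0  0  0  0  0  0  1;
     -1  1  1  1  0  0  0  1  0;
      0 -1  0  0  1  0  0  0  0;
      0  0 -1  0  0  1  0  0  0;
      0  0  0 -1  0  0  1  0  0;
      0  0  0  0 -1  0  0  0  0;
      0  0  0  0  0 -1  0 -1 -1;
      0  0  0  0  0  0 -1  0  0 ];
L = K*K.';                          % graph Laplacian
mu1 = 20;  mu2 = 10;  r = 1;        % illustrative, chosen so the update contracts
A = 1.0;  B = 0.5;  C = 0.25;
f = zeros(8,1);  f(1:3) = [A; B; C];
k = 0.1*ones(8,1);                  % illustrative input
for t = 1:50
  df = (L*f - mu2*f + r*k)/mu1;     % from  L f - mu1*(f(t+1)-f(t)) = mu2 f - r k
  f  = f + df;
  f(1:3) = [A; B; C];               % keep the clamped nodes fixed
end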
15.2 Homework
Exercise 15.2.1 Develop the Laplacian Training method equations for the Folded
Feedback Pathway DG as shown in Fig. 5.22b.
Exercise 15.2.2 Develop the Laplacian Training method equations for the Layer
Six–Four Connections to Layer Two–Three in Fig. 5.23a.
Exercise 15.2.3 Develop the Laplacian Training method equations for the Com-
bined OCOS/FFP model of Fig. 5.24.
Exercise 15.2.4 Develop the Laplacian Training method equations for the general
OCOS/FFP Cortical Model as shown in Fig. 5.24. Compare the eigenvalue and eigen-
vector structure of the component Combined OCOS/FFP models and Layer Six–Four
Connections to Layer Two–Three DAGs that comprise this model to the eigenvalues
and eigenvectors of the whole model.
a memory module at this time. We could test the ability of an architecture like this
to be autonomous within a virtual world. In this book, we program in the interpreted
language MatLab, but for models of this size we will need to recode in a language
such as C or C++. It is easy to see that organizing our graph architectures into
objects would be very useful. Hence, in subsequent volumes, we introduce object
oriented programming so that we can rewrite our ideas into that kind of code. Our
design criteria at this stage are to see what kind of autonomy we can achieve in
our small brain models of 300 neurons or less. The power requirements for such an
architecture are reasonable for on board loading into an autonomous robot such as
those fielded for Mars exploration. In addition, we want our architectures to support
second messenger activity through a subset of neurotransmitter modulation using the
thalamus–midbrain–cerebellum loop. Hence, the small brain model will be useful for
software based lesion studies for depression and schizophrenia. To make it easier to
see how our models are performing, the first step builds an avatar which is a character
in a 3D virtual world such as we can build using Crystal Space, (Tybergheim 2003).
The avatar then interacts with the virtual world using commands generated in our
motor cortex module due to associative cortex processing of environmental inputs.
A reasonable visualization would require a 15 frame per second (fps) rate; hence,
the upper bound on computation time in our model is 1/15 s, which is well within our
computational capabilities if we abstract simplified computational strategies from the
bioinformation processing details discussed in this volume.
We can assume the edge and node structure of our model is static—certainly,
plasticity is possible but we will ignore that for now. We have a model of how to
process information flow through the graph using the graph Laplacian based cable
equation. It is then clear that we have to decide how to process information flowing on
the edges and in the nodes themselves. Hence, each graph model will have associated
edge and node functions which can capture as little or as much of the underlying
biology as we want. The desired 15 fps rate will help us decide on our approximation
strategies. In the chapters to follow, we will finish this book with an abstraction
of second messenger systems, Chap. 6, and an approximate way to model the Ca++
current injections into the cytosol which form the background of a second messenger
trigger event, Chap. 7.
References
L. Garcia-Segura, Hormones and Brain Plasticity (Oxford University Press, Oxford, 2009)
J. Hellige, Hemispheric Asymmetry: What's Right and What's Left (Harvard University Press,
Cambridge, 1993)
J. Kaas, T. Bullock (eds.), Evolution of Nervous Systems: A Comprehensive Reference Editor
J. Kaas (Volume 1: Theories, Development, Invertebrates) (Academic Press Elsevier, Amsterdam,
2007a)
J. Kaas, T. Bullock (eds.), Evolution of Nervous Systems: A Comprehensive Reference Editor
J. Kaas (Volume 2: Non-Mammalian Vertebrates) (Academic Press Elsevier, Amsterdam, 2007b)
J. Kaas, L. Krubitzer (eds.), Evolution of Nervous Systems: A Comprehensive Reference Editor
J. Kaas (Volume 3: Mammals) (Academic Press Elsevier, Amsterdam, 2007)
J. Kaas, T. Preuss, (eds.), Evolution of Nervous Systems: A Comprehensive Reference Editor
J. Kaas (Volume 4: Primates) (Academic Press Elsevier, Amsterdam, 2007)
R. Raizada, S. Grossberg, Towards a theory of the laminar architecture of cerebral cortex: Compu-
tational clues from the visual system. Cereb. Cortex 13, 100–113 (2003)
S. Murray Sherman, R. Guillery, Exploring The Thalamus and Its Role in Cortical Function (The
MIT Press, Cambridge, 2006)
G. Striedter, Principles of Brain Evolution (Sinauer Associates, Sunderland, 2004)
D. Stuss, R. Knight (eds.), Principles of Frontal Lobe Function (Oxford University Press, Oxford,
2002)
J. Tybergheim, Crystal Space 3D Game Development Kit (Open Source, 2003), http://www.sourceforge.net
Part V
Simple Abstract Neurons
Chapter 16
Matrix Feed Forward Networks
We now begin to explore strategies for updating the graph models based on expe-
rience. Let’s begin with a very simple graph model called the Matrix Feedforward
Networks or MFFNs model. Instead of using graph based code, we do everything
using matrices and vectors. This graph model is based on information moving for-
ward through the network—hence, the adjective feed forward.
16.1 Introduction
A matrix feed forward network (MFFN) implements a mapping
$$ F : \mathbb{R}^{n_0} \rightarrow \mathbb{R}^{n_{M+1}} \qquad (16.1) $$
that has a special nonlinear structure. The nonlinearities in the FFN are contained in
the neurons which are typically modeled by sigmoid functions,
$$ \sigma(y) = 0.5\,\bigl(1 + \tanh(y)\bigr), \qquad (16.2) $$
typically evaluated at
$$ y = \frac{x - o}{g}, \qquad (16.3) $$
where $x$ is the input, $o$ is the offset and $g$ is the gain, respectively, of each neuron.
We can also write the transfer functions in a more general fashion as follows:
$$ \sigma(x, o, g) = 0.5\left(1.0 + \tanh\!\left(\frac{x - o}{g}\right)\right) \qquad (16.4) $$
We denote the MFFN architecture by the chain of layer sizes and connection matrices
$$ n_0 \xrightarrow{\;T^0\;} n_1 \rightarrow \cdots \rightarrow n_M \xrightarrow{\;T^M\;} n_{M+1}; \qquad (16.5) $$
or even more simply, we can just refer to an $n_0$-$n_1$-$\cdots$-$n_{M+1}$ MFFN. We will use either
one of these notations from this point on. The feed forward network processes an
arbitrary $I \in \mathbb{R}^{n_0}$ in the following way. First, for $0 \le \ell \le M + 1$, $0 \le i \le n_\ell - 1$
and layer $\ell$, we define some additional terms to help us organize our work. We let
Then, given $x \in \mathbb{R}^{n_0}$, the inputs are processed by the zeroth layer as follows:
$$ y_i^0 = x_i \qquad (16.7) $$
$$ X_i^0 = \bigl(y_i^0 - O_i^0\bigr)/g_i^0 \qquad (16.8) $$
$$ Y_i^0 = \sigma_i^0\bigl(X_i^0\bigr) \qquad (16.9) $$
The first layer then receives
$$ y_i^1 = \sum_{j=0}^{n_0 - 1} T_{ji}^0\, Y_j^0 \qquad (16.10) $$
$$ X_i^1 = \bigl(y_i^1 - O_i^1\bigr)/g_i^1 \qquad (16.11) $$
$$ Y_i^1 = \sigma_i^1\bigl(X_i^1\bigr) \qquad (16.12) $$
and, in general, for $0 \le \ell \le M$,
$$ y_i^{\ell+1} = \sum_{j=0}^{n_\ell - 1} T_{ji}^{\ell}\, Y_j^{\ell} \qquad (16.13) $$
$$ X_i^{\ell+1} = \bigl(y_i^{\ell+1} - O_i^{\ell+1}\bigr)/g_i^{\ell+1} \qquad (16.14) $$
$$ Y_i^{\ell+1} = \sigma_i^{\ell+1}\bigl(X_i^{\ell+1}\bigr) \qquad (16.15) $$
so that at the output layer
$$ y_i^{M+1} = \sum_{j=0}^{n_M - 1} T_{ji}^{M}\, Y_j^{M} \qquad (16.16) $$
$$ X_i^{M+1} = \bigl(y_i^{M+1} - O_i^{M+1}\bigr)/g_i^{M+1} \qquad (16.17) $$
$$ Y_i^{M+1} = \sigma_i^{M+1}\bigl(X_i^{M+1}\bigr) \qquad (16.18) $$
With this notation, for any $I \in \mathbb{R}^{n_0}$, we know how to calculate the output from the
MFFN. It is clearly a process that flows forward from the input layer and it is very
nonlinear in general even though it is built out of relatively simple layers of nonlinearities.
The parameters that control the value of $F(I)$ are the $n_i\, n_{i+1}$ coefficients of
$T^i$, $0 \le i \le M$; the offsets $O^i$, $0 \le i \le M+1$; and the gains $g^i$, $0 \le i \le M+1$.
This gives the number of parameters,
$$ N = \sum_{i=0}^{M} n_i\, n_{i+1} + 2 \sum_{i=0}^{M+1} n_i. \qquad (16.19) $$
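For example, for the 2-3-1 MFFN used in the MatLab code later in this chapter, Eq. (16.19) gives
$$ N = (2)(3) + (3)(1) + 2\,(2 + 3 + 1) = 6 + 3 + 12 = 21 $$
trainable parameters.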
Now let
$$ I = \bigl\{ I^{\alpha} \in \mathbb{R}^{n_0} : 0 \le \alpha \le S-1 \bigr\} $$
and
$$ D = \bigl\{ D^{\alpha} \in \mathbb{R}^{n_{M+1}} : 0 \le \alpha \le S-1 \bigr\} $$
be two given sets of data of size S > 0. The set I is referred to as the set of exemplars
and the set D is the set of outputs that are associated with exemplars. Together, the
sets I and D comprise what is known as the training set. The training problem is
to choose the $N$ network parameters, $T^0, \ldots, T^M$, $O^0, \ldots, O^{M+1}$, $g^0, \ldots, g^{M+1}$,
such that we minimize
$$ E = 0.5 \sum_{\alpha=0}^{S-1} \bigl\| F(I^{\alpha}) - D^{\alpha} \bigr\|_2^2 = 0.5 \sum_{\alpha=0}^{S-1} \sum_{i=0}^{n_{M+1}-1} \bigl( Y^{M+1}_{\alpha i} - D_{\alpha i} \bigr)^2, \qquad (16.20) $$
where the subscript notation α indicates that the terms defined by (16.6) correspond
to the αth exemplar in the sets I and D.
We will use the notation established in Sect. 16.1. If all the partial derivatives of
the feed forward energy function (16.20) with respect to the network parameters
were known, we could minimize (16.20) using a standard gradient descent scheme.
This is known as the MFFN Training Problem. Relabeling the N MFFN variables
temporarily as $w_i$, $0 \le i \le N-1$, the optimization problem we face is
$$ \min_{w \in \mathbb{R}^N} E\bigl(I, D, w\bigr) \qquad (16.21) $$
where we now explicitly indicate that the value of E depends on I, D and the
parameters w. In component form, the equations for gradient descent are then
$$ w_i^{new} = w_i^{old} - \lambda\, \left.\frac{\partial E}{\partial w_i}\right|_{w^{old}} \qquad (16.22) $$
where λ is a scalar parameter that must be chosen efficiently in order for the gradient
descent method to work. The above technique is the heart of the back propagation
technique which is currently used in many cases to solve the MFFN training problem.
The MFFN back propagation equations will now be derived. For further information,
see the discussions in Rumelhart and McClelland (1986).
We need an algorithm to find the gradient of the MFFN error function so that we can
apply gradient descent.
First, we’ll look at the weight update equations for the weights between the last
hidden layer and the output layer. Using the chain rule, we have
$$ \frac{\partial E}{\partial T^{M}_{pq}} = \sum_{\beta=0}^{S-1} \frac{\partial E}{\partial X^{M+1}_{\beta q}}\, \frac{\partial X^{M+1}_{\beta q}}{\partial T^{M}_{pq}} \qquad (16.23) $$
Writing $e^{M+1}_{\alpha j} = Y^{M+1}_{\alpha j} - D_{\alpha j}$ for the output errors, we have
$$ \frac{\partial E}{\partial X^{M+1}_{\beta q}} = \sum_{\alpha=0}^{S-1} \sum_{j=0}^{n_{M+1}-1} e^{M+1}_{\alpha j}\, \frac{\partial Y^{M+1}_{\alpha j}}{\partial X^{M+1}_{\beta q}} = \sum_{\alpha=0}^{S-1} \sum_{j=0}^{n_{M+1}-1} e^{M+1}_{\alpha j}\, \bigl(\sigma^{M+1}_{j}\bigr)'\bigl(X^{M+1}_{\alpha j}\bigr)\, \frac{\partial X^{M+1}_{\alpha j}}{\partial X^{M+1}_{\beta q}} = \sum_{\alpha=0}^{S-1} \sum_{j=0}^{n_{M+1}-1} e^{M+1}_{\alpha j}\, \bigl(\sigma^{M+1}_{j}\bigr)'\bigl(X^{M+1}_{\alpha j}\bigr)\, \delta_{\beta\alpha}\,\delta_{qj} = e^{M+1}_{\beta q}\, \bigl(\sigma^{M+1}_{q}\bigr)'\bigl(X^{M+1}_{\beta q}\bigr) \qquad (16.24) $$
where $\delta_{\beta\alpha}$ is the standard Kronecker delta function. Letting $\xi^{M+1}_{\beta q} = \frac{\partial E}{\partial X^{M+1}_{\beta q}}$, we have
$$ \xi^{M+1}_{\beta q} = e^{M+1}_{\beta q}\, \bigl(\sigma^{M+1}_{q}\bigr)'\bigl(X^{M+1}_{\beta q}\bigr), \qquad \frac{\partial E}{\partial T^{M}_{pq}} = \sum_{\beta=0}^{S-1} \xi^{M+1}_{\beta q}\, \frac{\partial X^{M+1}_{\beta q}}{\partial T^{M}_{pq}}, \qquad X^{M+1}_{\beta q} = \frac{\sum_{l=0}^{n_M-1} T^{M}_{lq}\, Y^{M}_{\beta l} - O^{M+1}_{q}}{g^{M+1}_{q}}. \qquad (16.25) $$
Since the offset and gain terms are independent of the choice of the input set index $\beta$,
the subscripts $\beta$ are unnecessary on those terms and are not shown. Taking the
indicated partial, we have
$$ \frac{\partial X^{M+1}_{\beta q}}{\partial T^{M}_{pq}} = \frac{\sum_{l=0}^{n_M-1} \frac{\partial T^{M}_{lq}}{\partial T^{M}_{pq}}\, Y^{M}_{\beta l}}{g^{M+1}_{q}} = \frac{Y^{M}_{\beta p}}{g^{M+1}_{q}}. \qquad (16.26) $$
Combining the last two results gives the weight update partial
$$ \frac{\partial E}{\partial T^{M}_{pq}} = \frac{\sum_{\beta=0}^{S-1} \xi^{M+1}_{\beta q}\, Y^{M}_{\beta p}}{g^{M+1}_{q}}. \qquad (16.27) $$
The gain and offset terms can also be updated. The appropriate equations follow below.
$$ \frac{\partial E}{\partial O^{M+1}_{q}} = \sum_{\beta=0}^{S-1} \frac{\partial E}{\partial X^{M+1}_{\beta q}}\, \frac{\partial X^{M+1}_{\beta q}}{\partial O^{M+1}_{q}} = -\,\frac{\sum_{\beta=0}^{S-1} \xi^{M+1}_{\beta q}}{g^{M+1}_{q}} $$
$$ \frac{\partial E}{\partial g^{M+1}_{q}} = \sum_{\beta=0}^{S-1} \frac{\partial E}{\partial X^{M+1}_{\beta q}}\, \frac{\partial X^{M+1}_{\beta q}}{\partial g^{M+1}_{q}} = -\,\frac{\sum_{\beta=0}^{S-1} \xi^{M+1}_{\beta q}\,\bigl(y^{M+1}_{\beta q} - O^{M+1}_{q}\bigr)}{\bigl(g^{M+1}_{q}\bigr)^2} \qquad (16.28) $$
The back propagation equations for the last hidden layer to the output layer are thus
given by Eqs. (16.27) and (16.28).
We now derive the back propagation equations for the remaining layers. Consider
$$ \frac{\partial E}{\partial T^{M-1}_{pq}} = \sum_{\beta=0}^{S-1} \frac{\partial E}{\partial X^{M}_{\beta q}}\, \frac{\partial X^{M}_{\beta q}}{\partial T^{M-1}_{pq}} \qquad (16.29) $$
Defining $\xi^{M}_{\beta q} = \frac{\partial E}{\partial X^{M}_{\beta q}}$, we obtain
$$ \xi^{M}_{\beta q} = \sum_{\alpha=0}^{S-1} \sum_{j=0}^{n_{M+1}-1} e^{M+1}_{\alpha j}\, \frac{\partial Y^{M+1}_{\alpha j}}{\partial X^{M}_{\beta q}} = \sum_{\alpha=0}^{S-1} \sum_{j=0}^{n_{M+1}-1} e^{M+1}_{\alpha j}\, \bigl(\sigma^{M+1}_{j}\bigr)'\bigl(X^{M+1}_{\alpha j}\bigr)\, \frac{\partial X^{M+1}_{\alpha j}}{\partial X^{M}_{\beta q}} \qquad (16.30) $$
We then calculate
$$ \frac{\partial X^{M+1}_{\alpha j}}{\partial X^{M}_{\beta q}} = \frac{\sum_{l=0}^{n_M-1} T^{M}_{lj}\, \frac{\partial Y^{M}_{\alpha l}}{\partial X^{M}_{\beta q}}}{g^{M+1}_{j}} = \frac{\sum_{l=0}^{n_M-1} T^{M}_{lj}\, \bigl(\sigma^{M}_{l}\bigr)'\bigl(X^{M}_{\alpha l}\bigr)\, \frac{\partial X^{M}_{\alpha l}}{\partial X^{M}_{\beta q}}}{g^{M+1}_{j}} = \frac{\sum_{l=0}^{n_M-1} T^{M}_{lj}\, \bigl(\sigma^{M}_{l}\bigr)'\bigl(X^{M}_{\alpha l}\bigr)\, \delta_{\beta\alpha}\,\delta_{ql}}{g^{M+1}_{j}} = \frac{T^{M}_{qj}\, \bigl(\sigma^{M}_{q}\bigr)'\bigl(X^{M}_{\beta q}\bigr)\, \delta_{\alpha\beta}}{g^{M+1}_{j}}. \qquad (16.31) $$
Hence,
$$ \xi^{M}_{\beta q} = \sum_{\alpha=0}^{S-1} \sum_{j=0}^{n_{M+1}-1} e^{M+1}_{\alpha j}\, \bigl(\sigma^{M+1}_{j}\bigr)'\bigl(X^{M+1}_{\alpha j}\bigr)\, \frac{T^{M}_{qj}\, \bigl(\sigma^{M}_{q}\bigr)'\bigl(X^{M}_{\beta q}\bigr)\, \delta_{\alpha\beta}}{g^{M+1}_{j}} = \sum_{j=0}^{n_{M+1}-1} e^{M+1}_{\beta j}\, \bigl(\sigma^{M+1}_{j}\bigr)'\bigl(X^{M+1}_{\beta j}\bigr)\, \frac{T^{M}_{qj}\, \bigl(\sigma^{M}_{q}\bigr)'\bigl(X^{M}_{\beta q}\bigr)}{g^{M+1}_{j}} \qquad (16.32) $$
The remaining partial derivative is similar:
$$ \frac{\partial X^{M}_{\beta q}}{\partial T^{M-1}_{pq}} = \frac{\frac{\partial y^{M}_{\beta q}}{\partial T^{M-1}_{pq}}}{g^{M}_{q}} = \frac{\sum_{l=0}^{n_{M-1}-1} \frac{\partial T^{M-1}_{lq}}{\partial T^{M-1}_{pq}}\, Y^{M-1}_{\beta l}}{g^{M}_{q}} = \frac{Y^{M-1}_{\beta p}}{g^{M}_{q}}. \qquad (16.34) $$
Using the compact form
$$ \xi^{M}_{\beta q} = \bigl(\sigma^{M}_{q}\bigr)'\bigl(X^{M}_{\beta q}\bigr)\, \sum_{j=0}^{n_{M+1}-1} \frac{\xi^{M+1}_{\beta j}\, T^{M}_{qj}}{g^{M+1}_{j}} $$
we obtain
$$ \frac{\partial E}{\partial T^{M-1}_{pq}} = \frac{\sum_{\beta=0}^{S-1} \xi^{M}_{\beta q}\, Y^{M-1}_{\beta p}}{g^{M}_{q}} \qquad (16.35) $$
The update equations for the gain and offset terms are given below.
$$ \frac{\partial E}{\partial O^{M}_{q}} = \sum_{\beta=0}^{S-1} \frac{\partial E}{\partial X^{M}_{\beta q}}\, \frac{\partial X^{M}_{\beta q}}{\partial O^{M}_{q}} = -\,\frac{\sum_{\beta=0}^{S-1} \xi^{M}_{\beta q}}{g^{M}_{q}} $$
$$ \frac{\partial E}{\partial g^{M}_{q}} = \sum_{\beta=0}^{S-1} \frac{\partial E}{\partial X^{M}_{\beta q}}\, \frac{\partial X^{M}_{\beta q}}{\partial g^{M}_{q}} = -\,\frac{\sum_{\beta=0}^{S-1} \xi^{M}_{\beta q}\,\bigl(y^{M}_{\beta q} - O^{M}_{q}\bigr)}{\bigl(g^{M}_{q}\bigr)^2} \qquad (16.36) $$
The update equations for the next back propagation step are thus given by Eqs. (16.35)
and (16.36). Equations (16.27), (16.28), (16.35) and (16.36) give the back propaga-
tion equations for the last two layers. The back propagation equations between any
two interior layers, including the equations governing the back propagation between
the first hidden layer and the input layer, can now be derived using induction. We
obtain for all layer indices l, 0 ≤ l ≤ M:
$$ \xi^{M+1}_{\beta q} = e^{M+1}_{\beta q}\, \bigl(\sigma^{M+1}_{q}\bigr)'\bigl(X^{M+1}_{\beta q}\bigr), $$
$$ \xi^{M-l}_{\beta q} = \bigl(\sigma^{M-l}_{q}\bigr)'\bigl(X^{M-l}_{\beta q}\bigr)\, \sum_{j=0}^{n_{M-l+1}-1} \frac{\xi^{M-l+1}_{\beta j}\, T^{M-l}_{qj}}{g^{M-l+1}_{j}}, $$
$$ \frac{\partial E}{\partial T^{M-l}_{pq}} = \frac{\sum_{\beta=0}^{S-1} \xi^{M-l+1}_{\beta q}\, Y^{M-l}_{\beta p}}{g^{M-l+1}_{q}}, $$
$$ \frac{\partial E}{\partial O^{M-l+1}_{q}} = -\,\frac{\sum_{\beta=0}^{S-1} \xi^{M-l+1}_{\beta q}}{g^{M-l+1}_{q}}, $$
$$ \frac{\partial E}{\partial g^{M-l+1}_{q}} = -\,\frac{\sum_{\beta=0}^{S-1} \xi^{M-l+1}_{\beta q}\,\bigl(y^{M-l+1}_{\beta q} - O^{M-l+1}_{q}\bigr)}{\bigl(g^{M-l+1}_{q}\bigr)^2}. \qquad (16.37) $$
In particular, at the first hidden layer these become
$$ \xi^{1}_{\beta q} = \bigl(\sigma^{1}_{q}\bigr)'\bigl(X^{1}_{\beta q}\bigr)\, \sum_{j=0}^{n_2-1} \frac{\xi^{2}_{\beta j}\, T^{1}_{qj}}{g^{2}_{j}}, \qquad \frac{\partial E}{\partial T^{0}_{pq}} = \frac{\sum_{\beta=0}^{S-1} \xi^{1}_{\beta q}\, Y^{0}_{\beta p}}{g^{1}_{q}}, $$
$$ \frac{\partial E}{\partial O^{1}_{q}} = -\,\frac{\sum_{\beta=0}^{S-1} \xi^{1}_{\beta q}}{g^{1}_{q}}, \qquad \frac{\partial E}{\partial g^{1}_{q}} = -\,\frac{\sum_{\beta=0}^{S-1} \xi^{1}_{\beta q}\,\bigl(y^{1}_{\beta q} - O^{1}_{q}\bigr)}{\bigl(g^{1}_{q}\bigr)^2}. \qquad (16.38) $$
There are no connection coefficients before T 0 and so to derive the update equations
for the gains and offsets for the input layer, we need to step back a bit. From our
derivations, we know
$$ \frac{\partial E}{\partial O^{0}_{q}} = \sum_{\beta=0}^{S-1} \xi^{0}_{\beta q}\, \frac{\partial X^{0}_{\beta q}}{\partial O^{0}_{q}} = -\,\frac{\sum_{\beta=0}^{S-1} \xi^{0}_{\beta q}}{g^{0}_{q}} \qquad (16.39) $$
$$ \frac{\partial E}{\partial g^{0}_{q}} = \sum_{\beta=0}^{S-1} \xi^{0}_{\beta q}\, \frac{\partial X^{0}_{\beta q}}{\partial g^{0}_{q}} = -\,\frac{\sum_{\beta=0}^{S-1} \xi^{0}_{\beta q}\,\bigl(y^{0}_{\beta q} - O^{0}_{q}\bigr)}{\bigl(g^{0}_{q}\bigr)^2}. \qquad (16.40) $$
Hence, if we want to have parameters in the input layer tuned, we can do so with
Eqs. (16.39) and (16.40). Equations (16.37), (16.39) and (16.40) give the complete
set of back propagation equations for this type of neural network architecture. If
the parameters of the MFFN’s input layer are not mutable, we only need to use
Eq. (16.37), of course.
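A minimal sketch of the resulting gradient descent sweep; here mffngrad is a hypothetical helper, not a function defined in this chapter, that is assumed to return the partials of Eqs. (16.37), (16.39) and (16.40) as cell arrays matching T, O and G.

% Minimal sketch of the gradient descent loop once the partials are available.
lambda = 0.01;
for epoch = 1:500
  [dT, dO, dG] = mffngrad(T, O, G, Data);   % hypothetical gradient routine
  for l = 1:numel(T)
    T{l} = T{l} - lambda*dT{l};             % Eq. (16.22) applied to the links
  end
  for l = 1:numel(O)
    O{l} = O{l} - lambda*dO{l};             % offsets
    G{l} = G{l} - lambda*dG{l};             % gains
  end
end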
Let's go through all of the forbidding equations above for a three-layer network which uses the same sigmoidal transfer function for all neurons in the hidden and output layers and uses some other common transfer function for the input layer. We assume all the transfer functions are of type σ. In the notation we used before, we have M = 1, so that the output layer is M + 1 = 2 and the input layer is M − 1 = 0. Let's also assume we have n_0 neurons in the input layer, n_1 neurons in the hidden layer and n_2 neurons in the output layer. Our equations become the specializations of the formulas above to this case, where x_{βq} is the input into the qth input neuron for sample β.
Let's get started coding a MatLab implementation of the MFFN. We will not be clever yet and will instead code up a quick and dirty example for a specific 2-3-1 MFFN. Let's begin with some initializations and the code for calculating the summed squared error over the data set. We will assume each piece of data is sent through the 2-3-1 MFFN by a function myffneval which will be written later. Using the evaluation function and the initializations, we can then calculate the summed squared error over the data. Consider the following code for the 2-3-1 initialization part. The function MyFFN assumes there is some data stored in Data which consists of triples of the form (x, y, z). The built-in function rand is very easy to use. The command rand(2,3) returns random numbers in (0, 1) for all the entries of a 2 × 3 matrix, and so the line -1+2*rand(2,3) sets up a random matrix with entries in (−1, 1). We use this approach to set up random matrices for our T^0 and T^1 weights, which will be called T1 and T2 in MatLab because the numbering in MatLab starts at 1 instead of the 0 we use in our derivations. We also set up random vectors of the offsets and gains which we call O and G.
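The initialization listing itself is not reproduced in this excerpt; a minimal sketch consistent with the description above might look like the following (the ranges used for the offsets and gains are assumptions):

% 2-3-1 initialization (sketch)
T1 = -1 + 2*rand(2,3);      % input -> hidden weights, entries in (-1,1)
T2 = -1 + 2*rand(3,1);      % hidden -> output weights, entries in (-1,1)
O  = -1 + 2*rand(1,6);      % one offset per node (2 input, 3 hidden, 1 output); range assumed
G  = 0.8 + 0.4*rand(1,6);   % one gain per node, kept away from zero; range assumed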
We also set up our nodal processing functions. We need 6 of them here, so we set them up in a cell data structure nodefunction{i} as follows:
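The cell setup is not shown in this excerpt; a sketch consistent with the transfer functions used later in the chapter is given below (which node gets which function is an assumption):

sigmainput = @(y,o,g) (y - o)/g;                   % input nodes
sigma      = @(y,o,g) 0.5*(1 + tanh((y - o)/g));   % hidden and output nodes
nodefunction = {};
nodefunction{1} = sigmainput;    % input node 1
nodefunction{2} = sigmainput;    % input node 2
nodefunction{3} = sigma;         % hidden nodes 3, 4 and 5
nodefunction{4} = sigma;
nodefunction{5} = sigma;
nodefunction{6} = sigma;         % output node 6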
This makes it easy for us to refer to nodal processing functions by their node number.
Next, we set up the loop over the data triples which enables us to calculate the summed
squared error. We store the outputs of the nodes as Y, the raw errors as e = Y - Z
and the squared errors as e^2. We then use this information to sum the squared
errors over the data set and compute the error function sumE.
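The error loop is also missing from this excerpt; a sketch under the assumption that Data is an S × 3 array of (x, y, z) triples and that myffneval returns the single network output is:

S = size(Data, 1);
E = zeros(1, S);
for s = 1:S
  X = Data(s, 1:2);                                   % the two inputs
  Z = Data(s, 3);                                     % the target
  Y = myffneval(X, T1, T2, O, G, nodefunction);       % network output for this sample
  e = Y - Z;                                          % raw error
  E(s) = e^2;                                         % squared error
end
sumE = 0.5*sum(E);                                    % summed squared error over the data set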
To finish we need to write the 2-3-1 evaluation code. This is given in myffneval.
This code follows our algorithm closely and you should look at it carefully to make
sure you see that.
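The myffneval listing is not reproduced here; the sketch below shows one way it could be organized for the 2-3-1 case (the argument list is a guess):

function Y = myffneval(X, T1, T2, O, G, nodefunction)
  % input layer: nodes 1 and 2
  Yin = zeros(1,2);
  for i = 1:2
    Yin(i) = nodefunction{i}(X(i), O(i), G(i));
  end
  % hidden layer: nodes 3, 4 and 5
  Yhid = zeros(1,3);
  for i = 1:3
    y = dot(T1(:,i), Yin);                       % weighted input into hidden node i
    Yhid(i) = nodefunction{2+i}(y, O(2+i), G(2+i));
  end
  % output layer: node 6
  y = dot(T2(:,1), Yhid);
  Y = nodefunction{6}(y, O(6), G(6));
end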
We can test this on some simple data. There are two inputs x_1 and x_2, and if 2x_1/x_2 > 1 the target value is 1; otherwise it is zero. We set up some data and run our codes as follows:
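The data-setup listing is missing from this excerpt; a sketch is shown below. The four sample points are made up for illustration, so the error values in the run that follows (which came from the author's own data and random weights) will not be reproduced by it.

X1 = [0.2; 0.3; 0.4; 0.9];
X2 = [0.9; 0.8; 0.9; 0.3];
Z  = double(2*X1./X2 > 1);      % componentwise test gives the 0/1 targets
Data = [X1, X2, Z];             % rows are the (x, y, z) triples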
% The squared errors are
E
E =
   0.7872   0.8951   0.9123   0.0008
% and the error is
sumE
sumE =
   1.2977
Let's write our first attempt at implementing a standard MFFN in MatLab. We are going to be quite general in this implementation; the previous 2-3-1 example was just a warmup. From studying our code there, we can see more clearly the places where we should write more generic code. But make no mistake; there are always tradeoffs to any software implementation decision we make.
We will break our implementation into fairly standard pieces: an initialization
function, an evaluation function, an update function and finally a training function.
16.7.1 Initialization
[P, m] = size(LayerSizes);
% initialize parameters
for p = 1:P-1
  rows = LayerSizes(p);
  cols = LayerSizes(p+1);
  T{p} = TL + (TH-TL)*rand(rows, cols);
  O{p} = OL + (OH-OL)*rand(1, rows);
  G{p} = GL + (GH-GL)*rand(1, rows);
end
rows = LayerSizes(P);
O{P} = OL + (OH-OL)*rand(1, rows);
G{P} = GL + (GH-GL)*rand(1, rows);
end
16.7.2 Evaluation
% Extract number of layers
% so
%   output layer = P
%   input layer  = 1
[P, m] = size(LayerSizes);
% Assume InputSize = TargetSize!!
%
% Extract the number of training samples S
[S, InputSize] = size(Input);
% set up storage for all MFFN values
y = {};
Y = {};
for p = 1:P
  cols = LayerSizes(p);
  y{p} = zeros(S, cols);
  Y{p} = zeros(S, cols);
end

energy = 0;
for s = 1:S
  % loop on training exemplars
  for p = 1:P
    if p == 1
      y{p}(s,:) = Input(s,:);
      for i = 1:LayerSizes(1)
        input  = y{p}(s,i);
        offset = Offset{p}(i);
        gain   = Gain{p}(i);
        Y{p}(s,i) = sigmainput(input, offset, gain);
      end
    else
      for i = 1:LayerSizes(p)
        y{p}(s,i) = dot(T{p-1}(:,i), Y{p-1}(s,:));
        input  = y{p}(s,i);
        offset = Offset{p}(i);
        gain   = Gain{p}(i);
        if p == P
          Y{p}(s,i) = sigmaoutput(input, offset, gain);
        else
          Y{p}(s,i) = sigma(input, offset, gain);
        end
      end
    end
  end % p loop
  Output(s,:) = Y{P}(s,:);
  RE(s,:) = Output(s,:) - Target(s,:);
  LE(s,:) = RE(s,:).^2;
  energy = energy + sum(LE(s,:));
end % s loop
E = 0.5*energy;
end
16.7.3 Updating
Now let’s consider the update process. We will follow the update algorithms we have
presented. Our node evaluation functions have the form of the σ listed below for
various values of L and H where we assume L < H , of course.
\sigma(y, o, g) = \frac{1}{2}\left[(H + L) + (H - L)\tanh\!\left(\frac{y-o}{g}\right)\right],

\frac{\partial\sigma}{\partial o} = -\,\frac{H-L}{2g}\,\mathrm{sech}^2\!\left(\frac{y-o}{g}\right) = -\,\frac{1}{g}\,\frac{\partial\sigma}{\partial y}.

In the code for our updates, we will assume we have coded all of our σ functions as if the argument to the node function was simply u instead of the ratio (y − o)/g. Hence, we need the following derivative for all our node functions:

\sigma'(u) = \frac{H-L}{2}\,\mathrm{sech}^2(u).
In code we would have
% main sigma
sigma = nodefunction{1};
% input sigma
sigmainput = nodefunction{2};
% output sigma
sigmaoutput = nodefunction{3};
% main sigma prime
sigmaprime = nodefunctionprime{1};
% input sigma prime
sigmainputprime = nodefunctionprime{2};
% output sigma prime
sigmaoutputprime = nodefunctionprime{3};

% Extract number of layers
% so
%   output layer = P
%   input layer  = 1
[P, m] = size(LayerSizes);
%
% Extract the number of training samples S
[S, InputSize] = size(Input);

% set up storage for the generalized errors
xi = {};
for p = P:-1:1
  cols = LayerSizes(p);
  xi{p} = zeros(S, cols);
end
    end % outer q node loop
    % find DT{p}, DO{p+1}, DG{p+1}
    for u = 1:rows
      for v = 1:cols
        DT{p}(u,v) = DT{p}(u,v) + xi{p+1}(s,v)*Y{p}(s,u)/Gain{p+1}(v);
      end % v
    end % u
    for v = 1:cols
      DO{p+1}(v) = DO{p+1}(v) - xi{p+1}(s,v)/Gain{p+1}(v);
      %DG{p+1}(v) = DG{p+1}(v) - xi{p+1}(s,v)*(y{p+1}(s,v) - Offset{p+1}(v))/(Gain{p+1}(v)^2);
    end % v loop
  end % s loop
end % layer loop
% now do input layer
for s = 1:S
  % find DO{1}, DG{1}
  cols = LayerSizes(1);
  for v = 1:cols
    DO{1}(v) = DO{1}(v) - xi{1}(s,v)/Gain{1}(v);
    %DG{1}(v) = DG{1}(v) - xi{1}(s,v)*(y{1}(s,v) - Offset{1}(v))/(Gain{1}(v)^2);
  end % v loop
end % s loop
% find the norm of the gradient
% DOGAIN is a flag (set earlier) telling us whether the gains are being trained;
% the second argument to norm is assumed to be 'fro' (Frobenius norm)
squarelength = 0.0;
for p = 1:P-1
  squarelength = squarelength + (norm(DT{p},'fro'))^2;
  if DOGAIN == 1
    squarelength = squarelength + (norm(DG{p},'fro'))^2;
  end
  squarelength = squarelength + (norm(DO{p},'fro'))^2;
end
if DOGAIN == 1
  squarelength = squarelength + (norm(DG{P},'fro'))^2 + (norm(DO{P},'fro'))^2;
else
  squarelength = squarelength + (norm(DO{P},'fro'))^2;
end
length = sqrt(squarelength);
Normgrad = length;
NormGradMFFN = Normgrad
lengthtol = 0.05;
if length < lengthtol
  scale = lambda;
else
  scale = lambda/length;
end
fscale = scale;
% now do training
for p = 1:P-1
  rows = LayerSizes(p);
  cols = LayerSizes(p+1);
  for u = 1:rows
    for v = 1:cols
      T{p}(u,v) = T{p}(u,v) - scale*DT{p}(u,v);
    end % v loop
  end % u loop
  for u = 1:rows
    Offset{p}(u) = Offset{p}(u) - scale*DO{p}(u);
    %Gain{p}(u) = Gain{p}(u) - scale*DG{p}(u);
  end % u loop
end % p loop
rows = LayerSizes(P);
for u = 1:rows
  Offset{P}(u) = Offset{P}(u) - scale*DO{P}(u);
  %Gain{P}(u) = Gain{P}(u) - scale*DG{P}(u);
end

end
16.7.4 Training
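The training function itself is not reproduced in this excerpt. A minimal sketch consistent with the call signature used below is given here; whether the gradient update lives in its own routine or inline in the loop is not shown, so it is factored into a hypothetical helper mffnupdate:

function [G, O, T, Energy] = mffntrain(Input, Target, nodefunction, nodefunctionprime, ...
                                        G, O, T, LayerSizes, y, Y, lambda, NumIters)
  Energy = zeros(1, NumIters);
  for it = 1:NumIters
    % forward pass over the whole training set
    [y, Y, RE, E] = mffneval2(Input, Target, nodefunction, G, O, T, LayerSizes);
    Energy(it) = E;
    % backward pass and gradient step (hypothetical helper)
    [G, O, T] = mffnupdate(Input, Target, nodefunction, nodefunctionprime, ...
                           G, O, T, LayerSizes, y, Y, RE, lambda);
  end
end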
Now we can test the code. We use 31 exemplars to train a 1-5-1 MFFN with the gain updates turned off to approximate a simple step function. So we have 10 + 7 = 17 tunable parameters and 31 constraints. We have to set up inputs and outputs. Here we want every input less than 1 to map to the target 0 and every input bigger than 1 to map to the target 1. The traditional if conditional in MatLab is not suited for vectors such as we create below for X. Hence we use the function iff below to do the testing. The built-in MatLab function nargchk is used with the condition set to be the test x < 1, and the code line iff(x<1,1,0) is expanded into error(nargchk(x<1,1,0,31)) as the size of X is 31. When the test is true, the return value is 1 and otherwise it is 0. This enables us to use conditional testing on a vector.
nodefunctionprime{1} = sigmaprime;
sigmainputprime = @(y,offset,gain) 1;
nodefunctionprime{2} = sigmainputprime;
sigmaoutputprime = @(y,offset,gain) ((SH-SL)/(2))*sech((y-offset)/gain).^2;
nodefunctionprime{3} = sigmaoutputprime;
%
[y,Y,RE,E] = mffneval2(Input,Target,nodefunction,G,O,T,LayerSizes);
NumIters = 10;
lambda = .005;
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime, ...
   G,O,T,LayerSizes,y,Y,lambda,NumIters);
NumIters = 10;
lambda = .005;
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime, ...
   G,O,T,LayerSizes,y,Y,lambda,NumIters);
...
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime, ...
   G,O,T,LayerSizes,y,Y,lambda,200);
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime, ...
   G,O,T,LayerSizes,y,Y,lambda,1000);
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime, ...
   G,O,T,LayerSizes,y,Y,0.006,1000);
After all this training, we still haven't quite got it. What we do have is an approximation to the step function shown in Fig. 16.1. To generate this graph, we evaluate the FFN at new points and plot the result along with the graph of the true function.
We are going to approximate sin²(x) on the interval [0, π]. We use 21 exemplars with a 1-5-1 MFFN, which has 17 adjustable parameters against 21 constraints.
nodefunction = {};
nodefunctionprime = {};
sigma = @(y,offset,gain) 0.5*(1 + tanh((y-offset)/gain));
nodefunction{1} = sigma;
sigmainput = @(y,offset,gain) (y-offset)/gain;
nodefunction{2} = sigmainput;
SL = -.5;
SH = 1.5;
% output nodes range from SL to SH
sigmaoutput = @(y,offset,gain) 0.5*((SH+SL) + (SH-SL)*tanh((y-offset)/gain));
nodefunction{3} = sigmaoutput;
sigmaprime = @(y,offset,gain) (1/(2))*sech((y-offset)/gain).^2;
nodefunctionprime{1} = sigmaprime;
sigmainputprime = @(y,offset,gain) 1;
nodefunctionprime{2} = sigmainputprime;
sigmaoutputprime = @(y,offset,gain) ((SH-SL)/(2))*sech((y-offset)/gain).^2;
nodefunctionprime{3} = sigmaoutputprime;
%
[y,Y,RE,E] = mffneval2(Input,Target,nodefunction,G,O,T,LayerSizes);
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,G,O,T,LayerSizes,y,Y,0.005,100);
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,G,O,T,LayerSizes,y,Y,lambda,NumIters);
...
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,G,O,T,LayerSizes,y,Y,lambda,50);
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,G,O,T,LayerSizes,y,Y,lambda,50);
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,G,O,T,LayerSizes,y,Y,lambda,50);
After all this training, we are doing pretty well. The approximation is shown in
Fig. 16.2 and it is clearly phase shifted a bit.
The choice of linear output node functions does not do as well. Even after significant training, the approximation, shown in Fig. 16.3, is not as good.
Listing 16.16: Approximating sin² Again: Linear Outputs

X = linspace(0,3.14,31);
Input = X';
U = sin(X).^2;
Target = U';
LayerSizes = [1;4;1];
OL = -1.4;
OH = 1.4;
GL = 0.8;
GH = 1.2;
TL = -2.1;
TH = 2.1;
[G,O,T] = mffninit(GL,GH,OL,OH,TL,TH,LayerSizes);
nodefunction = {};
Fig. 16.3 Approximating sin²(x) using a 1-5-1 MFFN with linear outputs
nodefunctionprime = {};
sigma = @(y,offset,gain) 0.5*(1 + tanh((y-offset)/gain));
nodefunction{1} = sigma;
sigmainput = @(y,offset,gain) (y-offset)/gain;
nodefunction{2} = sigmainput;
sigmaoutput = @(y,offset,gain) (y-offset)/gain;
nodefunction{3} = sigmaoutput;
sigmaprime = @(y,offset,gain) (1/(2))*sech((y-offset)/gain).^2;
nodefunctionprime{1} = sigmaprime;
sigmainputprime = @(y,offset,gain) 1;
nodefunctionprime{2} = sigmainputprime;
sigmaoutputprime = @(y,offset,gain) 1;
nodefunctionprime{3} = sigmaoutputprime;
%
[y,Y,RE,E] = mffneval2(Input,Target,nodefunction,G,O,T,LayerSizes);
NumIters = 10;
lambda = .0005;
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,G,O,T,LayerSizes,y,Y,lambda,NumIters);
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,G,O,T,LayerSizes,y,Y,lambda,100);
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,G,O,T,LayerSizes,y,Y,lambda,100);
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,G,O,T,LayerSizes,y,Y,lambda,400);
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,G,O,T,LayerSizes,y,Y,lambda,400);
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,G,O,T,LayerSizes,y,Y,lambda,1000);
[G,O,T,Energy] = mffntrain(Input,Target,nodefunction,nodefunctionprime,G,O,T,LayerSizes,y,Y,lambda,2000);
% ... more iterations ...
X = linspace(0,3.14,101);
Input = X';
U = sin(X).^2;
Target = U';
[y,Y,RE,E] = mffneval2(Input,Target,nodefunction,G,O,T,LayerSizes);
Z = Y{3};
plot(X,U,X,Z);
Chapter 17
Chained Feed Forward Architectures
We now introduce a version of the feed forward architecture known as the chained
feedforward network or CFFN and discuss the backpropagation method of training in
this context. Here, we will implement the FFN as a chain of computational elements
which will provide a very general and flexible backbone for future discussion and
expansion. The CFFN is quite well-known and our first exposure to it occurred in
the classic papers of Werbos (1987a, b, 1988, 1990a, b). We will apply the methods
derived in Sect. 17.3 below to calculate the needed partial derivatives.
17.1 Introduction
which is usually called the gain of the transfer function. The function φ(g) is for convenience only; it is awkward to allow the denominator of the transfer function model to be zero and to change sign. A typical function we use to control the range of the gain parameter is

\phi(g) = g_m + \frac{g_M - g_m}{2}\bigl(1.0 + \tanh(g)\bigr),

where g_m and g_M denote the lower and upper saturation values of the gain parameter's range. The CFFN model consists of a string of N neurons, labeled from 0 to N − 1. Some of these neurons can accept external input and some have their outputs compared to external targets.
We let U denote the set of indices of the neurons that accept external input and V denote the set of indices of the neurons whose outputs are compared to external targets. We will let n_I and n_O denote the cardinality of U and V, respectively. The remaining neurons in the chain which have no external role will be called hidden neurons, with dimension n_H. Note that n_H + |U ∪ V| = N and that it is possible for an input neuron to be an output neuron; hence U and V need not be disjoint sets. The chain is thus divided by function into three possibly overlapping types of processing elements: n_I input neurons, n_O output neurons and n_H internal or hidden neurons. In Fig. 14.12a, we see a prototypical chain of eleven neurons. For clarity only a few synaptic links from pre to post neurons are shown. We see three input neurons (neurons 0, 1 and 4) and four output neurons (neurons 3, 7, 9 and 10). Note input neuron 0 feeds its output forward to input neuron 1 in addition to feeding forward to other postsynaptic neurons. The set of postsynaptic neurons for neuron 0 can be denoted by the symbol F(0), which here is the set

F(0) = {1, 2, 4}
Similarly, we see
F(4) = {5, 6, 8, 9}
We will let the set of postsynaptic neurons for neuron i be denoted by F(i), the
set of forward links for neuron i. Note also that each neuron can be viewed as a
postsynaptic neuron with a set of presynaptic neurons feeding into it: thus, each
neuron i has associated with it a set of backward links which will be denoted by
B(i). In our example,
B(0) = {}
B(4) = {0}
where in general, the backward link sets will be much richer in connections than these
simple examples indicate. The weight of the synaptic link connecting the presynaptic
neuron i to the postsynaptic neuron j (it is assumed j > i) will be denoted by Ti→ j .
The input of a typical postsynaptic neuron therefore requires summing over the
backward link set of the postsynaptic neuron in the following way:
y^{post} = x + \sum_{pre \in B(post)} T_{pre \to post}\, Y^{pre}
where the term x is the external input term which is only used if the post neuron is
an input neuron. This is illustrated in Fig. 14.13. We will use the following notation
(some of which has been previously defined) to describe the various elements of the
chained FFN:
that is,

H(x) = \begin{bmatrix} Y^{v_0} \\ \vdots \\ Y^{v_{n_O - 1}} \end{bmatrix}.
n_s = \sum_{i=0}^{N-1} |F(i)|, \qquad n_o = N, \qquad n_g = N,

where |F(i)| denotes the size of the forward link set for the ith neuron; n_s denotes the number of synaptic links; n_o, the number of offsets; and n_g, the number of gains. The number of parameters for a CFFN is then written as N_p:

N_p = n_s + n_o + n_g.   (17.4)
Now this notation is quite convoluted; however, we must be clear which components of the target are to be matched with which CFFN output neuron. The mapping d : V → {0, . . . , n_O − 1} is, of course, the trivial correspondence d(v_i) = i. For notational cleanness, we will simply denote d(v_i) by d_i. The CFFN training problem is then to choose the N_p chain FFN parameters so that we minimize an energy function, E, given by

E = 0.5 \sum_{\alpha=0}^{S-1}\sum_{i=0}^{n_O - 1} f\!\left(Y^{v_i}_\alpha - D_{\alpha i}\right)
  = 0.5 \sum_{\alpha=0}^{S-1}\sum_{i \in V} f\!\left(Y^{i}_\alpha - D_{\alpha d_i}\right)   (17.5)
where f is a nonnegative function of the error term Y^i_α − D_{α d_i}; e.g. using the function f(x) = x² gives the standard L₂ or least squares energy function.
We will use the notation established in Sect. 17.1. If all the partial derivatives of
the energy function (17.5) with respect to the chain FFN parameters were known,
we could minimize (17.5) using a standard gradient descent scheme which we call
CFFN back propagation. Relabeling the N p CFFN variables temporarily as wi , 0 ≤
i ≤ N_p − 1, the optimization problem we face is

\min_{w \in \mathbb{R}^{N_p}} E(I, D, w)   (17.6)

where we now explicitly indicate that the value of E depends on I, D and the parameters w. In component form, the equations for gradient descent are then

w_i^{new} = w_i^{old} - \lambda \left.\frac{\partial E}{\partial w_i}\right|_{w^{old}}   (17.7)

where λ is a scalar parameter that determines what percentage of the gradient step to actually take. We can write (17.7) more compactly using vector notation:

w^{new} = w^{old} - \lambda\, \nabla_w E\big|_{w^{old}}.
In any sort of model that is built from tunable parameters with an associated energy function minimized by choosing the parameters via gradient descent, we must be able to calculate the needed partial derivatives. We now discuss a recursive method for the computation of the partial derivatives in general computational chains.
E = E(Y^0, Y^1, Y^2),
Y^2 = f^2(Y^0, Y^1),
Y^1 = f^1(Y^0),

where f^i denotes the functional dependencies. Note that these computational elements are organized in a chain-like fashion. Now, consider a particular example of the chained functions Y^2 and Y^1 given below:

Y^2 = (3Y^0 + 4Y^0 Y^1)^2,
Y^1 = 2(Y^0)^3.
The function Y^2 depends on Y^0 in two different ways: the first is a direct dependence, which we can denote by the usual partial derivative symbols:

\frac{\partial Y^2}{\partial Y^0} = 2(3Y^0 + 4Y^0 Y^1)\times(3 + 4Y^1).

The second is an indirect dependence through Y^1, so the total rate of change is

\frac{dY^2}{dY^0} = \frac{\partial Y^2}{\partial Y^0} + \frac{\partial Y^2}{\partial Y^1}\frac{\partial Y^1}{\partial Y^0},

where we denote this total dependence using the notation dY^2/dY^0, the total derivative of Y^2 with respect to Y^0. The situation can get more complicated. If the function Y^1 had additional hidden dependencies on the variable Y^0, the direct dependency term ∂Y^1/∂Y^0 would be inadequate. In this case, it should be replaced by the total dependency term dY^1/dY^0. Thus, to compute the total derivative of Y^2 with respect to Y^0, we should use:

\frac{dY^2}{dY^0} = \frac{\partial Y^2}{\partial Y^0} + \frac{\partial Y^2}{\partial Y^1}\frac{dY^1}{dY^0}
 = 2(3Y^0 + 4Y^0 Y^1)\times(3 + 4Y^1) + 2(3Y^0 + 4Y^0 Y^1)\times(4Y^0)\times 6(Y^0)^2.
We therefore must make a distinction between the total derivative of a function with respect to a variable u and the direct derivative of a function with respect to u. We will denote total derivatives with the d notation and direct derivatives with the ∂ notation, respectively. Of course, in some cases these derivatives are the same and so we can use either notation interchangeably.
Let's go back to the original example and add a function E which depends on Y^0, Y^1 and Y^2. We let
E = (Y^0)^2 + (Y^1)^4 + (Y^2)^8,
Y^2 = (3Y^0 + 4Y^0 Y^1)^2,
Y^1 = 2(Y^0)^3.

\frac{\partial E}{\partial Y^0} = 2Y^0,

\frac{dE}{dY^0} = \frac{\partial E}{\partial Y^0} + \frac{\partial E}{\partial Y^1}\frac{dY^1}{dY^0} + \frac{\partial E}{\partial Y^2}\frac{dY^2}{dY^0}
 = \frac{\partial E}{\partial Y^0} + \frac{\partial E}{\partial Y^1}\frac{\partial Y^1}{\partial Y^0} + \frac{\partial E}{\partial Y^2}\left(\frac{\partial Y^2}{\partial Y^0} + \frac{\partial Y^2}{\partial Y^1}\frac{\partial Y^1}{\partial Y^0}\right)
 = 2Y^0 + 4(Y^1)^3\times 6(Y^0)^2 + 8(Y^2)^7\times 2(3Y^0 + 4Y^0 Y^1)\times\left[(3 + 4Y^1) + (4Y^0)\times 6(Y^0)^2\right].   (17.9)
This certainly looks complicated! However, the chained nature of these computations allows us to reorganize it into a much easier format: consider the recursive scheme below:
\frac{dE}{dY^2} = \frac{\partial E}{\partial Y^2}   (17.10)

\frac{dE}{dY^1} = \frac{\partial E}{\partial Y^1} + \frac{dE}{dY^2}\frac{\partial Y^2}{\partial Y^1}   (17.11)

\frac{dE}{dY^0} = \frac{\partial E}{\partial Y^0} + \frac{dE}{dY^1}\frac{\partial Y^1}{\partial Y^0} + \frac{dE}{dY^2}\frac{\partial Y^2}{\partial Y^0}   (17.12)
After calculation, this simplifies to:

\frac{dE}{dY^2} = 8(Y^2)^7,

\frac{dE}{dY^1} = 4(Y^1)^3 + \frac{dE}{dY^2}\times 2(3Y^0 + 4Y^0 Y^1)\times(4Y^0),

\frac{dE}{dY^0} = 2Y^0 + \frac{dE}{dY^1}\times 6(Y^0)^2 + \frac{dE}{dY^2}\times 2(3Y^0 + 4Y^0 Y^1)\times(3 + 4Y^1).
The recursive calculation matches the one that was obtained directly via (17.9). The
discussion above motivates a general recursive scheme for computing the rates of
change of a given function E with respect to chained variables. Assume the function
E depends on the variables Y 0 to Y N in the following way:
E = E(Y^0, Y^1, \ldots, Y^N),
Y^N = f^N(Y^0, Y^1, \ldots, Y^{N-1}),
Y^{N-1} = f^{N-1}(Y^0, Y^1, \ldots, Y^{N-2}),
\vdots
Y^i = f^i(Y^0, Y^1, \ldots, Y^{i-1}),
\vdots
Y^1 = f^1(Y^0).
\frac{dE}{dY^N} = \frac{\partial E}{\partial Y^N}   (17.13)

\frac{dE}{dY^{N-1}} = \frac{\partial E}{\partial Y^{N-1}} + \frac{dE}{dY^N}\frac{\partial Y^N}{\partial Y^{N-1}}   (17.14)

\frac{dE}{dY^{N-2}} = \frac{\partial E}{\partial Y^{N-2}} + \frac{dE}{dY^N}\frac{\partial Y^N}{\partial Y^{N-2}} + \frac{dE}{dY^{N-1}}\frac{\partial Y^{N-1}}{\partial Y^{N-2}}   (17.15)

\frac{dE}{dY^i} = \frac{\partial E}{\partial Y^i} + \sum_{j=i+1}^{N}\frac{dE}{dY^j}\frac{\partial Y^j}{\partial Y^i}   (17.16)

\frac{dE}{dY^0} = \frac{\partial E}{\partial Y^0} + \sum_{j=1}^{N}\frac{dE}{dY^j}\frac{\partial Y^j}{\partial Y^0}   (17.17)
Note that this can easily be organized into the scheme given in Table 17.2. Also, we use the notation ∂Y^j/∂Y^i for the more correct form ∂f^j/∂Y^i. Now, let's consider another complication: each function Y^i has an internal structure of functional parameters w^i = {w_{i0}, . . . , w_{i,n_i−1}} which are used in addition to the inputs from elements Y^0 to Y^{i−1}. The symbol n_i denotes the number of internal parameters in function Y^i. This internal structure is not based on a chained architecture. To be explicit, our previous example can be written in this new context as follows:
Y^0 = f^0(w_{00}, \ldots, w_{0,n_0-1}) = f^0(w^0),
Y^1 = f^1(Y^0, w_{10}, \ldots, w_{1,n_1-1}) = f^1(Y^0, w^1),
Y^2 = f^2(Y^0, Y^1, w_{20}, \ldots, w_{2,n_2-1}) = f^2(Y^0, Y^1, w^2),
E = E(Y^0, Y^1, Y^2).
To calculate ∂E/∂w_{ij}, we can modify the previous recursive scheme for all appropriate indices i:
\frac{dE}{dY^2} = \frac{\partial E}{\partial Y^2},
\qquad \frac{dE}{dw_{2i}} = \frac{dE}{dY^2}\frac{\partial Y^2}{\partial w_{2i}},

\frac{dE}{dY^1} = \frac{\partial E}{\partial Y^1} + \frac{dE}{dY^2}\frac{\partial Y^2}{\partial Y^1},
\qquad \frac{dE}{dw_{1i}} = \frac{dE}{dY^1}\frac{\partial Y^1}{\partial w_{1i}},

\frac{dE}{dY^0} = \frac{\partial E}{\partial Y^0} + \frac{dE}{dY^1}\frac{\partial Y^1}{\partial Y^0} + \frac{dE}{dY^2}\frac{\partial Y^2}{\partial Y^0},
\qquad \frac{dE}{dw_{0i}} = \frac{dE}{dY^0}\frac{\partial Y^0}{\partial w_{0i}}.
We summarize the recursive partial calculations when internal parameters are present
in Table 17.3. We can use these recursive techniques to derive the backpropagation
algorithm for the case of the Chain Feed Forward Network, CFFN. Indeed, we can
also derive the backpropagation algorithm for many other architectures in a similar
way.
Y^i = \sigma^i(y^i, o^i, g^i) = \sigma^i\!\left(\sum_{j \in B(i)} T_{j\to i}\, Y^j,\; o^i,\; g^i\right),

where the internal parameters of the function Y^i are the gain and offset of the transfer function as well as the link weights connecting to neuron i. Hence, this particular type of chained architecture can be written in functional form using the internal parameters

w^i = \{T_{0\to i}, \ldots, T_{i-1\to i}, o^i, g^i\}.

This internal parameter set has cardinality i + 2. Thus, each transfer function has the instantiated form

Y^i = \sigma^i(y^i, o^i, g^i) \equiv f^i(Y^0, Y^1, \ldots, Y^{i-1}, w^i).
Error Notation: The error between the value from output neuron i due to sample α and the desired output component D_{α d_i} will be denoted by the term e_{αi}. For notational purposes, we will define the generalized output error as follows:

e_{\alpha i} = \begin{cases} Y^i_\alpha - D_{\alpha d_i}, & i \in V \\ 0, & \text{else.} \end{cases}

Energy Function: The energy function depends explicitly on the outputs of the n_O output neurons in the chain FFN. The partial with respect to the ith output neuron will be denoted E_i = ∂E/∂Y^i, whose value at a given sample α is given by

E_{i,\alpha} = \begin{cases} Y^i_\alpha - D_{\alpha d_i} = e_{\alpha i}, & i \in V \\ 0, & \text{else.} \end{cases}

Note that E_{i,α} is equivalent to the direct partial term ∂E/∂Y^i.
Indicial Notation: Our indices derive from a C based notation. Hence, sums typically run from 0 to an upper bound of the form M − 1. To avoid abusive notation, we will use the convention \bar{M} ≡ M − 1 to indicate that the variable M has been decremented by 1. We will also let θ denote n_O. Note that we also have a nice identity connecting the forward and backward link sets: for a given node i, we have

\sum_{k=i+1}^{N-1} \sum_{i \in B(k)} = \sum_{k \in F(i)}.   (17.18)
This formula is interpreted as follows: since the value i is fixed, the inner sum over i ∈
B(k) is empty unless i is in the set B(k). We will now apply the notions about recur-
sively organized chained partial calculations to the CFFN. Since we have seen how to
interpret the CFFN as a general chained architecture using the notation of Sect. 17.3,
we can apply the recursive partial calculations for chains with internal parameters
directly to obtain (see Table 17.3) the backpropagation equations for the CFFN.
17.4.1 The ∂Y^j/∂Y^i Calculation

We begin by finding the critical partial ∂Y^j/∂Y^i so that we can find the total dependence of E on Y^i.
\frac{\partial Y^j}{\partial Y^i} = \sigma_0^j(y^j, o^j, g^j)\,\frac{\partial}{\partial Y^i}\left\{ x^j + \sum_{k \in B(j)} T_{k\to j}\, Y^k \right\}
 = \sigma_0^j(y^j, o^j, g^j)\sum_{k \in B(j)} T_{k\to j}\,\delta_{ik}
 = \sigma_0^j(y^j, o^j, g^j)\, T_{i\to j}, \quad i \in B(j),
where we interpret x^j as zero if j is not in the input set U. Hence, the general expansion term becomes

\frac{dE}{dY^i} = \frac{\partial E}{\partial Y^i} + \sum_{j=i+1}^{N-1}\frac{dE}{dY^j}\frac{\partial Y^j}{\partial Y^i}
 = \frac{\partial E}{\partial Y^i} + \sum_{j \in F(i)}\frac{dE}{dY^j}\,\sigma_0^j(y^j, o^j, g^j)\, T_{i\to j},
where we use the identity (17.18) to simplify the equations above. Of course, the direct term ∂E/∂Y^i is present only if the energy has a direct dependence on Y^i, which happens only when i is an output neuron.
The partials with respect to the internal parameters consist of two separate types:
the first are partials with respect to the offset and gain variables and the second
are partials with respect to linking terms. The appropriate partial calculation from
Table 17.3 for a given internal parameter w is given by
\frac{dE}{dw} = \frac{dE}{dY^i}\frac{\partial Y^i}{\partial w}.
Hence,

\frac{dE}{do^i} = \frac{dE}{dY^i}\,\sigma_1^i \quad\text{and}\quad \frac{dE}{dg^i} = \frac{dE}{dY^i}\,\sigma_2^i,

\frac{dE}{dT_{j\to i}} = \frac{dE}{dY^i}\,\sigma_0^i(y^i, o^i, g^i)\, Y^j.
These equations are summarized in Table 17.4. For simplicity, in this table, we assume there is only one output node, N − 1. Since we are really trying to compute the gradient for the energy function E defined in terms of the full sample set, the calculations outlined above must be done at each sample. This leads to the full backpropagation equations given in Table 17.5. We observe that the backpropagation algorithm to compute dE/dY^{pre}_β at a given presynaptic neuron pre requires summing over the forward link set and a possible external error signal in the following way:

\frac{dE}{dY^{pre}_\beta} = e^{pre}_\beta + \sum_{post \in F(pre)} \sigma_0^{post}\, T_{pre\to post}\, \frac{dE}{dY^{post}_\beta},
where all variables subscripted with post are associated with a postsynaptic neuron. An illustration of this process is shown in Fig. 17.1. This is a very convenient way to abstract this process. In tabular form, the per-node parameter partials take the form

for j ∈ F(i):   ∂E/∂T_{i→j} = (dE/dY^j) σ_0^j Y^i,
∂E/∂o^i = (dE/dY^i) σ_1^i,
∂E/∂g^i = (dE/dY^i) σ_2^i.

Combining our evaluation procedure (Table 17.1) and the backpropagation procedure (Table 17.5), we see that information about both backward (evaluation) and forward (partial derivative calculation) links is required. Also,
note that as formulated here, this algorithm is inherently sequential! It is not easily
implemented in parallel!
We will discuss how to implement the chain training algorithms in MatLab soon. Let's look at how to set up code to implement the CFFN ideas. We won't do a proper job yet as there are lots of details that are difficult to code properly. But we will start the process now by looking at how to code the evaluation portion of a 2-3-1 CFFN, which is the same network as the usual 2-3-1 MFFN we have discussed earlier. We begin with the usual initializations, which we put into the function chaininit. This function sets up a vector of six node numbers; we only need it to know how many nodes there are. You'll note we don't really use the individual entries of the variable N here. We store the in node of an edge, the out node and the weight value of the edge w in a data structure MatLab calls a struct. Our struct is called e and we can access the three fields in e using the notation e.in, e.out and e.w. We set the edges for the 2-3-1 network architecture in the cell variable a as a collection of row vectors of the form [in, out]. We use e to construct a cell E to hold the edge information we need.
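The chaininit listing is not reproduced in this excerpt; a sketch consistent with the description above and with the BE cell built in the next listing might look like this (the edge ordering and the weight range are assumptions):

N = [1 2 3 4 5 6];                 % nodes: 1,2 input; 3,4,5 hidden; 6 output
a = { [1,3]; [1,4]; [1,5]; [2,3]; [2,4]; [2,5]; [3,6]; [4,6]; [5,6] };
E = {};
for k = 1:length(a)
  e.in  = a{k}(1);                 % presynaptic node
  e.out = a{k}(2);                 % postsynaptic node
  e.w   = -1 + 2*rand(1,1);        % random weight in (-1,1); range assumed
  E{k}  = e;                       % store the edge struct in the cell E
end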
Now if you think about the evaluation step in a CFFN, recall we will have to compute Σ_{j∈B(i)} T_{j→i} Y^j. The backward set B(i) consists of a set of nodes. The weight values T_{j→i} are stored in the cell E. Suppose we had a CFFN with 100 nodes and we were looking at the backward set B(36). The edges in the cell E are labeled starting at 1 and ending with the last edge in the CFFN. Each node listed in B(36) corresponds to an index j in a T_{j→i} value we need. However, the value T_{j→i} is stored as a value w in some edge E{k}. We need to find the indices k where E{k}.in = j, which implies a search over the edges for each index j from the backward set. This is expensive! It is a lot better to avoid the search by storing the edges corresponding to the backward sets in a new cell BE. Exactly what is stored in each BE{k} depends on how we choose to number our edges, so be careful to keep track of your edge numbering scheme as you will need to remember it here.
After we set up BE, we initialize the node functions as usual.
BE = {};
BE{1} = {};
BE{2} = {};
BE{3} = [E{1}; E{4}];
BE{4} = [E{2}; E{5}];
BE{5} = [E{3}; E{6}];
BE{6} = [E{7}; E{8}; E{9}];

sigma = @(x,o,g) 0.5*(1 + tanh((x-0)/g));
sigmainput = @(x,o,g) (x-o)/g;
nodefunction{1} = sigmainput;
nodefunction{2} = sigmainput;
nodefunction{3} = sigma;
nodefunction{4} = sigma;
nodefunction{5} = sigma;
nodefunction{6} = sigma;
end
Next, we tackle the evaluation part with the function chaineval. Note we handle the input nodes separately; the other nodes implement the Σ_{j∈B(i)} T_{j→i} Y^j calculation using the BE cell data. For node i, BE{i}(j) gives us an edge connecting into i. Hence, we can extract the needed weight value easily, without a search, using BE{i}(j).w. Note we never use the backward sets B here, as the backward edge sets are more useful.
To see this 2-3-1 CFFN in action, we make up some data and push it through the architecture.
References
P. Werbos, Learning how the world works, in Proceedings of the 1987 IEEE International Confer-
ence on Systems, Man and Cybernetics, vol. 1 (IEEE, 1987a), pp. 302–310
P. Werbos, Building and understanding adaptive systems: A statistical/numerical approach to factory
automation and brain research. IEEE Trans. Syst. Man Cybern. 17, 7–19 (1987b)
P. Werbos, Generalization of backpropagation with application to a recurrent gas market model.
Neural Netw. 1, 339–356 (1988)
P. Werbos, A menu of designs for reinforcement learning over time, in Neural Networks for Control,
eds. by W. Miller, R. Sutton, P. Werbos (1990a), pp. 67–96
P. Werbos, Backpropagation through time: What it does and how to do it. Proc. IEEE 78(10),
1550–1560 (1990b)
Part VI
Graph Based Modeling In Matlab
Chapter 18
Graph Models
Our cognitive model will consist of many artificial neurons interacting on multiple
time scales. Although it is possible to do detailed modeling of neurobiological sys-
tems using GENESIS (Bower and Beeman 1998) and NEURON (Hines 2003), we
do not believe that they are useful tools in modeling the kind of information flow
between cortical modules that is needed for a cognitive model. Progress in build-
ing large scale models that involve many cooperating neurons will certainly involve
making suitable abstractions in the information processing that we see in the neuron.
Neurons transduce and integrate information on the dendritic side into wave form
pulses and there are many models involving filtering and transforms which attempt
to “see” into the action potential and find its informational core so to speak. How-
ever, all of these methods are hugely computationally expensive and even a simple
cognitive model will require ensembles of neurons acting together locally to create
global effects. We believe the simple biological feature vector (BFV) discussed here
is able to discriminate subtle changes in the action potential wave form. Hence, we will use ensembles of abstract neurons whose outputs are BFVs (11 dimensional, say) to create local cortical computationally phase locked groups. The effects of modulatory agents such as neurotransmitters can then be modeled as introducing changes in the BFVs. The kinds of changes one should use for a given neurotransmitter's modulatory effect can be estimated from the biophysical and toxin literature. An increase in sodium ion flow or Ca++ gated second messenger activity can be handled at a high level as a suitable change in one of the 11 parameters of the BFV. Note this indeed has implications for realistic neuronal processing. Further, this raises a very
interesting question. How much information processing is possible in a large scale
interacting neuron population given an abstract form of neuronal output? How can
we get insight into the minimal amount of information carried per output wave form
that can subserve cognition?
We believe this type of interaction is inherently asynchronous and hence the
artificial neurons we use for our model must be treated as agents interacting asyn-
chronously. The discussion in the previous chapters gives us powerful clues for the
This output takes the form of the low dimensional BFV we discussed earlier. Hence,
the input D_i is a sequence

D_i = \{\zeta_j\}_{j \in D_i},
where each ζj is a BFV structure. Event sequences in our neural model are thus
sequences of the 11 dimensional outputs [t0 , V0 , t1 , V1 , t2 , V2 , t3 , V3 , g, t4 , V4 ] that
are processed by a given neuron agent.
These neurons need to interact with each other and form temporally and spatially
localized ensembles of activity whose outputs are entrained. For a given compu-
tational intelligence model, there is a global system time clock as well. Thus, the
neurons function in a timed environment and so all variables in all inputs and outputs
are also tagged with time indices. The representation at this level is quite messy and
is usually avoided. We can always do computations at each agent simply in terms of
information that has arrived and is to be processed. However, to be precise the output
of a neuron would be
\zeta_i(t) = \left[t_0^{it}, V_0^{it}, t_1^{it}, V_1^{it}, t_2^{it}, V_2^{it}, t_3^{it}, V_3^{it}, g^{it}, t_4^{it}, V_4^{it}\right].
The FFP and OCOS cortical circuits can be combined into a multi-column model
(Raizada and Grossberg 2003) as seen in Fig. 18.1. This model can be implemented
using the artificial neurons introduced above. The generic software architecture for
our model is shown in Fig. 18.2. In Fig. 18.1, each cortical stack consists of 3 cortical
columns. If each cortical layer consists of 9 neurons, this allows for 3 OCOS/FFP
pathways inside each column. This is roughly 27 neurons per column. In Fig. 18.3,
a more complete architecture is shown. Here, for convenience, each cortical module
consists of 4 cortical stacks, all of which are interconnected. Hence, the 4 column
structure illustrated in Fig. 18.3 can be implemented with approximately 108 neurons
each. Thus, the five cortical modules (Frontal, Parietal, Occipital, Temporal and
Limbic) illustrated require a total of 540 artificial neurons for their implementation.
In general, since we know we start with isocortex, we can assume the number of
stacks is the same in each module. Further, we can allow more OCOS/FFP circuits
per layer. Letting N_{OCOS/FFP} denote the number of OCOS/FFP circuits per layer, we need 3 N_{OCOS/FFP} neurons per stack. Thus, if we use N_S cortical stacks for each cortical module, we will use 3 N_{OCOS/FFP} N_S neurons for each of the five cortical modules (Frontal, Parietal, Occipital, Temporal and Limbic). This leads to a total of 15 N_{OCOS/FFP} N_S neurons for the simulation of the cortical model.
cognitive function have probably emerged in these animals despite the small number
of neurons in their brains. Estimates of the neuron population of Portia fimbriata are
in the 400,000 range. Portia does not have much long term memory so our choice to
ignore long term memory needs seems reasonable for a first model of cognition. Since
our artificial neurons use specific data structures to store local agent information, we
do use short term memory which is probably similar to the use of short term memory
in the spider, honeybee and praying mantis for route planning and decision choices.
Using our simple model of monoamine and RF modulation, 400,000 neurons in
our model would imply 400,000/15 = 26,667 neurons for the cortex models. Thus
we have the constraint NOCOS/FFP NS = 26,667. If we use 200 OCOS/FFP circuits per
layer, this implies 133 cortical stacks per cortical module. Even a cursory examination of the cortex of an animal shows that such a choice will mimic real cortical structure nicely. Hence, a model at this level of cortical complexity has a reasonable probability of subserving cognitive function such as decision making. One way to implement this software architecture is to use non-blocking system calls that handle all connections between agents in one thread. A function is called whenever a connection between agents is ready for processing. Hence, this type of programming is known as event-driven or callback-based, in addition to being called asynchronous.
neurons will function in a networked environment and must accept inputs from other
agents and generate outputs or decisions. A computational element can be forced to
wait and therefore be blocked if it is waiting on the result of data from other elements
which it must have to complete its own calculations. Since a neuron accepts inputs
from other neurons, it may sometimes perform a computation that takes enough
time to keep other neurons from processing. We will choose not to use this type of
architecture and instead implement as much code as possible in a language such as
Erlang in which all objects are immutable and hence the blocking of calls and the
sharing of memory is not an issue.
In order to build even simple brain models, we thus need to write code that
implements graphs that connect our processing nodes. We will start by building
simple graph tools in MatLab/Octave using object oriented techniques as it is a
programming language readily available and easy to learn. This will enable us to
construct the underlying architecture of connections for our model and then we can
do a proof of concept for the edge and nodal processing in MatLab as well. Eventually,
we will discuss how to do this processing in other languages but that will be done in
detail in later volumes. We will now start on the objects of real interest to us: graphs.
We wish to build neural models and to do that we need a way to implement directed
graphs where the nodes are neurons and the edges are the synaptic connections
between neurons.
The implementation of the class oriented code for an object is done in a special directory which must begin with the @ symbol. Following the Octave documentation, we would set up a subdirectory called @CoolObject to use to define our CoolObject
class. The code that makes an object for us in the CoolObject class is called a
constructor. It has this general look:
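The constructor listing is not reproduced in this excerpt; a minimal sketch consistent with the discussion that follows might look like this (the details of the argument checks are assumptions):

function p = CoolObject(varargin)
  if nargin == 0
    p.c = [0];                                  % default: the field c holds a vector with 0 in it
    p = class(p, 'CoolObject');
  elseif nargin == 1
    a = varargin{1};
    if isa(a, 'CoolObject')
      p = a;                                    % copy construction from another CoolObject
    elseif isvector(a) && isreal(a)
      p.c = a;                                  % fill the field c from the given real vector
      p = class(p, 'CoolObject');
    else
      error('CoolObject: argument must be a CoolObject or a real vector');
    end
  else
    error('CoolObject: at most one argument is allowed');
  end
end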
The test nargin == 0 checks to see how many arguments are being used
to create our object. If there are no arguments, the line in Octave would be
p = CoolObject(); and we set the field element c of p to be a vector with
0 in it. Note that we are redefining how Octave interprets the statement p.c here
as normally the dot . has its own intrinsic meaning in Octave. In general, as usual
in object oriented programming, a class object, p can have multiple fields associ-
ated with it and we generally must overload the meaning of various built in Octave
commands to access these fields. We have already mentioned we must overload the
meaning of the dot "." and we generally also overload the meaning the evaluation
parenthesis so that the line p() is interpreted in a special way for our class object. If
there is one argument to our object, nargin == 1, we then use that argument to
fill in the field .cc. In this code, we also test to see that the argument we use is correct
for our object and leave an error if it is not. We also check to see that the number of
arguments does not exceed one. In this example, we assume we want our argument,
if present, to be either another CoolObject or a real valued vector. Note without
redefining the meaning of the dot . and the parentheses () for a CoolObject p,
the lines p.c and p(x) would not be allowed. To make these work we have to
overload the subsref function Octave uses to parse these statements. Once this is
done, we can then use the lines p.c and p(x) as we wish. To create a CoolObject
class, we set up a directory called @CoolObject. Note the name of the directory and the name of the class are almost the same; the directory, however, must start with an @ symbol. Inside this directory, you find the code needed to build our objects.
The code that builds a CoolObject object is called the constructor and is in the file
CoolObject.m. The name of this file, CoolObject.m without the extension .m
must match the name of the class subfolder @CoolObject without the @. Finally,
the overloaded subsref function is in subsref.m. Our code listed here is not
functional and instead is an outline of what we have to do.
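A working minimal version of such an outline might look like the sketch below; the behavior chosen for p(x), indexing into the stored vector, is just an illustrative assumption:

function out = subsref(p, s)
  switch s(1).type
    case '.'
      switch s(1).subs
        case 'c'
          out = p.c;                  % p.c returns the stored vector
        otherwise
          error('CoolObject: unknown field %s', s(1).subs);
      end
    case '()'
      x = s(1).subs{1};
      out = p.c(x);                   % p(x) indexes into the stored vector (illustrative choice)
    otherwise
      error('CoolObject: unsupported indexing type %s', s(1).type);
  end
  % chained references such as p.c(2) would require handling s(2:end) as well
end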
If we add the line s at the top of this function, we can see what the input s is for the various cases. Using the dot access p.c gives us an input s whose type field is '.' and whose subs field is 'c'. Then in the next one
For example, having added s at the top of subsref, we could run this example:
This is probably not very clear at this point, so let’s build a graph class in MatLab
(or Octave!). A graph model or graph object will consist of edge and vertices objects
glued together in the right way, so we have to build three classes to make this happen.
We need to do this because in order to build neural models we need a way to imple-
ment directed graphs where the nodes are neurons and the edges are the synaptic
connections between neurons.
Make a directory GraphsGlobal and inside it make the usual class directories
@edges, @vertices and @graphs. We call this directory GraphsGlobal
because here all of our nodes and edges will be globally labeled. This means as
we glue subgraphs together, their original node and edge numbering will be lost as
the new graph is built. Now we will eventually want to keep track of the subgraph
information which we will do using an addressing scheme. However, that is another
story. For now, we will stick to global numbering. In the directory, @vertices, we
set up a class to handle vertices.
We now write the constructor code and the code for the overloaded indexing. The
constructor is as follows:
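The constructor listing is missing from this excerpt; a sketch consistent with the description below (node list stored in the field v) is:

function V = vertices(a)
  if nargin == 0
    V.v = [];
    V = class(V, 'vertices');
  elseif isa(a, 'vertices')
    V = a;
  elseif isvector(a)
    V.v = a(:)';                      % store the node list as a row vector
    V = class(V, 'vertices');
  elseif iscell(a)
    V.v = a;                          % cell node data is allowed as well
    V = class(V, 'vertices');
  else
    error('vertices: argument must be a vector or cell of node labels');
  end
end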
We allow the vertices object to be constructed from a node list which is a vector (isvector(a)) or cell data (iscell(a)). The overloaded indexing is in subsref.m.
We can then have a session like the following. First, use a column vector to construct the vertices object.
1 2 3 4 5
In the subdirectory @edges, we then write the code for both the constructor and for overloading the subsref command so we can access field elements of an edges object. Here is the edge constructor code. It takes a vector of ordered pairs and sets up a collection of edge structures with fields in and out.
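The constructor listing is missing here; the sketch below assumes the edge data ends up stored in the field e as a 2 × n matrix whose columns are [in; out] pairs, which is consistent with how the edge data is used in later listings and printed in the sessions (for example, the 2 × 3 display below corresponds to the edges 1→2, 3→5 and 3→7):

function E = edges(a)
  if nargin == 0
    E.e = [];
  elseif iscell(a)
    E.e = [];
    for k = 1:length(a)
      E.e = [E.e, a{k}(:)];           % each cell entry is one [in; out] column
    end
  elseif isnumeric(a)
    if size(a,1) == 2
      E.e = a;                        % already columns of [in; out]
    else
      E.e = a';                       % rows of (in, out) pairs; store as columns
    end
  else
    error('edges: argument must be numeric edge data or a cell of edge columns');
  end
  E = class(E, 'edges');
end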
1 3 3
2 5 7
We can also use row vectors in the constructor. Hence, this is fine also.
1 3 3
2 5 7
Now that we can construct a vertices and edges object, we can build a graph class.
Next, in the directory @graphs, we write the constructor and indexing code. Notice the pair of statements we use, as access via g.v.v is not possible. We do this pair of steps:
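The two statements are not shown in this excerpt, but the same pattern appears in the BFsets listing at the end of the chapter, so they are presumably of the form:

Vg = g.v;      % first pull the vertices object out of the graph
v  = Vg.v;     % then pull the node list out of the vertices object
Eg = g.e;      % the edges are handled the same way
e  = Eg.e;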
This is necessary because the line out = g.v.v does not parse correctly. So we
do it in two steps. The same comment holds for the way we handle the edges. We
then build a graph in this session.
   1   2   3   4   5
G2.e
ans =
   1   3   3
   2   5   7
We will now add methods to each of our three classes. So far, we haven’t found a
way to force Octave to update its knowledge of the new code as we write it. Note,
we have to save our work in sessions and manually restart and reload code each time
we change the class codes. This is not a problem in MatLab, but you should be aware
of this if you use Octave.
We add code to add an edge to an existing edges object. This is in the file @edges/add.m. Here is the code, which specifically needs to have both the to and from nodes as arguments.
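The listing is missing from this excerpt; a sketch that matches the description of the rebuild-by-concatenation design is:

function W = add(E, u, v)
  % add an edge u -> v to an existing edges object by rebuilding its data
  old = E.e;                   % existing 2 x n matrix of [in; out] columns
  n = size(old, 2);
  out = {};
  for k = 1:n
    out{k} = old(:,k);         % copy the old edge data over
  end
  out{n+1} = [u; v];           % concatenate the new edge data
  W = edges(out);              % construct a brand new edges object
end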
The arguments (u,v) are the in node and out node values of the new edge. The
syntax for adding a method to a class is that the first argument of the function must
be the name we give the edge object in the edge constructor. We take the existing
E.e data and construct a new data container by concatenation. When we finish the
loop, we have copied all the old data over. We then concatenate the new edge data.
Then, we have the data object we can use to create a new edge object. We do that in
the line W = edges(out); and return W. The reason we use this design is that we
are not allowed to concatenate the incoming new data to the end of the old edge list.
Hence, we must do a complete rebuild. In a language that supports object oriented
programming ideas better, this would not have been necessary.
We add code to add a node to an existing vertices object. This is in the file
@vertices/add.m. Here is the code:
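The listing is missing here as well; a sketch following the same rebuild pattern is:

function W = add(V, node)
  % add a node to an existing vertices object by rebuilding its node list
  old = V.v;
  n = length(old);
  out = zeros(1, n+1);
  for k = 1:n
    out(k) = old(k);           % copy the old nodes over
  end
  out(n+1) = node;             % concatenate the new vertex node
  W = vertices(out);           % construct a brand new vertices object
end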
We take the existing V.v data and again construct a new data container by concate-
nation. When we finish the loop, we have copied all the old data over. We then con-
catenate the new vertex node to the data. At this point, we have the data object we can
use to create a new vertex object. We do that in the line W = vertices(out);
and return W. The same comments we made in the edge addition apply here. The
language MatLab or Octave forces us to do the global rebuild.
We now write code to add an edge, in the file addedge.m and add a node, in the
file addnode.m. First, look at the adding an edge code. This builds on the add code
in the vertices class.
We now add a node using the add node method for the vertices class.
Once this code is built, we can add methods for calculating the incidence matrix for
the graph and its corresponding Laplacian. To find the incidence matrix, we use a
simple loop.
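The incidence listing is not reproduced here; a sketch of the simple loop, using the sign convention implied by the BFsets and incToDot code later in the chapter (+1 where an edge leaves a node, −1 where it enters), is given below. The method name incidence is an assumption.

function K = incidence(g)
  Vg = g.v;  v = Vg.v;         % two-step field access
  Eg = g.e;  E  = Eg.e;        % E is a 2 x nE matrix of [in; out] columns
  N  = length(v);
  nE = size(E, 2);
  K  = zeros(N, nE);
  for j = 1:nE
    K(E(1,j), j) =  1;         % edge j leaves its in node
    K(E(2,j), j) = -1;         % edge j enters its out node
  end
end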
The Laplacian of the graph is then easy to find (recall we discussed the Laplacian
for a graph in Chap. 15!).
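The Laplacian listing is missing here; a sketch under the assumption that it is built as L = K K' from the incidence matrix (consistent with the nonnegative eigenvalues printed in Listing 18.32) is:

function L = laplacian(g)
  K = incidence(g);
  L = K * K';                  % Laplacian of the underlying undirected graph
end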
Here is a session where we build the OCOS and FFP networks and find their Lapla-
cians with eigenvalue structure. We add some annotation here and there. First, we
build an OCOS graph directly. This OCOS circuit includes the thalamus connection
which we will later remove as the thalamus will be implemented separately. We
define the node and edge vectors using integers for each node. We will label the
nodes in ascending order now so the resulting graph is feedforward.
Next, we construct the graph object and check to see if our overloaded subsref.m
code is working correctly.
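The construction listings are missing from this excerpt; a sketch whose node and edge lists are read off the OCOS dot file generated later in this section is given below (the exact input format accepted by the edges constructor is an assumption):

v = vertices(1:8);
e = edges([1 2; 2 3; 2 4; 2 5; 3 6; 4 7; 5 8; 2 7; 1 7]);
OCOS = graphs(v, e);
OCOS.v        % should list the nodes 1 through 8
OCOS.e        % should display the 2 x 9 edge matrix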
We then explicitly add nodes and edges to build an FFP graph. Note for a small graph
this is not very expensive, but the way nodes and edges are added is actually very
inefficient as we have to rebuild the node and edge lists from scratch with each add.
(The session output at this point displays the incidence matrix, whose entries are 0 and ±1: +1 marks the node an edge leaves and −1 the node it enters.)
Once we have these matrices, we can use standard tools within Octave to find their
corresponding eigenvalues and eigenvectors which we might want to do now and
then.
Listing 18.32: Get OCOS and FFP Laplacian eigenvalues and eigenvectors

% get the eigenvalues and eigenvectors for
% the OCOS Laplacian
[eigvecocos, eigvalocos] = eig(L);
% extract the OCOS eigenvalues
vals = diag(eigvalocos)
vals =
  -0.0000
   0.3820
   0.5607
   2.0000
   2.3389
   2.6180
   4.0000
   6.1004
% get the eigenvalues and eigenvectors for
% the FFP Laplacian
[eigvelap, eigvallap] = eig(L2);
% extract the FFP eigenvalues
vals2 = diag(eigvallap)
vals2 =
  -0.0000
   0.2338
   0.3820
   0.5786
   1.4984
   2.0000
   2.3897
   2.6180
   3.1986
   4.0000
   7.1010
where we have to remember that the nodes to and from need to be given using the node numbering from the new larger graph we just constructed. That is hard to do, so later we will use a vector address strategy for building graphs from subgraphs; but that is for a later time. Here is the annotated code.
% get sizes of all nodes and edges
[n, nodesizeg] = size(ngnodes);
[n, nodesizeh] = size(nhnodes);
[n, edgesizeg] = size(egedges);
[n, edgesizeh] = size(ehedges);

% create a copy of the nodes of H
% and add the number of nodes of g
% to each entry
for i = 1:nodesizeh
  b(i) = nhnodes(i) + nodesizeg;
end

% create the new adjoined node list
v = [a, b];
% create the new vertices object
V = vertices(v);

% copy the edges of g into c
% this is data in vector form
c = egedges;
% add the number of nodes of g
% to each in and out value in the
% edges of H
d = ehedges + nodesizeg;
% now create the {[i;j],[u;v]}
% type data structure we need for
% the edges constructor
% initialize the data e
e = {egedges(:,1)};
% add more entries until the edges
% of g are used up
for i = 2:edgesizeg
  e{i} = c(:,i);
end
% now add the edges of H
% making sure we set the counter
% to start after the number of edges
% in g
for j = 1:edgesizeh
  e{j+edgesizeg} = d(:,j);
end

% now e is in the proper format
% and we can construct the new edge object
E = edges(e);
W = graphs(V, E);
sizelinks = length(links);
for i = 1:sizelinks
  from = links{i}(1,1);
  to   = links{i}(2,1);
  W = addedge(W, from, to);
end
Notice how awkward this code is. The design here removes all the old node and
edge information from the incoming graph and renumbers and relabels everything
to fit into one new graph. It also assumes there is an edge used to glue the graphs
together. If we want to build a small brain model, we will want to assemble many
modules. Each module will have its own organization and we will want to remember
that. However, to use the Laplacian of the small brain model, we will need to have
an incidence matrix written in terms of global node numbers. Hence, we need a way
to organize nodes in terms of their module structure. This requires an address rather
than a node number. We will explore this in the next section where we build a new
graph class based on these new requirements.
We need a way to visualize our graphs. We will use the visualization software package called Graphviz (North and Ganser 2010), which includes the tool dot for drawing directed graphs. The documentation for dot tells us how to write a file which the dot command can use to generate the drawing. Consider the file incToDot.m, which takes the incidence matrix of a graph, inc, and uses it to write the needed file. This
354 18 Graph Models
file can have any name we want but it uses the extension .dot. Assume the file
name we generate is MyFile.dot. Once the file is generated, outside of MatLab
we then run the dot command to generate the .pdf file MyFile.pdf as follows:
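The command itself is not shown in this excerpt; the standard Graphviz invocation for this job is

dot -Tpdf MyFile.dot -o MyFile.pdf

which reads the .dot description and writes the PDF rendering.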
We choose the pdf format as it is a vector graphic and hence can be scaled up as much as we want without loss of resolution. This is very important for very large graphs. Here is the code.
[N, E] = size(inc);
[fid, msg] = fopen(filename, 'w');
fprintf(fid, 'digraph G {\n');
fprintf(fid, 'size=\"%f,%f\";\n', width, height);
fprintf(fid, 'rankdir=TB;\n');
fprintf(fid, 'ratio=%f;\n', aspect);
for i = 1:N
  fprintf(fid, '%d [shape=\"rectangle\" fontsize=20 fontname=\"Times new Roman\" label=\"%d\"];\n', i, i);
end
fprintf(fid, '\n');
for j = 1:E
  a = 0;
  b = 0;
  for i = 1:N
    if inc(i,j) == 1
      a = i;
    elseif inc(i,j) == -1
      b = i;
    end
  end
  if (a ~= 0) && (b ~= 0)
    fprintf(fid, '%d->%d;\n', a, b);
  end
end
fprintf(fid, '\n}');
fclose(fid);
end
Let’s use this on the OCOS circuit. First, we generate the OCOS.dot file.
The dot file has the following look. Note the graph is organized simply. Each node is
identified with an integer and then in brackets there is a string which specifies details
about its shape and its label. Here we use the same label as the identifier. The edges
are as simple as possible, specified with the string 4->1 and so forth. You’ll need to
look at the dot documentation to see more details about how to do this.
1->2;
2->3;
2->4;
2->5;
3->6;
4->7;
5->8;
2->7;
1->7;
}
Then, in a separate terminal window, we generate the OCOS.pdf file which is shown
in Fig. 18.4.
At any node i in our graph, the nodes which interact with it via an edge contact are
in the backward set for node i, B(i). Hence, the input to node i is the sum
Σ_{j∈B(i)} E_{j→i} Y(j)
where Ej→i is the value we assign the edge between j and i in our graph and Y (j) is
the output we assign to node j. We also know the nodes that node i sends its output
signal to are in its forward set F(i). Both of these sets are readily found by looking
at the incidence matrix for the graph. We can do this in the code BFsets which is a
new graph method placed in the @graphs directory. In it, we find the backward and forward sets for each neuron in both the six dimensional address form and the global node form. In a given row of the incidence matrix, the positive 1's tell us the edges corresponding to the forward links and the negative 1's give us the backward edges.
So we look at each row of the incidence matrix and calculate the needed nodes for
the backward and forward sets.
% get edges
Eg = g.e;
% get edge array
E = Eg.e;
[Kgrows, Kgcols] = size(Kg);
BackGlobal = {};
ForwardGlobal = {};
BackEdgeGlobal = {};
for i = 1:Kgrows
  BackGlobal{i} = [];
  ForwardGlobal{i} = [];
  BackEdgeGlobal{i} = [];
  for j = 1:Kgcols
    d = E(:,j);
    u = d(1);
    v = d(2);
    if Kg(i,j) == 1
      ForwardGlobal{i} = [ForwardGlobal{i}, v];
    elseif Kg(i,j) == -1
      BackGlobal{i} = [BackGlobal{i}, u];
      BackEdgeGlobal{i} = [BackEdgeGlobal{i}, j];
    end
  end
end
end
The simplest evaluation strategy then is to let each node have an output value deter-
mined by a simple sigmoid function such as given in the code sigmoid.m.
We use this sigmoid to do a simple graph evaluation. This is done in the code below.
We simply evaluate the sigmoid functions at each node using the current synaptic
interaction values.
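Neither sigmoid.m nor the simple evaluation code is reproduced in this excerpt; a minimal sketch of what they need to do, consistent with the node functions used later in the chapter and so an illustration rather than the book's exact listings, is:

% sigmoid.m: a simple squashing function returning values in (0,1)
function y = sigmoid(x)
  y = 0.5*(1 + tanh(x));
end

% one evaluation sweep: each node output is the sigmoid of the weighted
% sum of the outputs of the nodes in its backward set
for i = 1:sizeV
  s = 0.0;
  BF  = B{i};            % backward nodes for node i
  BEF = BE{i};           % backward edges for node i
  for j = 1:length(BEF)
    s = s + W(BEF(j))*Y(BF(j));
  end
  Y(i) = sigmoid(s);
end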
A simple edge update function is then given in the file below. This uses what is called a Hebbian update to change the scalar weight associated with each edge. If the value of the post neuron i is high and the value of the edge Ej→i is also high, the value of this edge is increased by a multiplier such as 1.05 or 1.1. The code to implement this is below.
% get size
sizeV = length(Y);
sizeE = length(W);
scale = 1.1;
for i = 1:sizeV
  % get backward node information for neuron i
  BF = B{i};
  % get backward edge information for neuron i
  BEF = BE{i};
  lenB = length(BF);
  lenBEF = length(BEF);
  for j = 1:lenBEF
    link = BEF(j);
    pre = BF(j);
    post = i;
    hebb = Y(post)*W(link);
    weight = W(link);
    if abs(hebb) > .36
      weight = scale*W(link);
    end
    Wts(link) = weight;
  end
end
A simple example is shown below for the evaluation step for a small OCOS model.
First, we set up a node value vector Y the size of the nodes and a weight value vector
W the size of edges of the brain model. We fill the weight vector randomly with
numbers from −1 to 1. Then we compute the needed backward set information and
use it to perform an evaluation step. Then we do a synaptic weight Hebbian update.
At this point, we are just showing the basic ideas of this sort of computation. We
still have to restrict certain edges to have only negative values as they correspond to
inhibitory synaptic contacts. Also, we have not yet implemented the cable equation
update step afforded us with the graph’s Laplacian. That is to come. Here is an
example computation.
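The driver script itself is not reproduced in this excerpt; a sketch of the setup it describes, using the graph functions introduced above, would look like the following (the evaluation and Hebbian passes are the loops shown earlier, not separate library calls).

KG = incidence(G);               % incidence matrix of the brain model graph
[NodeSize, EdgeSize] = size(KG);
Y = zeros(1, NodeSize);          % node values start at zero
W = -1 + 2*rand(1, EdgeSize);    % edge weights drawn uniformly from [-1, 1]
[B, F, BE] = BFsets0(G, KG);     % backward and forward set information
% ... one evaluation sweep over the nodes, then one Hebbian weight update ...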
We initialize all the node values to zero; hence, all of the input values to the nodes
will be zero as well and so the first evaluation will return a constant 0.5.
If the graph was a feed forward architecture, we could also implement a link update
strategy using the partial derivatives of an error function with respect to the mutable
parameters of the graph architecture. But that is another story!
18.3 Training
We are now ready to attempt to train our networks to match input to output data. We
begin with the code to train a feedforward architecture to do that. We have already
done this in Chap. 16 for the classical matrix-vector feedforward network architecture
using gradient descent. The chain feedforward network architecture is more flexible
and we have discussed it carefully in Chap. 17 and derived the new algorithms for
evaluation and gradient update. So now we will implement them. First, let’s redo the
evaluation code so we can handle node functions with gains and offsets. Also, we
want to be able to have different node functions for input and output nodes. We do
this in the new evaluation code below, and we will do it a bit differently this time. Now we code the needed derivatives as follows. We need a lot of σ terms. So, for this type of σ, we note
σ(y, o, g) = (1/2)(H + L) + (1/2)(H − L) tanh((y − o)/g)

∂σ/∂y = ((H − L)/(2g)) sech²((y − o)/g)

and so

∂σ/∂o = −((H − L)/(2g)) sech²((y − o)/g) = −∂σ/∂y

and

∂σ/∂g = −((H − L)(y − o)/(2g²)) sech²((y − o)/g) = −((y − o)/g) ∂σ/∂y.

So in the following code, we will define the ∂σ/∂y functions.
Listing 18.51: Initializing the Node Functions for the Chain FFN
nodefunction = {};
nodefunctionprime = {};
%
sigma = @(y, offset, gain) 0.5*(1 + tanh((y - offset)/gain));
sigmainput = @(y, offset, gain) (y - offset)/gain;
SL = -1.2;
SH = 1.2;
% output nodes range from SL to SH
sigmaoutput = @(y, offset, gain) 0.5*((SH+SL) + (SH-SL)*tanh((y - offset)/gain));
%
nodefunction{1} = sigma;
nodefunction{2} = sigmainput;
nodefunction{3} = sigmaoutput;
%
sigmaprime = @(y, offset, gain) (1/(2*gain))*sech((y - offset)/gain).^2;
sigmainputprime = @(y, offset, gain) 1/gain;
sigmaoutputprime = @(y, offset, gain) ((SH-SL)/(2*gain))*sech((y - offset)/gain).^2;
%
nodefunctionprime{1} = sigmaprime;
nodefunctionprime{2} = sigmainputprime;
nodefunctionprime{3} = sigmaoutputprime;
% get sizes
sizeIN = length(IN);
sizeOUT = length(OUT);
Nodes = g.v;
[m, sizeY] = size(Nodes.v);
InVals = zeros(1, sizeY);

sigmainput = nodefunction{1};
sigmamain = nodefunction{2};
sigmaoutput = nodefunction{3};
sigma = {};
% set all node functions to sigmamain
for i = 1:sizeY
  sigma{i} = sigmamain;
end
% reset input node functions to sigmainput
for j = 1:sizeIN
  sigma{IN{j}} = sigmainput;
end
% reset output node functions to sigmaoutput
for j = 1:sizeOUT
  sigma{OUT{j}} = sigmaoutput;
end
for i = 1:sizeY
  % get backward node information for neuron i
  BF = B{i};
  BEF = BE{i};
  lenB = length(BF);
  lenBEF = length(BEF);
  sum = 0.0;
  % add input if this is an input node
  for j = 1:sizeIN
    if i == IN{j}
      sum = sum + I(j);
    end
  end
  for j = 1:lenBEF
    link = BEF(j);
    pre = BF(j);
    sum = sum + W(link)*Y(pre);
  end
  InVals(i) = sum;
  Y(i) = sigma{i}(sum, O(i), G(i));
end
OutVals = Y;
end
We use code similar to what we did in the matrix ffn work.
sigmaprime = @(y, offset, gain) (1/(2*gain))*sech((y - offset)/gain).^2;
nodefunctionprime{1} = sigmaprime;
sigmainputprime = @(y, offset, gain) 1/gain;
nodefunctionprime{2} = sigmainputprime;
sigmaoutputprime = @(y, offset, gain) ((SH-SL)/(2*gain))*sech((y - offset)/gain).^2;
nodefunctionprime{3} = sigmaoutputprime;
%
% construct a 1-5-1 MFFN as a cffn
v = [1;2;3;4;5;6;7];
e = {[1;2],[1;3],[1;4],[1;5],[1;6],[2;7],[3;7],[4;7],[5;7],[6;7]};
E = edges(e);
V = vertices(v);
G = graphs(V, E);
KG = incidence(G);
[NodeSize, EdgeSize] = size(KG);
Y = zeros(1, NodeSize);
OL = -1.4;
OH = 1.4;
GL = 0.8;
GH = 1.2;
Offset = OL + (OH-OL)*rand(1, NodeSize);
Gain = GL + (GH-GL)*rand(1, NodeSize);
W = -1 + 2*rand(1, EdgeSize);
[B, F, BE] = BFsets0(G, KG);
IN = {1};
OUT = {7};
I = 0.3;
[y, Y] = evaluation2(nodefunction, G, I, IN, OUT, W, Offset, Gain, B, BE);
Y =
Now to do gradient updates, we need to calculate the error of the chain ffn on the
input and training data. This is done as usual with the energy function.
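In symbols, for S training pairs with target data D and output node set OUT, the code below computes E = (1/2) Σ_{i=1}^{S} Σ_{j} ( YI(i, OUT{j}) − D(i, j) )², the familiar sum of squared output errors.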
sum = 0;
for i = 1:S
  [yI(i,:), YI(i,:)] = evaluation2(nodefunction, g, I(i,:), IN, OUT, W, O, G, B, BE);
  error = 0;
  for j = 1:sizeOUT
    OI(i,j) = YI(i, OUT{j});
    EI(i,j) = YI(i, OUT{j}) - D(i,j);
    error = error + (EI(i,j))^2;
  end
  sum = sum + error;
end
E = .5*sum;
end
Now we are ready to train. However, to do gradient updates, we need forward edge
information. So we need to update the BFsets0 code to return that information.
Here is the changed code which is now BFsets.
Listing 18.56: Returning forward edge information: Changed BFsets code
function [BackGlobal, ForwardGlobal, BackEdgeGlobal, ForwardEdgeGlobal] = BFsets(g, Kg)
%
% g is the graph
% Kg is the incidence matrix of the graph
% g.e is the edge object of the graph
%
% get edges
Eg = g.e;
% get edge array
E = Eg.e;
[Kgrows, Kgcols] = size(Kg);
BackGlobal = {};
ForwardGlobal = {};
BackEdgeGlobal = {};
ForwardEdgeGlobal = {};
for i = 1:Kgrows
  BackGlobal{i} = [];
  ForwardGlobal{i} = [];
  BackEdgeGlobal{i} = [];
  ForwardEdgeGlobal{i} = [];
  for j = 1:Kgcols
    d = E(:,j);
    u = d(1);
    v = d(2);
    if Kg(i,j) == 1
      ForwardGlobal{i} = [ForwardGlobal{i}, v];
      ForwardEdgeGlobal{i} = [ForwardEdgeGlobal{i}, j];
    elseif Kg(i,j) == -1
      BackGlobal{i} = [BackGlobal{i}, u];
      BackEdgeGlobal{i} = [BackEdgeGlobal{i}, j];
    end
  end
end
end
Now we are ready to look at the gradient update code. We think it is interesting to
be able to see how large the gradient norm is and to find at what component this
maximum occurs. So we wrote a utility function to do this. Given a vector, it takes
the absolute value of all the entries and finds the location of the maximum one. With
that, we can return this location in the variable LocMaxGrad. Here is the code for
the utility.
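The utility itself is not shown in this excerpt; a minimal sketch matching the call getargmax(Grad, .001) used below is given next (the role of the tolerance argument is our reading, not the book's stated one).

function loc = getargmax(v, tol)
  % index of the entry of v with the largest absolute value;
  % if even that entry is below the tolerance, report location 0
  [mx, loc] = max(abs(v));
  if mx < tol
    loc = 0;
  end
end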
Now here is the update code. Note we start the ξ loop at the largest node in the OUT
set as the nodes after that are irrelevant to the gradient calculations.
% r is the amount of gradient update
%
% turn gain updates on or off: DOGAIN = 0
% turns them off
DOGAIN = 0;

sizeY = length(Y);
sizeW = length(W);
sizeI = length(I);
sizeD = length(D);
sizeIN = length(IN);
sizeOUT = length(OUT);
xi = zeros(sizeI, sizeY);
DEDW = zeros(1, sizeW);
DEDO = zeros(1, sizeY);
% find maximum output node
OMax = OUT{1};
for i = 2:sizeOUT
  if OUT{i} > OMax
    OMax = OUT{i};
  end
end
for alpha = 1:sizeI
  for i = OMax:-1:1
    % get forward edge information for neuron i
    FF = F{i};
    FEF = FE{i};
    lenFF = length(FEF);
    % find out if i is a target node
    for j = 1:sizeOUT
      if OUT{j} == i
        k = j;
        IsTargetNode = 1;
      end
    end
    if IsTargetNode == 1
      xi(alpha, i) = (Y(alpha, i) - D(alpha, k));
    else
      adder = 0.0;
      for j = 1:lenFF
        link = FEF(j);
        % F(j) is a node which i goes forward to
        post = FF(j);
        adder = adder + xi(alpha, post)*W(link)*sigmaprime{post}(yI(alpha, post), O(post), G(post));
      end
      xi(alpha, i) = adder*YI(alpha, i);
    end % xi calculation
  end % loop on nodes
end % loop on data
for i = 1:sizeY
  % FE{i} is the global node information
  % where i the ith weight
  % FE{i} = a set of (pre, post)
  FF = F{i};
  FEF = FE{i};
  lenFF = length(FF);
  for j = 1:lenFF
    % get the weight index
    link = FEF(j);
    % W(link) = E{pre -> post} where
    pre = i;
    post = FF(j);
    adder = 0.0;
    for k = 1:sizeI
      adder = adder + xi(k, post)*sigmaprime{post}(yI(k, post), O(post), G(post))*YI(k, pre);
    end
    DEDW(link) = adder;
  end
end

for i = 1:sizeY
  adder = 0.0;
  for k = 1:sizeI
    adder = adder + xi(k, i)*sigmaprime{i}(yI(k, i), O(i), G(i));
  end
  DEDO(i) = -adder;
  adder = 0.0;
  for k = 1:sizeI
    adder = adder + xi(k, i)*sigmaprime{i}(yI(k, i), O(i), G(i))*(yI(k, i) - O(i));
  end
  DEDG(i) = -adder/G(i);
end
% find norm
if DOGAIN == 1
  Grad = [DEDW DEDO DEDG];
else
  Grad = [DEDW DEDO];
end
NormGrad = sqrt(sum(Grad.*Grad));
LocMaxGrad = getargmax(Grad, .001);
gradtol = 0.05;
if NormGrad >= gradtol
  UnitGradW = DEDW/NormGrad;
  UnitGradO = DEDO/NormGrad;
  UnitGradG = DEDG/NormGrad;
  WUpdate = W - UnitGradW*r;
  OUpdate = O - UnitGradO*r;
end
SL = -1.2;
SH = 1.2;
% output nodes range from SL to SH
sigmaoutput = @(y, offset, gain) 0.5*((SH+SL) + (SH-SL)*tanh((y - offset)/gain));
nodefunction{1} = sigmainput;
nodefunction{2} = sigma;
nodefunction{3} = sigmaoutput;
sigmaprime = @(y, offset, gain) (1/(2*gain))*sech((y - offset)/gain).^2;
sigmainputprime = @(y, offset, gain) 1/gain;
sigmaoutputprime = @(y, offset, gain) ((SH-SL)/(2*gain))*sech((y - offset)/gain).^2;
nodefunctionprime{1} = sigmainputprime;
nodefunctionprime{2} = sigmaprime;
nodefunctionprime{3} = sigmaoutputprime;
% setup training data
X = linspace(0, 3, 31);
Input = X';
U = cos(X);
Target = U';
% construct a 1-5-1 MFFN as a graph
v = [1;2;3;4;5;6;7];
e = {[1;2],[1;3],[1;4],[1;5],[1;6],[2;7],[3;7],[4;7],[5;7],[6;7]};
E = edges(e);
V = vertices(v);
G = graphs(V, E);
KG = incidence(G);
[NodeSize, EdgeSize] = size(KG);
Y = zeros(1, NodeSize);
OL = -1.4;
OH = 1.4;
GL = 0.8;
GH = 1.2;
Offset = OL + (OH-OL)*rand(1, NodeSize);
Gain = GL + (GH-GL)*rand(1, NodeSize);
W = -1 + 2*rand(1, EdgeSize);
[B, F, BE, FE] = BFsets(G, KG);
IN = {1};
OUT = {7};
[E, yI, YI, OI, EI] = energy(nodefunction, G, Input, IN, OUT, Target, Y, W, Offset, Gain, B, BE);
E =
   18.1698
% Start Training
lambda = .0005;
NumIters = 10;
[W, Gain, Offset, Energy] = chainffntrain(nodefunction, nodefunctionprime, G, Input, IN, Target, OUT, Y, W, Offset, Gain, B, BE, F, FE, lambda, NumIters);
[W, Gain, Offset, Energy] = chainffntrain(nodefunction, nodefunctionprime, G, Input, IN, Target, OUT, Y, W, Offset, Gain, B, BE, F, FE, .01, 200);
Energy(200)
ans =
    0.1263
This first chainffntrain code contains a built-in plot of the energy, shown in Fig. 18.5.
18.4 Polishing the Training Code
Now that we have working code, let’s revisit it and see if we can make it a bit more
compact. We clearly need some sort of initialization method so we don’t have to
type so much. Let’s handle the initialization of all the node functions in one place.
Consider the code below which sets up all the sigma functions and then returns
them for use. We are experimenting here with allowing every single node function
to be unique. We return a cell for both sigma and sigmaprime.
transferfunctionprime = @(y, offset, gain) (1/(2*gain))*sech((y - offset)/gain).^2;
nodefunctionprime{1} = transferfunctionprime;
transferfunctioninputprime = @(y, offset, gain) 1/gain;
nodefunctionprime{2} = transferfunctioninputprime;
transferfunctionoutputprime = @(y, offset, gain) ((SH-SL)/(2*gain))*sech((y - offset)/gain).^2;
nodefunctionprime{3} = transferfunctionoutputprime;
%
sigmamain = nodefunction{1};
sigmainput = nodefunction{2};
sigmaoutput = nodefunction{3};
sigma = {};
% get sizes
sizeIN = length(IN);
sizeOUT = length(OUT);
% set all node functions to sigmamain
for i = 1:NodeSize
  sigma{i} = sigmamain;
end
% reset input node functions to sigmainput
for j = 1:sizeIN
  sigma{IN{j}} = sigmainput;
end
% reset output node functions to sigmaoutput
for j = 1:sizeOUT
  sigma{OUT{j}} = sigmaoutput;
end
%
sigmamainprime = nodefunctionprime{1};
sigmainputprime = nodefunctionprime{2};
sigmaoutputprime = nodefunctionprime{3};
sigmaprime = {};
% set all node functions to sigmamain
for i = 1:NodeSize
  sigmaprime{i} = sigmamainprime;
end
% reset input node functions to sigmainput
for j = 1:sizeIN
  sigmaprime{IN{j}} = sigmainputprime;
end
% reset output node functions to sigmaoutput
for j = 1:sizeOUT
  sigmaprime{OUT{j}} = sigmaoutputprime;
end
end
We might also want to send in SL and SH as arguments. This leads to the variant
SigmoidInit2.
Listing 18.62: Adding upper and lower bounds to the node function initialization
function [sigma, sigmaprime] = SigmoidInit2(NodeSize, SL, SH, IN, OUT)
%
% Initialize node transfer functions.
%
....
all is the same except SL and SH are arguments so
no need to initialize them in the code
....
end
We then have to alter the code for evaluation, energy calculation, updates and training
to reflect the new way we can access the node functions and their derivatives. First,
the evaluation. We took out the node function initialization and changed the argument
list.
% get sizes
sizeIN = length(IN);
sizeOUT = length(OUT);
Nodes = g.v;
[m, sizeY] = size(Nodes.v);
InVals = zeros(1, sizeY);
for i = 1:sizeY
  % get backward node information for neuron i
  BF = B{i};
  % get backward edge information for neuron i
  BEF = BE{i};
  lenB = length(BF);
  lenBEF = length(BEF);
  sum = 0.0;
  % add input if this is an input node
  for j = 1:sizeIN
    if i == IN{j}
      sum = sum + I(j);
    end
  end
  for j = 1:lenBEF
    link = BEF(j);
    pre = BF(j);
    sum = sum + W(link)*Y(pre);
  end
  InVals(i) = sum;
  Y(i) = sigma{i}(sum, O(i), G(i));
end
OutVals = Y;
end

sum = 0;
for i = 1:S
  [yI(i,:), YI(i,:)] = evaluation3(sigma, g, I(i,:), IN, OUT, W, O, G, B, BE);
  error = 0;
  for j = 1:sizeOUT
    OI(i,j) = YI(i, OUT{j});
    EI(i,j) = YI(i, OUT{j}) - D(i,j);
    error = error + (EI(i,j))^2;
  end
  sum = sum + error;
end
E = .5*sum;

end
Next, we change the update code. In addition to removing the node function setup
code, we also streamlined the gradient norm calculation.
Nodes = g.v;
[m, sizeY] = size(Nodes);
sizeW = length(W);
S = length(I);
sizeD = length(D);
SN = length(IN);
sizeOUT = length(OUT);
xi = zeros(S, sizeY);
DEDW = zeros(1, sizeW);
DEDO = zeros(1, sizeY);
DEDG = zeros(1, sizeY);

% find maximum output node
OMax = OUT{1};
for i = 2:sizeOUT
  if OUT{i} > OMax
    OMax = OUT{i};
  end
end

for alpha = 1:S
  for i = OMax:-1:1
    % get forward edge information for neuron i
    FF = F{i};
    FEF = FE{i};
    lenFF = length(FEF);
    % find out if i is a target node
    for j = 1:sizeOUT
      if OUT{j} == i
        k = j;
        IsTargetNode = 1;
      end
    end
    if IsTargetNode == 1
      xi(alpha, i) = (Y(alpha, i) - D(alpha, k));
    else
      adder = 0;
      for j = 1:lenFF
        link = FEF(j);
        % F(j) is a node which i goes forward to
        post = FF(j);
        adder = adder + xi(alpha, post)*W(link)*sigmaprime{post}(y(alpha, post), O(post), G(post));
      end
      xi(alpha, i) = adder;
    end % test on nodes: output or not
  end % loop on nodes
end % loop on data
%xi
for alpha = 1:S
  for i = sizeY:-1:1
    % FE{i} is the global node information
    % where i the ith weight
    % FE{i} = a set of (pre, post)
    FF = F{i};
    FEF = FE{i};
    lenFF = length(FF);
    for j = 1:lenFF
      % get the weight index
      link = FEF(j);
      % W(link) = E{pre -> post} where
      pre = i;
      post = FF(j);
      DEDW(link) = DEDW(link) + xi(alpha, post)*sigmaprime{post}(y(alpha, post), O(post), G(post))*Y(alpha, pre);
    end % forward link loop
    DEDO(i) = DEDO(i) - xi(alpha, i)*sigmaprime{i}(y(alpha, i), O(i), G(i));
    %DEDG(i) = DEDG(i) - xi(alpha, i)*sigmaprime{i}(y(alpha, i), O(i), G(i))*(y(alpha, i) - O(i))/G(i);
  end % node loop
end % exemplar loop
if DOGAIN == 1
  Grad = [DEDW DEDO DEDG];
else
  Grad = [DEDW DEDO];
end
NormGrad = sqrt(sum(Grad.*Grad));
%LocMaxGrad = getargmax(Grad, .001);
LocMaxGrad = 1;
gradtol = 0.05;
%
if NormGrad < gradtol
  scale = r;
else
  scale = r/NormGrad;
end
%Cscale = scale
% do updates
WUpdate = W - DEDW*scale;
OUpdate = O - DEDO*scale;
%GUpdate = G - DEDG*scale;
GUpdate = G;

end
We can see the results of our approximation in Fig. 18.6 by generating a test file.
Fig. 18.6 The approximation of cos(t) on [0,3.14] using a 1-5-1 FFN graph: error is 0.1
18.5 Comparing the CFFN and MFFN Code
Let’s take a moment to check whether our two ways of building approximations using network architectures give the same results. To do this, we will set up a particular 1-5-1 network as both a CFFN and an MFFN and calculate the starting energy and one gradient step to see if they match. This is a bit harder than it sounds as the data structures are a bit different. First, here is the setup. This code is careful to set up both versions of the 1-5-1 problem the same way. The parameters are chosen randomly as before and we map the MFFN choices to their CFFN counterparts so that both techniques start the same. We use simple node computation initialization functions too. We added one for the MFFN which we show here.
sigmaprime = @(y, offset, gain) (1/2)*sech((y - offset)/gain).^2;
nodefunctionprime{1} = sigmaprime;
sigmainputprime = @(y, offset, gain) 1;
nodefunctionprime{2} = sigmainputprime;
sigmaoutputprime = @(y, offset, gain) ((SH-SL)/2)*sech((y - offset)/gain).^2;
nodefunctionprime{3} = sigmaoutputprime;
end
The function SigmoidInit2 does a similar job for the CFFN node function initializations. Note the code to set the parameters in the CFFN to match the MFFN ones is a bit cumbersome as it has to be done manually since the data structures don’t match up nicely.
We finish this initialization code by doing the energy evaluations for both versions. We show the result below. Note both versions return the same value, so the two energy evaluations agree.
Listing 18.72: The initial energy for the CFFN and MFFN code
EMFFN =
2.9173
ECFFN =
2.9173
Next, we test one gradient update. To do this, we created new update and training functions for both the CFFN and the MFFN which simply add diagnostic material. These are the functions chainffntraindiagnostic, GradientUpdatediagnostic, mffntraindiagnostic and mffnupdatediagnostic. If you look at them, you can see the changes we made are pretty straightforward; we leave that exploration to you! The testing and the results are shown next.
Listing 18.73: One gradient step for both CFFN and MFFN
%train mffn 1 step
[G, O, T, EMFFN] = mffntraindiagnostic(Input, Target, nodefunction, nodefunctionprime, G, O, T, LayerSizes, y, Y, 0.0005, 1);
%Train cffn 1 step
[WC, GC, OC, ECFFN, Norm, Loc] = chainffntraindiagnostic(sigma, sigmaprime, GG, Input, IN, Target, OUT, WC, OC, GC, B, BE, F, FE, 0.0005, 1);
GradMFFN =
  -1.3319  0.6797  -0.2617  0.1119  -0.4563  2.7344  4.2227  0.5844  0.0905  1.5443  0.6065  2.0345  -1.2520  0.2508  -0.3006  0.6916  -4.8874
NormGradMFFN =
   7.8205
GradCFFN =
  -1.3319  0.6795  -0.2618  0.1118  -0.4566  2.7349  4.2220  0.5844  0.0905  1.5442  0.6078  2.0354  -1.2515  0.2509  -0.3005  0.6919  -4.8868
NormGradCFFN =
   7.8202
Fig. 18.7 A Folded Feedback Pathway cortical circuit in two forms: the relabeled graph on the right has external input into Node 1 from the thalamus and into Node 9 from the cortical column above. a The Folded Feedback Pathway DG. b The Relabeled Folded Feedback Pathway DG
where we can initialize the nodal outputs in a variety of ways: for example, by calculating all the outputs without feedback initially and hence setting Y9(0) = Y10(0) = 0. When we relabel this graph as shown in Fig. 18.7b, it is now clear this is a feed forward chain of computational nodes and the only feedback is the relabeled edge E11→2. The external input comes into N1 from the thalamus and into N9 from the cortical column above it. The input for the nodal calculation for N2 is in terms of
cortical column above it. The input for the nodal calculation for N2 is in terms of
time ticks on our clock
or
However we choose to implement the graph, there will be feedback terms and so
we must interpret the evaluation and update equations appropriately. Then, once a
graph structure is chosen, we apply the usual update equations in the form given by
Table 18.1a and the Hebbian updates using Table 18.1b. In this version of a Hebbian
update algorithm, we differ from the one we discussed earlier in this chapter. In the
first version, we checked to see if the value of the post neuron and the value of the edge Epre→post were both high; if so, the value of this edge is increased by a multiplier ξ. To check this, we just looked at the value of Ypost × Epre→post and if it was high enough we increased the weight Epre→post. Here, the idea is similar but this time we look at the value of Ypre × Epre→post and if this value is high enough, we increase the edge. There are other versions too. The idea is to somehow increase the edge weight when the pre signal and the post signal are both high.
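A sketch of this variant, obtained by changing one line of the Hebbian update loop shown earlier in the chapter, is:

for j = 1:lenBEF
  link = BEF(j);
  pre  = BF(j);
  post = i;
  hebb = Y(pre)*W(link);      % pre output times edge weight this time
  weight = W(link);
  if abs(hebb) > .36
    weight = scale*W(link);   % strengthen this edge
  end
  Wts(link) = weight;
end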
Another approach is to handle feedback terms using a lag as shown in Fig. 18.8.
We make two copies of the graph which are here drawn using nodes N1 through N11
on the left and these correspond to what is happening at time t. On the right is the copy
of the graph with the nodes labeled N12 to N22 which corresponds to time t + 1. The
feedback connection is then the link between N11 and N13 with edge weight E11→2 ;
hence the feedback has now become a feedforward link in this new configuration.
The nodes N12 and N22 retain the usual edge weights from time t as you can see
in the figure. The number of tunable parameters is still the same as in the original
Fig. 18.9 Adding column to column feedback to the original Folded Feedback Pathway as a lagged architecture
network with feedback but using this doubling procedure, we have converted the
network into a feedforward one. Once the graph is initialized, we apply evaluation
and update algorithms modified a bit to take into account this new doubled structure.
Note here there are inputs into Nodes N1 and N12 of value I1 and inputs into Nodes
N9 and N20 of value I2 . The outputs here are N17 , N18 and N19 rather than N6 , N7 and
N8 . For a given lagged problem, we need to decide how to handle this doubling of
the inputs and outputs; we simply show the possibilities here. Now Fig. 18.8 shows
a portion of a cortical circuit where the OCOS subcircuit appears disconnected from
the FFP portion. Let’s make it a bit more realistic by adding a node N9 in the column
above this column circuit which is connected to N6 , N7 and N8 . The output from N9
then connects to the FFP originally given by nodes N9 , N10 and N11 . We will shift
the FFP nodes up by one making them N10 , N11 and N12 . We draw this in Fig. 18.9.
In this new architecture, the calculations through the copy that corresponds to time t
are generating the N12 output and the second copy of the network is necessary to
take into account the feedback from N12 to N2 .
Let’s look at some MatLab implementations of these ideas. First, we need to redo how we parse the incidence matrix of a graph so that we find self and non self feedback terms. Consider the new incidence code below. There are now two new possibilities we focus on. For a given node ii, we set up its entries across the edge columns. We loop through the edges using the index jj. Recall each edge consists of an in and out pair. The term e2(1,jj) gives the in value of the edge. If the in value matches the given node ii, we check to see if the in value ii is less than the out value. If this is true, the edge leaves node ii in the forward direction and we set the incidence matrix entry to +1; otherwise the edge is a feedback edge and we set the entry to +2. The out value of each edge is handled the same way, giving a −1 for a forward edge and a −2 for a feedback edge.
Listing 18.74: The incidence matrix calculation for a CFFN with feedback
function IncidMat = incidenceFB(G)
%
% g is a graph having
%   vertices g.v
%   edges    g.e
%
% Get the edge object from g
E = G.e;
% get the edges from the edge object
e2 = E.e;
% get the vertices object from g
V = G.v;
% get the vertices from the vertices object
v2 = V.v;
% find out how many vertices and edges there are
[m, sizeV] = size(v2);
[m, sizeE] = size(e2);
IncidMat = zeros(sizeV, sizeE);
%
% setup incidence matrix
%
for ii = 1:sizeV
  for jj = 1:sizeE
    if e2(1,jj) == ii
      if e2(1,jj) < e2(2,jj)
        IncidMat(ii,jj) = 1;
      else
        IncidMat(ii,jj) = 2;
      end
    elseif e2(2,jj) == ii
      if e2(1,jj) < e2(2,jj)
        IncidMat(ii,jj) = -1;
      else
        IncidMat(ii,jj) = -2;
      end
    end
  end
end
end
Once we have the new incidence matrix, we need to convert this into a dot file for printing. If we have self feedback, the sum of the entries in a given column will be positive and so we set the to and from nodes of that edge to be the same. If we have a +1 and a −1 in the column for a given edge, this corresponds to our usual forward edge. If we have a +2 and a −2 in the same column, this is a feedback edge.
Listing 18.75: Converting the incidence matrix for a CFFN with feedback into a dot file
function incToDotFB(inc, width, height, aspect, filename)
[N, E] = size(inc);
[fid, msg] = fopen(filename, 'w');
fprintf(fid, 'digraph G {\n');
fprintf(fid, 'size=\"%f,%f\";\n', width, height);
fprintf(fid, 'rankdir=TB;\n');
fprintf(fid, 'ratio=%f;\n', aspect);
for ii = 1:N
  fprintf(fid, '%d [shape=\"rectangle\" fontsize=20 fontname=\"Times new Roman\" label=\"%d\"];\n', ii, ii);
end
fprintf(fid, '\n');
for jj = 1:E
  a = 0;
  b = 0;
  for ii = 1:N
    switch inc(ii,jj)
      case 1
        a = ii;
      case -1
        b = ii;
      case 2
        a = ii;
        if abs(sum(inc(:,jj))) > 0
          b = ii;
        end
      case -2
        b = ii;
    end
  end
  if (a ~= 0) && (b ~= 0)
    fprintf(fid, '%d->%d;\n', a, b);
  end
end
fprintf(fid, '\n}');
fclose(fid);
end
Let’s see how this works on a modified OCOS that has some feedback and self feedback. This generates the graphic we see in Fig. 18.10. As an exercise, let’s set up the OCOS/FFP with column to column interaction we have shown as a lagged FFN in code. We start by constructing the original graph of this circuit with feedback.
in code. We start by constructing the original graph of this circuit with feedback.
We generate the plot as usual which is shown in Fig. 18.11. Note the incidence
matrix tells us where the feedback edges are: KG(12, 16) = 2 and KG(2, 16) = −2.
We need to extract the feedback edges and build the new graph which is two copies of
the old one. There is only one feedback edge here, so we find FBedge(1) = 16.
We now know the feedback edge is 16 and so we can remove it. We had to write remove edge functions to do this. There is the function subtract.m in the @edges directory that removes an edge, and we then use it in the @graphs directory to remove an edge from the graph with the subtractedge function. We show these functions next.
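The listings for subtract.m and subtractedge are not reproduced in this excerpt; a minimal sketch of what they need to do, under the assumption that the edges object stores its [from; to] pairs in a 2 x M array field e, is:

% @edges/subtract.m: drop edge k from an edges object (assumed implementation)
function Enew = subtract(E, k)
  d = E.e;                       % 2 x M array of [from; to] pairs
  d(:, k) = [];                  % remove column k
  Enew = edges(num2cell(d, 1));  % rebuild through the edges constructor
end

% @graphs/subtractedge.m: remove edge k from a graph (assumed implementation)
function Gnew = subtractedge(G, k)
  Gnew = graphs(G.v, subtract(G.e, k));
end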
We remove edge 16 and then copy the resulting graphs. We have not automated this
yet and we know the new edge goes from 12 to 14 so we add this new edge to the
double copy.
Listing 18.81: Subtract the feedback edge and construct the double copy
G = subtractedge(G, FBedge(1));
G.e
ans =
    1   2   2   2   3   4   5   2   1   6   7   8   9  10  11
    2   3   4   5   6   7   8   7   7   9   9   9  10  11  12
W = addgraph(G, G, {[12;14]});
W.v
    1   2   3   4   5   6   7   8   9  10  11  12
   13  14  15  16  17  18  19  20  21  22  23  24
W.e
    1   2   2   2   3   4   5   2   1   6   7   8   9  10  11  13
    2   3   4   5   6   7   8   7   7   9   9   9  10  11  12  14

   14  14  14  15  16  17  14  13  18  19  20  21  22  23  12
   15  16  17  18  19  20  19  19  21  21  21  22  23  24  14
KW = incidenceFB(W);
incToDot(KW, 10, 10, 1, 'DoubledOCOSFFP.dot');
The new dot file generates Fig. 18.12. We see it is fairly easy to construct the doubled
graph although it is clear a lot of work is required to automate this process. Also,
we want to apply the standard CFFN evaluation, update and training algorithms to this network, and they need to be modified to do so. First, only edges 1–15 and the old feedback edge, which is now edge 31, correspond to tunable weight values. Also the
offsets and gains of only nodes 1 through 12 are modifiable. Since we usually don’t
tune the gains, this network has 16 tunable edges and 12 tunable offsets for a total of
28 parameters. We will explore these architectures in the next volume which will be
devoted to building realistic models of cognitive dysfunction. Just for fun, let’s list
what we would need to do:
• Suppose the number of original nodes in the graph G is N and the edges for G
consist of M feedforward edges and Q feedback edges.
• We double G to the new graph W which has 2N nodes and 2P edges. We then add
the Q feedback edges as new feedforward edges from the original graph G to its
copy as you have seen us do in the example.
• We have to make a decision about the input and output nodes. If IN and OUT are those sets, a reasonable thing to do is to add the input nodes in the original part of the doubled graph to the copy, i.e. NewIN = {IN, IN+N}, and to drop the output nodes in the original part and use those nodes in the copy, i.e. NewOUT = OUT + N, which adds N to each index in IN and OUT as appropriate (a short MatLab sketch of this bookkeeping follows the list).
• The nodes N + 1 to 2N use the same node function as the original nodes so we
have to set them up that way.
• The only tunable parameters are the values for the edges 1 to P from the original (or the values from P + 1 to 2P in the copy), the new feedforward edges we created, 2P + 1 to 2P + Q, and the offsets and gains for the original nodes 1 to N (or the offsets and gains in the copy). It makes more sense to do the updates in the
copy as we know there is a big problem with attenuation in the size of the partial
derivatives the farther we are removed from the raw target error. Hence, we will
update the link weights for P + 1 to 2P and the offsets and gains in the copy as
well. These updated values will then be used to set the parameter values in the
original graph part of the lagged graph.
• we adjust the update code so that we only calculate ∂E/∂W(j) for the appropriate j and ∂E/∂O(k) and ∂E/∂G(k) for the right k.
• We can then train as usual.
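A small MatLab sketch of the input/output bookkeeping mentioned in the list above; this is illustrative only, and NewIN, NewOUT and shift are our names rather than ones used in the book's code.

% IN and OUT are cells of node numbers in the original graph; N is its node count
shift  = @(c) cellfun(@(k) k + N, c, 'UniformOutput', false);
NewIN  = [IN, shift(IN)];     % keep the original inputs and add their copies
NewOUT = shift(OUT);          % read the outputs only from the copy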
Once we have removed edges, we want to find and keep the list of edges removed
and the edge information itself. We do this with the functions ExtractFBEdges
and ExtractNewFFEdgeInfo. The function ExtractFBEdges stores the link
numbers associated with the feedback edges we find in the original graph.
Listing 18.84: Extracting the feedback edge indices from the incidence matrix
function FBedge = ExtractFBEdges(KG)
%
% KG is incoming incidence matrix
%
%
[rows, cols] = size(KG);
FBedge = [];
for i = 1:rows
  for j = 1:cols
    if KG(i,j) == 2
      % edge j is feedback
      FBedge = [FBedge, j];
    end
  end
end
end
Each of the feedback edges has an integer pair of the form [a, b] where a > b because
it is a feedback edge. The function below sets up the new feedforward edges we need
in the lagged graph to have the form [a, b + N] where N is the number of nodes in
the original graph.
Listing 18.85: Extracting the new feedforward edges constructed from the old feedback ones
function NewFFedges = ExtractNewFFEdgeInfo(G, List)
%
% G is original graph with feedback
% List is list of FB edges
%
NewFFedges = {};
M = length(List);
N = length(G.v);
ge = G.e;
for i = 1:M
  NewFFedges{i} = [ge(1, List(i)); ge(2, List(i)) + N];
end

end
where we set the multiplier λi = 1 if the absolute value of the ith component of the gradient is too small (say less than 0.01) and otherwise λi = 1/λmax where λmax is the largest absolute value of the components of the gradient. This works reasonably well as long as the gradient has components which are reasonably uniform in scale.
If the gradient has only a few components that are big, this will still result in a scaled
gradient that only updates a few components. We can make some general comments
about the code which are in the source, but here we have taken them out and put them
in the main body. First, this is a lagged graph so there are 2 copies of the original
graph giving us a lagged graph with nodes 1 to N from the original and nodes N + 1
to 2N from the copy. The node functions of the first N nodes are repeated for the
second set of nodes: so input nodes are in IN and IN+N. The output nodes are in
OUT and OUT+N and we use those output nodes to set the node functions for the
lagged graph. For training purposes, we do use the double input, IN, IN+N but the
outputs are just the nodes in the copy, OUT + N.
If the number of edges in the original graph was M we know M = P + Q, where P
were feedforward edges and Q were feedback edges which are redone as feedforward
edges in the lagged graph. The parameters W from P + 1 to 2P, and W from 2P + 1
to 2P + Q are updateable as well as the gains and offsets from N + 1 to 2N. So the
original code for the gradient updates must be altered to accommodate the fact that
half of the parameters of the lagged network are not to be updated. The ξ calculations
are the same as we want to include all the contributions from all nodes and links.
Note as usual we start the ξ loops at OMax, the largest node in the OUT set as the
nodes after that do not contribute to the energy. Next, the structure of W is as follows: entries 1 to P hold the original feedforward edge weights, entries P + 1 to 2P hold the weights for the copy, and entries 2P + 1 to 2P + Q hold the rewritten feedback edges. In this code NGE is P and NFB is Q. We generate all partials and then set some of
them to zero like this. We use DEDW to be the vector of partials with respect to link
weights, DEDO to be partials for the offsets and DEDG, the partials for the gains.
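The zeroing code itself is not reproduced in this excerpt; given the parameter layout just described, a minimal sketch of what it has to do is the following, where NGE and NGN are the edge and node counts of the original graph.

% the original block of parameters is not tuned directly;
% its partials are zeroed and its values are later copied from the updated copy
DEDW(1:NGE) = 0;   % original feedforward edge weights
DEDO(1:NGN) = 0;   % original offsets
DEDG(1:NGN) = 0;   % original gains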
Of course, it is more complicated if we use gradient scaling, but the idea is essentially
the same. The last part of the code implements gradient descent using either the raw
gradient information or the scaled gradients.
DOGAIN = 0;
Nodes = g.v;
[m, sizeY] = size(Nodes);
sizeW = length(W);
S = length(I);
sizeD = length(D);
SN = length(IN);
sizeOUT = length(OUT);
xi = zeros(S, sizeY);
DEDW = zeros(1, sizeW);
DEDO = zeros(1, sizeY);
DEDG = zeros(1, sizeY);
% find maximum output node
OMax = OUT{1};
for i = 2:sizeOUT
  if OUT{i} > OMax
    OMax = OUT{i};
  end
end
% We start the xi loops at OMax, the largest node in the OUT set
% as the nodes after that do not contribute to the energy.
for alpha = 1:S
  for i = OMax:-1:1
    % get forward edge information for neuron i
    FF = F{i};
    FEF = FE{i};
    lenFF = length(FEF);
    % find out if i is a target node
    for j = 1:sizeOUT
      if OUT{j} == i
        k = j;
        IsTargetNode = 1;
      end
    end
    if IsTargetNode == 1
      xi(alpha, i) = (Y(alpha, i) - D(alpha, k));
    else
      adder = 0;
      for j = 1:lenFF
        link = FEF(j);
        % F(j) is a node which i goes forward to
        post = FF(j);
        adder = adder + xi(alpha, post)*W(link)*sigmaprime{post}(y(alpha, post), O(post), G(post));
      end
      xi(alpha, i) = adder;
    end % test on nodes: output or not
  end % loop on nodes
end % loop on data
%xi
for alpha = 1:S
  for i = OMax:-1:1
    % FE{i} is the global node information
    % where i the ith weight
    % FE{i} = a set of (pre, post)
    FF = F{i};
    FEF = FE{i};
    lenFF = length(FF);
    for j = 1:lenFF
      % get the weight index
      link = FEF(j);
      % W(link) = E{pre -> post} where
      pre = i;
      post = FF(j);
      DEDW(link) = DEDW(link) + xi(alpha, post)*sigmaprime{post}(y(alpha, post), O(post), G(post))*Y(alpha, pre);
    end % forward link loop
....
% usual gradient
if DOGAIN == 1
  Grad = [DEDW DEDO DEDG];
else
  Grad = [DEDW DEDO];
end
NormGrad = sqrt(sum(Grad.*Grad));
%
% we implement gradient scaling
ScaleTol = 0.01;
WScale = ones(1, length(W));
OScale = ones(1, length(O));
GScale = ones(1, length(G));
ADEDW = max(abs(DEDW));
ADEDO = max(abs(DEDO));
ADEDG = max(abs(DEDG));
for i = 1:length(W)
  if abs(DEDW(i)) > ScaleTol
    WScale(i) = 1/ADEDW;
  end
end
for i = 1:length(O)
  if abs(DEDO(i)) > ScaleTol
    OScale(i) = 1/ADEDO;
  end
end
for i = 1:length(G)
  if abs(DEDG(i)) > ScaleTol
    GScale(i) = 1/ADEDG;
  end
end
if DOGAIN == 1
  SW = diag(WScale)*DEDW';
  SO = diag(OScale)*DEDO';
  SG = diag(GScale)*DEDG';
  ScaledGrad = [SW' SO' SG'];
else
  SW = diag(WScale)*DEDW';
  SO = diag(OScale)*DEDO';
  ScaledGrad = [SW' SO'];
end
gradtol = 0.05;
%
if NormGrad < gradtol
  scale = r;
  % update all the links.
  WUpdate = W - DEDW*scale;
  % set copies to updates
  WUpdate(1, 1:NGE) = WUpdate(1, NGE+1:2*NGE);
....
    % gains 1 to NGN
    GUpdate = G - DEDG*scale;
    GUpdate(1, 1:NGN) = GUpdate(1, NGN+1:2*NGN);
  else
    GUpdate = G;
  end
else % normgrad is larger than 0.05
  NormScaledGrad = sqrt(sum(ScaledGrad.*ScaledGrad));
  % update all the links.
  WUpdate = W - r*DEDW*diag(WScale);
  % set copies to updates
  WUpdate(1, 1:NGE) = WUpdate(1, NGE+1:2*NGE);
....
  if DOGAIN == 1
    % update all gains. This does not alter
    % gains 1 to NGN
    GUpdate = G - r*DEDG*diag(GScale);
    GUpdate(1, 1:NGN) = GUpdate(1, NGN+1:2*NGN);
  else
    GUpdate = G;
  end
end
end
We need to test this code and see how it does. First, we set up the lagged graph. Note, we generate the original dot file so we can see the original graph. Then we get the information we need to build the lagged graph.
Listing 18.90: Example: Further setup details and input and target sets
IN = {1, 13};
OUT = {6, 7, 8, 9, 18, 19, 20, 21};
NGN = length(G.v);
NGE = length(G.e) - length(FBedge);
NFB = length(FBedge);
NodeSize = length(LaggedW.v);
SL = -0.5;
SH = 1.5;
% setup training
X = linspace(0, 1, 101);
U = [];
for i = 1:101
  if X(i) >= 0 && X(i) < 0.25
    U = [U, [1;0;0;1]];
  elseif X(i) >= 0.25 && X(i) < 0.50
    U = [U, [0;1;0;0]];
  elseif X(i) >= 0.5 && X(i) < 0.75
    U = [U, [0;0;1;1]];
  else
    U = [U, [0;0;1;0]];
  end
end
% randomly permute the order of the input and target set.
P = randperm(101);
XX = X(P);
UU = [];
for i = 1:101
  j = P(i);
  UU = [UU, U(:,j)];
end
Input = [XX', XX'];
Target = UU';
% set all node functions as usual; the copies will
% be the same as the original
[sigma, sigmaprime] = SigmoidInit2(NodeSize, SL, SH, IN, OUT);
We designed this input and output data with a more biological situation in mind. The
inputs are scalars and so are perhaps a light intensity value that has been obtained
from additional layers of neural circuitry. The outputs are taken from the top of the
OCOS/FFP stack and so are reminiscent of outputs from the top layer of visual cortex
(albeit in a very simplistic way) and the binary patterns of the target; i.e. [1; 0; 0; 1] might represent a coded command to activate a certain motor response or a set of chromatophore activations resulting in a specific pattern on the skin of a cephalopod.
So this circuit, if we can train it to match the inputs to outputs, is setup to code for
four specific motor or chromatophore activations. The feedback edges 11 → 3 and
7 → 7 are not really biological and were added just to help provide a nice test for the
code. However, the edge 12 → 2 is part of the usual FFP circuit and so is biologically
plausible. The top of the OCOS was allowed to feedforward into a summation node 9
which then fed up to node 10 in the cortical column above the one which contains our
OCOS/FFP. That node 10 then provides feedback from the top cortical column into
the bottom of our OCOS/FFP stack. So this example is a small step in the direction
of modeling something of interest in cognitive modeling!
We list the regular gradient and the scaled gradient. Here there are 15 original feed-
forward edges and the first 15 values in Grad are therefore 0. The next 15 correspond
to the link weights for the copy and the last 3 are the link weights for the rewritten
feedback edges. The values for 28, 29 and 30 are 0 because those edges are after the
last output node and so are irrelevant to the energy calculation. Then, there are the
3 partials for the feedback link weights. Finally, there are then 12 offset partials that
are zero as they are in the original graph and 9 offset partials from the copy that are
nonzero with the last 3 zero as they are irrelevant. The listing below that is the scaled
version of the gradient.
ScaledGrad =
  Columns 1 through 15
         0         0         0         0         0         0         0         0         0         0         0         0         0         0         0
  Columns 16 through 30
   -0.1839    0.0462    0.0509    0.0280   -1.0000    0.1275   -0.2977    0.1503   -0.0193   -0.0217    0.2114   -0.1962         0         0         0
  Columns 31 through 45
    0.2297    0.0687    0.0949         0         0         0         0         0         0         0         0         0         0         0         0
  Columns 46 through 57
   -1.0000    0.0413   -0.0654   -0.1435   -0.1220    0.1786   -0.0359    0.0855   -0.0752         0         0         0
Finally, let’s talk about thresholding and recognition. A given target here has binary values in it and the output from the model is not going to look like that. For example, after some training, we might have a model output of [0.85; −0.23; 0.37; 0.69] with the target supposed to be [1; 0; 0; 1]. To see if the model has recognized or classified this input, we usually take the model output and apply a threshold function. We set low and high threshold values, say cL = 0.4 and cH = 0.6, and perform the test Vi = 1 if Yi > cH, Vi = 0 if Yi < cL and otherwise Vi = 0.5, where V represents the thresholded output of the model. Then if the error between the thresholded value V and the target is 0, we can say we have classified this input correctly. We do this in code with a pair of functions. First, we write the code to do the thresholding.
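The thresholding code itself is not reproduced in this excerpt; a minimal sketch matching the call Recognition(Z(i,:), LoTol, HiTol) used in GetRecognition below is:

function V = Recognition(Y, LoTol, HiTol)
  % threshold a row of model outputs: 1 if clearly high, 0 if clearly low
  % and 0.5 if the output falls in the undecided band between the tolerances
  V = 0.5*ones(size(Y));
  V(Y > HiTol) = 1;
  V(Y < LoTol) = 0;
end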
Then, we want to run the thresholding algorithm over all our results. This is done in
GetRecognition.
Listing 18.96: Finding out how many inputs have been classified
function [Error, sumE, R, Z, Success, Classified] = GetRecognition(OUT, Y, Target, LoTol, HiTol)
%
%
%
[rows, cols] = size(Y);
R = zeros(rows, length(OUT));
Z = zeros(rows, length(OUT));
Error = zeros(rows, length(OUT));
Success = zeros(rows, 1);
k = 1;
for j = 1:length(OUT)
  for i = 1:cols
    if i == OUT{j}
      % we have a column that is an output
      Z(:,k) = Y(:,i);
      k = k + 1;
      break;
    end
  end
end
% Z now contains the cols that match
% targets
for i = 1:rows
  R(i,:) = Recognition(Z(i,:), LoTol, HiTol);
  Error(i,:) = (R(i,:) - Target(i,:)).^2;
  test = sum(Error(i,:));
  if test == 0
    Success(i) = 1;
  end
end
sumE = 0.5*(norm(Error, 'fro'))^2;
Classified = sum(Success)/rows * 100;
end
For example, after suitable training, we might have results like this
Target(97:101,:)
ans =
     0     0     1     1
     1     0     0     1
     0     0     1     1
     1     0     0     1
     1     0     0     1
Recall Z here is the four outputs we are monitoring. We see entry 97 maps to [0; 0; 1; 0]
which does not match Target 97 but the output 98 does map to the correct output.
It doesn’t take long before we all get annoyed with gradient descent. We have played a
bit with making it better by adding a type of gradient scaling, but we can do one more
trick. Let’s look at a technique called line search. We have a function of many variables
here called E. For convenience, let all the parameters be called the vector p. Any vector
D for which the derivative of E in the direction of D satisfies ∇E · D < 0, so that the value of E initially goes down along D, is called a descent vector. Our various strategies for picking vectors based on ∇E(p_old) all give us descent vectors. Look at a slice through the multidimensional surface of E given by g(ξ) = E(p_old + ξD), where D need not be a unit vector. At a given step in our gradient descent algorithm, we know g(0) = E(p_old), and if λ is our current choice of step to try, we can calculate g(λ) = E(p_old + λD). From the chain rule of calculus of more than one variable (ha, you will have to remember what we discussed in Peterson (2015)) we find g′(ξ) = ⟨∇E(p_old + ξD), D⟩. Hence, since we know the gradient at the current step, we see g′(0) = ⟨∇E(p_old), D⟩. We thus have enough information to fit a quadratic approximation of g(ξ) given by g_approx(ξ) = A + Bξ + Cξ². We easily find

A = E(p_old)
B = −⟨∇E(p_old), D⟩
C = ( E(p_old + λD) − E(p_old) − Bλ ) / λ²

Minimizing this quadratic gives the optimal step λ* = −B/(2C), provided C > 0; this is exactly the lambdastar computed in the code below.
The new version of the code has been reorganized a bit and is a drop-in replacement for the old code. It has a few toggles you can set in the code: if you set dolinesearch = 0 and leave dogradscaling = 1, you get the old version of the gradient update code for the lagged networks.
The structure of the code has been reorganized as we show in the skeleton here.
  % set UseGrad
  UseGrad = ScaledGrad;
else
  % without gradient scaling,
  % we use the original gradient
  UseGrad = Grad;
end % dogradscaling loop
% now get norm of gradient
gradtol = 0.05;
NormGrad = sqrt(sum(UseGrad.*UseGrad));
% now do descent step
if dogradscaling == 1
  % do a descent step using lambda = scale
  % using either the original descent vector
  % or its normalization
  if NormGrad < gradtol
    % not normalized
    NormDescent = NormGrad;
  else % normgrad is larger than 0.05
    % normalized
    NormDescent = 1;
  end
else % not using grad scaling
  %
  ScaledGrad = Grad;
  if NormGrad < gradtol
    % not normalized
    NormDescent = NormGrad;
  else
    % normalized
    NormDescent = 1;
  end
end

% now we can do a line search
if dolinesearch == 1
  dolinesearchstep = 1;
  % find the energy for the full step of lambda = scale
  [EFull, yI, YI, OI, EI] = energy2(sigma, g, I, IN, OUT, D, WUpdate, OUpdate, GUpdate, B, BE);
  % set A, BCheck = B and C as discussed in the theory of the
  % quadratic approximation of the slice
  A = EStart;
  BCheck = -dot(BaseGrad, UseGrad);
  C = (EFull - A - BCheck*scale)/(scale^2);
  % get the optimal step
  lambdastar = -BCheck/(2*C);
  % if C is negative this corresponds to a maximum so reject
  if (C < 0 || lambdastar > scale)
    % we are going to a maximum on the line search; reject
    lambdastar = lambdastar + 2*scale;
    %dolinesearchstep = 0;
  end
  % now we have the optimal lambda to use
  if dolinesearchstep == 1
    % do new step using lambdastar
    scale = lambdastar;
    % do it differently if doing gradient scaling or not
    ...
  end % dolinesearchstep loop
end % dolinesearch loop
% copy updates to the original block of parameters
WUpdate(1, 1:NGE) = WUpdate(1, NGE+1:2*NGE);
OUpdate(1, 1:NGN) = OUpdate(1, NGN+1:2*NGN);
if DOGAIN == 1
  GUpdate(1, 1:NGN) = GUpdate(1, NGN+1:2*NGN);
else
  GUpdate = G;
end
We can try out our new version of GradientUpdateLag which includes line
search and see how it does. Note it still takes quite a while, but without line search the
iteration count can reach 50,000 or more and you reach about 50 % recognition and
stall. The example before used the odd feedbacks of 11 -> 3 and 7 -> 7 which we will
now remove for this second example. Also, we remove the output node 10 as it is not
very biological. Other than that, the setup is roughly the same, so we only show the
construction of the original graph in the results below. We start with a ten step run using
gradient scaling and line search. We print the optimal line search step size to show you
how it is going. Then we do another and also print the initial recognition results.
lambdastar = 0.0600
lambdastar = 0.0620
lambdastar = 0.0645
lambdastar = 0.0678
lambdastar = 0.0724
lambdastar = 0.0790
lambdastar = 0.0891
lambdastar = 0.1057
Estart = 149.276 Estop = 128.1830

%another 100 steps
[W,G,O,Energy,Grad,ScaledGrad,Norm] = ...
  chainffntrainlag(sigma,sigmaprime,LaggedW,Input,IN,...
  Target,OUT,W,O,G,BG,BEG,FG,FEG,0.1,...
  100,NGN,NGE,NFB);
Estart = 121.7921 Estop = 20.4412
%another 100 steps: energy oscillates some
[W,G,O,Energy,Grad,ScaledGrad,Norm] = ...
  chainffntrainlag(sigma,sigmaprime,LaggedW,Input,IN,...
  Target,OUT,W,O,G,BG,BEG,FG,FEG,0.1,...
  100,NGN,NGE,NFB);
Estart = 20.4315 Estop = 20.1791
%another 100 steps: energy oscillates some
%cut scale to .01
[W,G,O,Energy,Grad,ScaledGrad,Norm] = ...
  chainffntrainlag(sigma,sigmaprime,LaggedW,Input,IN,...
  Target,OUT,W,O,G,BG,BEG,FG,FEG,0.01,...
  100,NGN,NGE,NFB);
Estart = 20.1533 Estop = 20.0652
%check recognition
[E,yI,YI,OI,EI] = ...
  energy2(sigma,LaggedW,Input,IN,OUT,Target,W,O,G,BG,BEG);
[Error,sumE,R,Z,Success,Classified] = GetRecognition(OUT,YI,Target,0.4,0.6);
Classified, sumE
Classified = 39.6040
sumE = 21.4062
Now run for an additional 11,000 steps, after which we check recognition and see we are at about 50 %. This is clearly not very good: we are using an excessive number of iterations and still not getting good recognition. Let's revisit the gradient update algorithm and line search and see if we can improve it.
Let's tackle the training code by making it more modular. We are going to use the functions GradientUpdateLagTwo to handle the gradient computations and chainffntrainlagtwo for the iterative training. We rewrote the update code as seen below; some portions are left out as they are identical. We set up the gradient vectors based on whether we are doing gradient scaling or not with a nice block.
We use the new function GetScaledGrad to find the scaled gradient vector. This function contains all the code we used before, but by pulling it out we make the gradient update code easier to follow and debug.
  SW = diag(WScale)*DEDW';
  SO = diag(OScale)*DEDO';
  ScaledGrad = [SW' SO'];
end
UseGrad = ScaledGrad;

end
Once we have found the appropriate gradient, we must find the updates. If we don't do a line search, we simply do a usual descent step using our descent vector. We have placed the descent code into two new functions: DoDescentStepGradScaling, which handles the scaled gradient case, and DoDescentStepRegular, which uses just a straight gradient without scaling.
It is much easier to see the structure of the code now as all of the complicated computations are placed in their own modules. These modules are lifted straight from the old update code. First, look at the scaled descent.
Note the regular descent is much simpler. Before, all of this logic sat in the main code block, which made the gradient update code hard to read.
Finally, we want to tackle the line search code. The previous version did not take advantage of what we learned at the current line search step and instead kept resetting the initial step size. So, for example, we could run 2000 steps and each one would start the line search at whatever λ value we used as the start. This meant we did not take advantage of what we were learning. So now we will return two things from our update code when we use line search: first, the current value of lambdastar we find and, second, a
minimum value for the line search. If we used the current lambdastar we find to reset the next iteration, we could progressively shrink our calculated lambdastar to zero and effectively stop our progress. So we return a good estimate of how small we want our new start of the line search to be in the variable lambdastart. The new line search looks the same in principle. The only difference is we set the value of lambdastart at the end using the value of lambdamin. This will prevent our line search from using search values that are too small.
%set up weight and derivative vectors
Nodes = g.v;
[m,sizeY] = size(Nodes);
sizeW = length(W);
S = length(I);
sizeD = length(D);
SN = length(IN);
sizeOUT = length(OUT);
xi = zeros(S,sizeY);
DEDW = zeros(1,sizeW);
DEDO = zeros(1,sizeY);
DEDG = zeros(1,sizeY);
% find the xi's
.... code here
%find the partials
..... code here
%get base gradient
if DOGAIN == 1
  BaseGrad = [DEDW DEDO DEDG];
else
  BaseGrad = [DEDW DEDO];
end
NormBase = sqrt(sum(BaseGrad.*BaseGrad));
ScaleTol = 0.01;
if dogradscaling == 1
  [WScale,OScale,GScale,UseGrad] = GetScaledGrad(DOGAIN,ScaleTol,W,O,G,DEDW,DEDO,DEDG);
else
  UseGrad = BaseGrad;
end

NormGrad = sqrt(sum(UseGrad.*UseGrad));
gradtol = 0.05;
%now do descent step
if dogradscaling == 1
  [WUpdate,OUpdate,GUpdate] = DoDescentStepGradScaling(DOGAIN,NormBase,NormGrad,W,O,G,...
    WScale,OScale,GScale,DEDW,DEDO,DEDG,gradtol,r);
else
  [WUpdate,OUpdate,GUpdate,NormDescent] = ...
    DoDescentStepRegular(DOGAIN,NormGrad,W,O,G,DEDW,DEDO,DEDG,gradtol,r);
end

%line search
lambdamin = 0.5;
if dolinesearch == 1
  [EStart,yI,YI,OI,EI] = energy2(sigma,g,I,IN,OUT,D,W,O,G,B,BE);
  [EFull,yI,YI,OI,EI] = energy2(sigma,g,I,IN,OUT,D,WUpdate,OUpdate,GUpdate,B,BE);
  A = EStart;
  B = -dot(BaseGrad,UseGrad);
  C = (EFull - A - B*r)/(r^2);
  lambdastar = -B/(2*C);
  if (C < 0)
    % we are going to a maximum on the line search;
    lambdastar = lambdastar + 2*r;
  end
  %do new optimal step
  if dogradscaling == 1
    [WUpdate,OUpdate,GUpdate,NormDescent] = ...
      DoDescentStepGradScaling(DOGAIN,NormBase,NormGrad,W,O,G,WScale,OScale,GScale,...
      DEDW,DEDO,DEDG,gradtol,lambdastar);
  else% not using grad scaling
    [WUpdate,OUpdate,GUpdate,NormDescent] = ...
      DoDescentStepRegular(DOGAIN,NormGrad,W,O,G,DEDW,DEDO,DEDG,gradtol,lambdastar);
  end
  % don't return too small a lambdastar
  if lambdastar < lambdamin
    lambdastart = lambdamin;
  else
    lambdastart = lambdastar;
  end
else
  lambdastar = r;
  lambdastart = lambdastar;
end
%copy updates to the original block of parameters
WUpdate(1,1:NGE) = WUpdate(1,NGE+1:2*NGE);
OUpdate(1,1:NGN) = OUpdate(1,NGN+1:2*NGN);
if DOGAIN == 1
  GUpdate(1,1:NGN) = GUpdate(1,NGN+1:2*NGN);
else
  GUpdate = G;
end
end
The new training code is very similar although some of the return variables are
different. Note we have added code to do a plot of our calculated line search values as
well as the energy using the subplot command in MatLab.
scale = [];
Energy = [Energy E];
lambdastart = lambda;
for t = 1:NumIters
  [W,O,G,Grad,NormBase,NormGrad,lambda,lambdastart] = GradientUpdateLagTwo(sigma,sigmaprime,...
    g,y,Y,W,O,G,B,BE,F,FE,Input,IN,Target,OUT,lambdastart,NGN,NGE,NFB);
  [E,y,Y,OI,EI] = energy2(sigma,g,Input,IN,OUT,Target,W,O,G,B,BE);
  Energy = [Energy E];
  scale = [scale lambda];
end
Energy(1), Energy(NumIters)
subplot(2,1,1), plot(Energy);
subplot(2,1,2), plot(scale);
end
Let’s try this out on a model. The model is similar to what we have been testing our code
on. We changed the training data a bit and altered the node function setup some. We’ll
let you compare the examples and find the changes.
%Find the feedback edges and subtract them
FBedge = ExtractFBEdges(KG);
NewFFedges = ExtractNewFFEdgeInfo(G,FBedge);
W = removeedges(G,FBedge);
%double the graph
LaggedW = addgraph(W,W,NewFFedges);
KLaggedW = incidence(LaggedW);

IN = {1,13};
OUT = {6,7,8,18,19,20};
NGN = length(G.v);
NGE = length(G.e) - length(FBedge);
NFB = length(FBedge);
NodeSize = length(LaggedW.v);
SL = -0.2;
SH = 1.2;
%setup training
X = linspace(0,3,101);
U = [];
for i = 1:101
  if X(i) >= 0 && X(i) < .33
    U = [U,[1;0;0]];
  elseif X(i) >= .33 && X(i) < .67
    U = [U,[0;1;0]];
  else
    U = [U,[0;0;1]];
  end
end
...
Input = [XX',XX'];
Target = UU';
%set all node functions as usual; the copies will
%be the same as the original
[sigma,sigmaprime] = SigmoidInit2(NodeSize,SL,SH,IN,OUT);
IN = {1,13};
OUT = {18,19,20};
%reset node size to that of G
EdgeSize = length(G.e);
OL = -1.4;
OH = 1.4;
GL = 0.8;
GH = 1.2;
WL = -2.1;
WH = 2.1;
Offset = OL+(OH-OL)*rand(1,NGN);
O = [Offset,Offset];
Gain = GL+(GH-GL)*rand(1,NGN);
G = [Gain,Gain];
Weights = WL+(WH-WL)*rand(1,EdgeSize);
W = [Weights,Weights];
[BG,FG,BEG,FEG] = BFsets(LaggedW,KLaggedW);
[InVals,OutVals] = evaluation3(sigma,LaggedW,Input(1,:),IN,OUT,W,O,G,BG,BEG);
%find error
[E,yI,YI,OI,EI] = energy2(sigma,LaggedW,Input,IN,OUT,Target,W,O,G,BG,BEG);
E
Only 200 steps and 85 % recognition! We are doing much better with the new line search code. We might be able to get to 100 %, but this is enough to show you how the training works. Finally, to see how our model works on new data, consider the following test.
We set up new input and corresponding target data. These inputs are created using a
different linspace command so the inputs will not be the same as before. You can
see recognition is still fine.
Input = [XX',XX'];
Target = UU';
Note that on the training data we classified 91.0891 %, which is 92 out of the original 101, and on the testing data we got 89.23 %, which is 58 out of 65 correct. For many purposes, this level of recognition is pretty good.
References
J. Bower, D. Beeman, The Book of Genesis: Exploring Realistic Neural Models with the GEneral NEural SImulation System, 2nd edn. (Springer TELOS, New York, 1998)
D. Harland, R. Jackson, Portia perceptions: the umwelt of an araneophagic jumping spider, in Complex Worlds from Simpler Nervous Systems, ed. by F. Prete (MIT Press, Cambridge, 2004), pp. 5–40. A Bradford Book
M. Hines, Neuron (2003). http://www.neuron.yale.edu
K. Kral, F. Prete, In the mind of a hunter: the visual world of the Praying Mantis, in Complex Worlds from Simpler Nervous Systems, ed. by F. Prete (MIT Press, Cambridge, 2004), pp. 75–116. A Bradford Book
S. North, E. Gansner, Graphviz: Graph Visualization Software (AT&T, 2010). http://www.graphviz.org
J. Peterson, Calculus for Cognitive Scientists: Derivatives, Integration and Modeling, Springer Series on Cognitive Science and Technology (Springer Science+Business Media, Singapore, 2015, in press)
R. Raizada, S. Grossberg, Towards a theory of the laminar architecture of cerebral cortex: computational clues from the visual system. Cereb. Cortex 13, 100–113 (2003)
S. Zhang, M. Srinivasan, Exploration of cognitive capacity in honeybees: higher functions emerge from small brain, in Complex Worlds from Simpler Nervous Systems, ed. by F. Prete (MIT Press, Cambridge, 2004), pp. 41–74. A Bradford Book
Chapter 19
Address Based Graphs
In order to create complicated directed graph structures for our neural modeling,
we need a more sophisticated graph object. Create a new folder called Graphs
and in it create the usual @graphs, @vertices and @edges subfolders. We
will then modify the code we had in the previous @edges and @vertices to
reflect our new needs. Let's look at how we might build a cortical column. Each can in a cortical column is a six layer structure which consists of the OCOS, FFP and Two/Three building blocks. The OCOS will have addresses [0; 0; 0; 0; 0; 1 − 7] as there are 7 neurons. The FFP is simpler with addresses [0; 0; 0; 0; 0; 1 − 2] as there are just 2 neurons. The six neuron Two/Three circuit will have 6 addresses, [0; 0; 0; 0; 0; 1 − 6]. To distinguish the neurons in these different circuits from one another, we add a unique integer in the fifth component. The addresses are now OCOS ([0; 0; 0; 0; 1; 1 − 7]), FFP ([0; 0; 0; 0; 2; 1 − 2]) and Two/Three ([0; 0; 0; 0; 3; 1 − 6]). So a single can would look like what is shown in Eq. 19.1.
We are not showing the edges here, for convenience. We can then assemble cans
into columns by stacking them vertically. This leads to the structure in Eq. 19.2. The
fourth component of each address now indicates the can number: 1, 2 or 3. We also
have to add a link between cans that provides the vertical processing.
We can label the address of this column as [0; 0; 1; ·; ·; ·]. We could then use this as a model of cortex or assemble a rectangular sheet of columns. The resulting model of cortex would then be addressed as [0; 1; ·; ·; ·; ·]. Eventually, we will have at least 7 modules in our small brain model. We will use the following address scheme.
Then the small brain model itself would have the address [1; ·; ·; ·; ·; ·]. To use
the brain model, we add an input and output module and construct a full model as
follows:
Input [1; ·; ·; ·; ·; ·]
Brain [2; ·; ·; ·; ·; ·]
Output [3; ·; ·; ·; ·; ·]
We can also construct an asymmetric brain model consisting of a left and right half
brain connected by a corpus callosum model. This would be addressed as follows:
Input [1; ·; ·; ·; ·; ·]
Left Brain [2; ·; ·; ·; ·; ·]
Corpus Callosum [3; ·; ·; ·; ·; ·]
Right Brain [4; ·; ·; ·; ·; ·]
Thalamus [5; ·; ·; ·; ·; ·]
MidBrain [6; ·; ·; ·; ·; ·]
Cerebellum [7; ·; ·; ·; ·; ·]
Output [8; ·; ·; ·; ·; ·]
We can refine this further by giving each half brain its own sensory, associative and motor cortex submodules:
Input [1; ·; ·; ·; ·; ·]
Left Brain [2; ·; ·; ·; ·; ·]
Sensory Model One [2; 1; ·; ·; ·; ·]
Sensory Model Two [2; 2; ·; ·; ·; ·]
Associative Cortex [2; 3; ·; ·; ·; ·]
Motor Cortex [2; 4; ·; ·; ·; ·]
Corpus Callosum [3; ·; ·; ·; ·; ·]
Right Brain [4; ·; ·; ·; ·; ·]
Sensory Model One [4; 1; ·; ·; ·; ·]
Sensory Model Two [4; 2; ·; ·; ·; ·]
Associative Cortex [4; 3; ·; ·; ·; ·]
Motor Cortex [4; 4; ·; ·; ·; ·]
Thalamus [5; ·; ·; ·; ·; ·]
MidBrain [6; ·; ·; ·; ·; ·]
Cerebellum [7; ·; ·; ·; ·; ·]
Output [8; ·; ·; ·; ·; ·]
For example, if we used a single column with three cans for a cortex module, we
could add a simple thalamus model using one reversed OCOS circuit as shown in
Eq. 19.3. In this picture, we have addressed the thalamus as [0; 5; 0; 1; 1; 1 − 7] but
depending on how we set up the full model, this could change.
We see can 1 is one OCOS, one FFP and one Two/Three, giving 7 + 2 + 6 = 15 neurons, and so here a three can column has 45 neurons.
We need to extend our graph class implementation so that we can use six dimensional vectors as addresses. We also want to move seamlessly back and forth between the global node numbers needed for the incidence matrix and the Laplacian and the full address information. Let's start with the vertices or nodes of the graph.
      out(i).number = V.n(i).number;
      out(i).address = V.n(i).address;
    end
  else
    error("invalid property for vertices object");
  end
otherwise
  error("@vertices/subsref: invalid subscript type for vertices");
end
end
We can add a node one at a time. Again, this inefficiency arises from the fact that we
can’t simply glue the incoming data to the end of the existing node list.
Listing 19.3: Add a single node with address based vertices add
function W = add(V, u)
%
% node is added to vertex list
%
s = struct();
W = V.v;
n = length(W);
for i = 1:n
  s.node5 = W(i).node5;
  s.node4 = W(i).node4;
  s.node3 = W(i).node3;
  s.node2 = W(i).node2;
  s.node1 = W(i).node1;
  s.node0 = W(i).node0;
  out{i} = [s.node5; s.node4; s.node3; s.node2; s.node1; s.node0];
end
s.node5 = u(1);
s.node4 = u(2);
s.node3 = u(3);
s.node2 = u(4);
s.node1 = u(5);
s.node0 = u(6);
out{n+1} = [s.node5; s.node4; s.node3; s.node2; s.node1; s.node0];
W = vertices(out);
end
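As a quick usage sketch (the particular address here is made up for illustration), appending one more node to an existing vertices object V looks like:

% append a node with the illustrative address [0;0;0;0;1;8]
V = add(V, [0; 0; 0; 0; 1; 8]);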
Given that we can't concatenate, it would be more efficient to add lists of nodes simultaneously.
  s.node0 = W(i).node0;
  out{i} = [s.node5; s.node4; s.node3; s.node2; s.node1; s.node0];
end
for i = 1:sizenodes
  s.node5 = nodes{i}(1,1);
  s.node4 = nodes{i}(2,1);
  s.node3 = nodes{i}(3,1);
  s.node2 = nodes{i}(4,1);
  s.node1 = nodes{i}(5,1);
  s.node0 = nodes{i}(6,1);
  out{n+i} = [s.node5; s.node4; s.node3; s.node2; s.node1; s.node0];
end
W = vertices(out);
end
We now have edges which are pairs of six dimensional addresses. Here is the constructor.
for i = 1:sizeedges
  s.in = F{i}(:,1);
  s.out = F{i}(:,2);
  out{i} = [s.in, s.out];
end
for i = 1:sizelinks
  s.in = links{i}(:,1);
  s.out = links{i}(:,2);
  out{sizeedges+i} = [s.in, s.out];
end
W = edges(out);
end
The graph class is quite similar to what we had before but we have added some
options.
• g.v returns the vertices object as before. The difference is that the vertices are
now six dimensional addresses.
• g.e returns the usual edge object. Here, the difference is that the edge pairs are now pairs of six dimensional addresses.
• g.n returns the nodes data structure obtained from the nodes = V.n expression. This gives us essentially a structure where nodes(i) is a structure whose
number field is the global node number and whose address field is the node’s six
dimensional address.
• g.l returns the edges data structure obtained from the links = E.n; expression. This gives us essentially a structure like the one above: the number field is
the global node number and the address field is the node’s six dimensional address.
For example, if we had the code snippet
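A minimal sketch of the kind of snippet meant here, patterned on the OCOS construction used later in Chap. 20; the vertex and edge lists are just the seven neuron OCOS block described above.

vOCOS = {[0;0;0;0;0;1],[0;0;0;0;0;2],[0;0;0;0;0;3],[0;0;0;0;0;4],...
         [0;0;0;0;0;5],[0;0;0;0;0;6],[0;0;0;0;0;7]};
eOCOS = {[[0;0;0;0;0;1],[0;0;0;0;0;2]],[[0;0;0;0;0;1],[0;0;0;0;0;3]],...
         [[0;0;0;0;0;1],[0;0;0;0;0;4]],[[0;0;0;0;0;2],[0;0;0;0;0;5]],...
         [[0;0;0;0;0;3],[0;0;0;0;0;6]],[[0;0;0;0;0;4],[0;0;0;0;0;7]],...
         [[0;0;0;0;0;1],[0;0;0;0;0;6]]};
OCOS = graphs(vertices(vOCOS), edges(eOCOS));
% stamp the building block location [0;0;0;0;1;.] onto nodes and edges
OCOS = addlocationtonodes(OCOS, [0;0;0;0;1;0]);
OCOS = addlocationtoedges(OCOS, [0;0;0;0;1;0]);
nodes = OCOS.n;   % (global node number, address) table
links = OCOS.l;   % (global edge number, address pair) table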
Then, here is what we would find for the various parts of the graph object OCOS.
The OCOS.v data is a matrix of addresses. Each column is a node address. In the
code above, we first construct an OCOS graph whose nodes are
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
1 2 3 4 5 6 7
Note that our OCOS circuit now does not contain a node 8 as that is a thalamus neuron and we will model it separately later. Then, we use the methods addlocationtonodes to relabel the node addresses and addlocationtoedges to relabel the edge addresses with the fundamental building block address [0; 0; 0; 0; 1; ·]. This gives
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
1 1 1 1 1 1 1
1 2 3 4 5 6 7
which reflects what we said before. The OCOS circuit is a basic building block whose address is [0; 0; 0; 0; 1; ·]. For example, the third neuron in that circuit has the address [0; 0; 0; 0; 1; 3]. Since we want to maintain a list of global nodes, the next data structure maintains a table whose first column is the global node number and whose second column is the address of that node.
number
address
}
0
0
0
0
1
3
Then we can ask what the global node number is as well as the six dimensional
address.
number
edge
}
lOCOS(3)
ans =
  scalar structure containing the fields:
    number = 3
    edge =
      0 0
      0 0
      0 0
      0 0
      1 1
      1 4
}
The OCOS.l command returns a data structure whose fields are the global edge number and an edge. Recall an edge goes between an in node and an out node. So, here edge is a pair of six dimensional addresses, with the first address being the address of the in node and the second one being the address of the out node. For example, the information on the third edge is shown above. The in and out parts of the edge correspond to the first and second column, respectively.
With all the above said, we can now look at the constructor and see how we implemented this.
elseif (strcmp(fld,"n"))
  out = g.n;
elseif (strcmp(fld,"l"))
  out = g.l;
else
  error("invalid property for graph object");
end
otherwise
  error("@graphs/subsref: invalid subscript type for graphs");
end
end
First, let's look at how we add the mask, or global address, to a given graph. We begin with the code to add a mask or location to each node and edge of a graph. This is harder than you might think as the incidence matrix requires us to use global node and edge numbers while all of our addresses are six dimensional vectors.
The basic idea is that once we create our graph with six dimensional addresses, we want to create a data structure B which links the pairs (global node number, node address) somehow. We will do this by creating a vector B. We come up with a function φ which assigns to each vector address a a unique integer φ(a). We then set the entry B(φ(a)) in the vector to have the value of the global node number that is linked to this address. So, for example, say the global node of a = [0; 2; 3; 0; 1; 2] is 8 and φ(a) = 46. Then we set B(46) = 8. The other entries of B are given values of 0, so it would be nice if we kept the size of B small and had as few 0's in it as possible. Hence, B(φ(a)) is the global node number corresponding to the address a.
A standard way to do this in other programming languages is to create what is called an associative array C, which would allow us to use the vector address as a key whose value is the corresponding node; hence, we could write C([0; 2; 3; 0; 1; 2]) if that address is in the graph and this would return the global node for that address. This is called a reverse lookup on the six dimensional address. However, MatLab/Octave does not have associative containers of this sort, so we cannot use the six dimensional address as a key. Once we have created our function φ (called a hashing function), we can calculate the associative array B and write a function to do the reverse lookups. In our graph data structure for a graph g, nodes = g.n gives us a structure where nodes(i).number gives the global node number i and nodes(i).address is the corresponding vector address. The function link2global will take a six dimensional address and find the global node number. It has the following code.
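The body of that lookup is essentially a linear search over the nodes table; here is a minimal sketch consistent with the description above (the zero return for a missing address is an assumption of this sketch).

function gnode = link2global(nodes, address)
  % linear search: return the global node number whose stored
  % six dimensional address matches the one passed in
  gnode = 0;
  [row, sizeV] = size(nodes);
  for i = 1:sizeV
    if isequal(nodes(i).address, address)
      gnode = nodes(i).number;
      return;
    end
  end
end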
The most straightforward way to build the incidence matrix is then to do a loop as
follows.
% get sizes
[row,sizeV] = size(nodes);
[row,sizeE] = size(edges);
K = zeros(sizeV,sizeE);
for i = 1:sizeV
  for j = 1:sizeE
    % associative array from V gives correspondence
    % global node number -> actual hierarchical address
    a = nodes(i).number;
    b = nodes(i).address;
    % get global edge number
    c = edges(j).number;
    % get address pair for this global edge number
    d = edges(j).edge;
    %
    % this is expensive as in general
    % it is a linear search
    %
    u = link2global(nodes,d(:,1));
    v = link2global(nodes,d(:,2));
    if u == i
      K(i,j) = 1;
    elseif v == i
      K(i,j) = -1;
    end
  end
end
end
Let's count this out. For a typical small brain model of 500 nodes and 3000 edges, a link2global call takes about 500 or fewer iterations. We need two calls for each edge, so we have about 1000 iterations per edge. But we have 3000 edges, so that puts us at about 3 million iterations to loop through the edges for each node. Since there are 500 nodes, this leaves us with an iteration count of about 1.5 billion. So when we use this method on a small brain model, it takes a long time to return with the incidence matrix!
A better way is to use the method Address2Global, which loops through the nodes of the graph and assigns to each node's six dimensional address its unique global
node number via our choice of hash function φ. One way to do this is to take the six dimensional address, convert it to a string and then convert that string to an integer. This creates a list of unique integers, and this conversion is what is called a hashing function. We then set up a large vector B as follows. We initialize B to all zeroes. If a six dimensional address corresponds to the unique integer N and global node number I, we set B(N) = I. For example, suppose the address is [0; 2; 3; 2; 1; 2] (this neuron is in sensory module 2, column 3, can 2 and is neuron 2 in the fundamental OCOS building block). This address converts to the string '023212', which then converts to some integer; the simplest conversion would be to set the integer to be 23212. However, we can do other conversions too. In this simplest conversion choice, if the global node of this neuron was, say, 67, we would then set B(23212) = 67. This uses storage to offset the search cost; in fact, the size of B becomes enormous and we will find out we can actually run out of memory! But this method is straightforward to code and it is instructive to see how we can refine our ideas and make them better with reflection. The code for this hashing function is given below. As mentioned, this is clearly inefficient as there are mostly zeros in this vector. The loop cost here is the size of the node list, which for a small brain model is about 500.
The code above uses the utility function Address2String, which converts a six dimensional address into a string. Now we have to be careful here. Here is an example: suppose we have 3 nodes with addresses [0, 2, 32, 1, 1, 1], [0, 2, 3, 2, 11, 1] and [0, 2, 3, 21, 1, 1]. Then all 3 addresses will give the same integer 0232111, which
is a hash collision. Hence, the code in this Address2String can generate hash
collisions.
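For concreteness, here is a hedged sketch of a naive Address2String of the sort just described (the exact formatting in the book's version is not shown, so treat this as illustrative); concatenating the raw entries digit by digit is exactly what produces the collisions above.

function s = Address2String(a)
  % naive conversion: concatenate the six address entries as decimal text
  s = '';
  for k = 1:length(a)
    s = [s, num2str(a(k))];
  end
end

With this version, str2num(Address2String([0;2;32;1;1;1])) and str2num(Address2String([0;2;3;2;11;1])) both evaluate to 232111, which is the collision the text warns about.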
We can fix this by finding the maximum number of nodes for each level. Our addresses have the form [n5; n4; n3; n2; n1; n0], which in MatLab is stored as a six entry column vector. We can easily find the maximum number of possibilities in each address entry, and we use a function to compute these maxima. Then, once we have these numbers, we add an appropriate offset to the hash calculations. For example, if the n0 slot had a maximum value of 22, we would add 22 + 1 to all the entries in the n1 slot, and if the n1 slot has a maximum of 9, we would add (22 + 1)(9 + 1) to all the n2 entries. This ensures that each address is unique. We then use these altered addresses to compute the hash value. We will still use the original Address2String but we will now alter the addresses we send in. However, using the string conversions generates very large addresses. So let's try it again. Instead of using the string conversions, we will use the address maximums in each slot directly with the function FindGlobalAddress, which calculates the unique integer for an address more directly. This code is shown below.
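The calculation is essentially a mixed radix conversion of the six entries. The sketch below is an assumption about the details (in particular the order in which the slots are folded in), but with the slot maxima of the OCOS example later in this section (1 in slot five, 7 in slot six) it does reproduce the Address values 3, 5, ..., 15 and biggest = 15 shown there.

function N = FindGlobalAddress(a, gT)
  % mixed radix conversion: gT(k) holds the maximum value seen in
  % slot k, so slot k contributes with radix gT(k)+1; slots are folded
  % in from the last entry (the neuron number) back to the first
  N = 0;
  for k = length(a):-1:1
    N = N*(gT(k) + 1) + a(k);
  end
end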
The code for the new incidence matrix calculation then has the following form; we walk through it in pieces.
The first thing we do is get size information and set up a blank incidence matrix.
% get sizes
[row,sizeV] = size(nodes);
[row,sizeE] = size(edges);
K = zeros(sizeV,sizeE);
Then, we loop through the edges and convert all the six dimensional address pairs to
integers as we described. For a small brain model, this takes about 3000 iterations.
These integer pairs are stored in the structure out.
We then call the Address2Global function which will set up a link between the node address and the global node number. This is stored in a vector B; i.e. B(A_i) = V_i where A_i is the integer our hash assigns to the node address and V_i is the global node number. The code also uses the address slot maximums.
This costs about 500 more iterations for a small brain model. We call this function
as follows:
Finally, we set up the incidence matrix. This is always a long double loop which costs 500 × 3000 iterations for a small brain model even without the reverse lookup problem. The vector B contains our needed reverse lookup information.
% get sizes
[row,sizeV] = size(nodes);
[row,sizeE] = size(edges);
K = zeros(sizeV,sizeE);
%
% setup incidence matrix
%
d = edges(1).edge;
u = FindGlobalAddress(d(:,1),gT);
v = FindGlobalAddress(d(:,2),gT);
out{1} = [u,v];
for i = 2:sizeE
  d = edges(i).edge;
  u = FindGlobalAddress(d(:,1),gT);
  v = FindGlobalAddress(d(:,2),gT);
  out{i} = [u,v];
end

[Vertex,Address,B] = Address2Global(g);

for i = 1:sizeV
  for j = 1:sizeE
    % associative array from V gives correspondence
    % global node number -> actual hierarchical address
    d = out{j};
    u = out{j}(1,1);
    v = out{j}(1,2);
    if B(1,u) == i
      K(i,j) = 1;
    elseif B(1,v) == i
      K(i,j) = -1;
    end
  end
end
end
Next, although it is part of the incidence code, we can explicitly find the
associative array B to see what it looks like.
1
1
1
1
2
16
row = 1
sizeV = 7
biggest = 15
We see the associative array B has 15 rows (the value of biggest). There are only two slots in the address vectors that have entries: slots 5 and 6, with corresponding maxima 1 and 7. The algorithm we have described for assigning a global number to each address then finds the associative array B. Note we have
1 2 3 4 5 6 7
> Address
Address =
   3   5   7   9   11   13   15
> B
B =
   0 0 1 0 2 0 3 0 4 0 5 0 6 0 7
Once we know how to find the incidence matrix, we can find the Laplacian of the
graph easily.
At any neuron i in our graphs, the neurons which interact with it via a synaptic contact are in the backward set for neuron i, B(i). Hence, the input to neuron i is the sum

∑_{j ∈ B(i)} E_{j→i} Y(j)

where E_{j→i} is the value we assign to the edge between j and i in our graph and Y(j) is the output we assign to neuron j. We also know the neurons that neuron i sends its output signal to are in its forward set F(i). As usual, both of these sets are readily found by looking at the incidence matrix for the graph. We can do this in the code BFsets which is a new graph method placed in the @graphs directory. This will be more complicated code than the version we have for global node and link numbers. In it, we find the backward and forward sets for each neuron in both the six dimensional address form and the global node form. In a given row of the incidence matrix, the positive 1's tell us the edges corresponding to the forward links and the negative 1's give us the backward edges. So we look at each row of the incidence matrix and calculate the needed nodes for the backward and forward sets.
% get sizes
[row,sizeE] = size(edges);
%
% setup out matrix
%
d = edges(1).edge;
%u = str2num(Address2String(d(:,1)));
%v = str2num(Address2String(d(:,2)));
u = FindGlobalAddress(d(:,1),gT);
v = FindGlobalAddress(d(:,2),gT);
out{1} = [u,v];
for i = 2:sizeE
  d = edges(i).edge;
  %u = str2num(Address2String(d(:,1)));
  %v = str2num(Address2String(d(:,2)));
  u = FindGlobalAddress(d(:,1),gT);
  v = FindGlobalAddress(d(:,2),gT);
  out{i} = [u,v];
end
[Vertex,Address,B] = Address2Global(g);
[Kgrows,Kgcols] = size(Kg);

BackA = {};
ForwardA = {};
BackGlobal = {};
ForwardGlobal = {};
BackEdgeGlobal = {};
for i = 1:Kgrows
  BackA{i} = [];
  ForwardA{i} = [];
  BackGlobal{i} = [];
  ForwardGlobal{i} = [];
  BackEdgeGlobal{i} = [];
  ForwardEdgeGlobal{i} = [];
  for j = 1:Kgcols
    d = edges(j).edge;
    u = d(:,1);
    v = d(:,2);
    a = out{j}(1,1);
    b = out{j}(1,2);
    if Kg(i,j) == 1
      ForwardA{i} = [ForwardA{i}, v];
      ForwardGlobal{i} = [ForwardGlobal{i}, B(1,b)];
      ForwardEdgeGlobal{i} = [ForwardEdgeGlobal{i}, j];
    elseif Kg(i,j) == -1
      BackA{i} = [BackA{i}, u];
      BackGlobal{i} = [BackGlobal{i}, B(1,a)];
      BackEdgeGlobal{i} = [BackEdgeGlobal{i}, j];
    end
  end
end
end
Let’s see how this works with a simple example. We again build an OCOS module
with 7 nodes and 7 edges.
Now the backward global information here is B{1} = {}, B{2} = {1}, B{3} = {1},
B{4} = {1}, B{5} = {2}, B{6} = {3, 1} and B{7} = {4}. This information is stored
in BackGlobal which looks like
[1,7] = 4
}
and so on. The forward sets are similar and the other data structures give the same
information but list everything using the full vector addresses. Thus, BackA{2}
gives the same information as BackGlobal{2} but gives the addresses instead of
global node numbers.
The simplest evaluation strategy then is to let each neuron have an output value
determined by the usual simple sigmoid function with code in sigmoid.m. We use
this sigmoid to do a simple graph evaluation. We then evaluate the sigmoid functions
at each node using the current synaptic interaction values just as we did in the global
graph code.
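The file sigmoid.m itself is not reproduced here; a minimal version, consistent with the offset and gain form used a little further on (with o = 0 and g = 1), would be:

function y = sigmoid(x)
  % simple squashing function mapping the reals into (0,1)
  y = 0.5*(1 + tanh(x));
end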
% get size
sizeV = length(Y);
for i = 1:sizeV
  % get backward node information for neuron i
  BF = B{i};
  % get backward edge information for neuron i
  BEF = BE{i};
  lenB = length(BF);
  lenBEF = length(BEF);
  sum = 0.0;
  for j = 1:lenBEF
    link = BEF(j);
    pre = BF(j);
    sum = sum + W(link)*Y(pre);
  end
  NodeVals(i) = sigmoid(sum);
end
A simple edge update function then changes the scalar weight associated with each edge using what is called a Hebbian update. If the value of the post neuron i is high and the value of the edge E_{j→i} is also high, the value of this edge is increased by a multiplier such as 1.05 or 1.1. The code to implement this is the same as the global graph code and is not shown.
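Since that code is not repeated here, the following is a hedged sketch of the Hebbian edge update just described. The real function is called HebbianUpdateSynapticValue later in the chapter; its exact signature is not shown, so this sketch uses its own name, and the tolerance Tol and multiplier scale are illustrative choices.

function W = HebbianUpdateSketch(Y, W, BE, Tol, scale)
  % raise an edge weight when the post neuron value and the edge value
  % are correlated strongly enough; scale is a multiplier such as 1.05
  sizeV = length(Y);
  for post = 1:sizeV
    BEF = BE{post};              % backward edges into neuron post
    for j = 1:length(BEF)
      link = BEF(j);
      if Y(post)*W(link) > Tol   % Hebbian correlation test
        W(link) = scale*W(link);
      end
    end
  end
end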
A simple example is shown below for the evaluation step for a brain model such as we will build in Chap. 20. This discussion is a little out of place, but we think it is worth the risk. The next chapter shows how we build a human brain model consisting of many modules linked together, and in subsequent chapters we also build models of a cuttlefish and pigeon brain. Hence the function brain here is a generic placeholder for any such brain model we build. Of course, to make this really sink in, you'll have to read the next chapter also and go back and forth a bit to get it straight!
So to set up the evaluation, we first set up a node value vector Y the size of the nodes and a weight value vector W the size of the edges of the brain model. Here we will also use an offset and a gain vector. For the evaluation, we still assume the input into a neuron or node i is given by

∑_{j ∈ B(i)} E_{j→i} Y(j)

where we now posit that each edge E_{j→i} is simply a value called a weight, W_{ji}, and the neuron evaluation uses a sigmoid calculation. If x is the input into a neuron, o is the offset of the neuron and g is its gain, we find the neuronal output is given by

Y = 0.5 (1 + tanh((x − o)/g)).
So we will alter the evaluation code a bit to allow us to do this more interesting
sigmoid calculation.
% get size
sizeV = length(Y);
for i = 1:sizeV
  % get backward node information for neuron i
  BF = B{i};
  % get backward edge information for neuron i
  BEF = BE{i};
  lenB = length(BF);
  lenBEF = length(BEF);
  sum = 0.0;
  for j = 1:lenBEF
    link = BEF(j);
    pre = BF(j);
    sum = sum + W(link)*Y(pre);
  end
  X = (sum - O(i))/G(i);
  Y(i) = sigmoid(X);
end
NodeVals = Y;
end
We need to set up the offset vector and gain vector to be the size of the graph's vertices and initialize them in some fashion. We fill the weight vector randomly with numbers from −1 to 1. Then we compute the needed backward set information and use it to perform an evaluation step. Then we do a synaptic weight Hebbian update. At this point, we are just showing the basic ideas of this sort of computation. We still have to restrict certain edges to have only negative values as they correspond to inhibitory synaptic contacts. Also, we have not yet implemented the cable equation update step afforded us by the graph's Laplacian. That is to come. The function HebbianUpdateSynapticValue is the same as the one we used in GraphsGlobal because it is written using only global graph information which has been extracted from the address based graph. Here is how we would evaluate the values (i.e. neuron values) at each node in a brain model. We are jumping ahead of course, as we have not yet discussed how to build such a brain model, but this code snippet will give you the idea. We start by building our brain model.
Next, get the backward and forward information for this graph.
Now, initialize the output vector Y, the offset and gain vectors, O and G and the
edge weight vector W. Here we initialize W to be randomly chosen in the range [−1, 1],
the offset O to be randomly in the range [−1, 1] and the Gain G to be randomly in
the range [0.45, 0.9].
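A minimal sketch of this initialization, assuming the assembled brain model graph is held in a graphs object called brainmodel (a placeholder name for whatever the builder in Chap. 20 returns):

sizeV = length(brainmodel.v);       % number of nodes
sizeE = length(brainmodel.e);       % number of edges
Y = zeros(1, sizeV);                % node outputs start at zero
W = -1 + 2*rand(1, sizeE);          % edge weights uniformly in [-1, 1]
O = -1 + 2*rand(1, sizeV);          % offsets uniformly in [-1, 1]
G = 0.45 + 0.45*rand(1, sizeV);     % gains uniformly in [0.45, 0.9]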
We start with all node values equal to zero.
We have done the first evaluation and now the node values have changed.
Now do the first Hebbian Update which updates the weight values on the edges.
Follow that by another evaluation to see what happened.
We can see how the new weight values have affected the nodal computations!
Now let’s consider how we might train a given graph to have certain target node
values for specified inputs. We will illustrate this with a very simple OCOS graph
and one input at global node 1 of value 0.3. The specified output is in nodes 5, 6
and 7 of value 0, 1 and 0. We will continue to use simple sigmoid node evaluation
engines for now. We encode the input and target information as follows. First the
inputs:
0.00000
0.00000
0.00000
0.00000
0.00000
0.00000
0.30000
We usually call this sort of information Training Data. We can check how we did
with a few simple tests.
DesiredOutput{1}(1)
ans = 5
DesiredOutput{1}(2)
ans = 0
DesiredOutput{2}(1)
ans = 6
DesiredOutput{2}(2)
ans = 1
DesiredOutput{3}(1)
ans = 7
DesiredOutput{3}(2)
ans = 0
We then find the incidence matrix, the backward and forward data and initialize all
the graph values.
Listing 19.68: Incidence Matrix, Backward and Forward Data and Initialization
KOCOS = incidence(OCOS);
[BA,BGlobal,FA,FGlobal,BEGlobal,FEGlobal] = BFsets(OCOS,KOCOS);
Y = zeros(1,7);
O = -1+2*rand(1,7);
G = 0.5*(0.9+rand(1,7));
W = rand(1,7)
W =
   0.527088   0.959165   0.048946   0.374755   0.322442   0.698353   0.184643
% so weights are all positive. Now we set the inhibitory edges.
W(4) = -W(4);
W(5) = -W(5);
W(6) = -W(6);
W =
   0.45076   0.81485   0.45086  -0.79840  -0.10357  -0.74836   0.59966
We are setting edges 4, 5 and 6 to be inhibitory as we know this is what the neurobiology tells us. We do this by manually changing these edge weights to be negative once they are randomly set. We set W to be values randomly chosen between 0 and 1 and then set the inhibitory values. Now we have to do a preliminary evaluation. This uses a new function evalwithinput.
The new function is like the old evaluation except it uses input data to seed each nodal calculation. Here is the updated function: the only thing that changed is that each sum is initialized to an input value rather than zero.
% get size
sizeV = length(Y);

for i = 1:sizeV
  % get backward node information for neuron i
  BF = B{i};
  % get backward edge information for neuron i
  BEF = BE{i};
  lenB = length(BF);
  lenBEF = length(BEF);
  sum = Input(i);
  for j = 1:lenBEF
    link = BEF(j);
    pre = BF(j);
    sum = sum + W(link)*Y(pre);
  end
  X = (sum - O(i))/G(i);
  Y(i) = sigmoid(X);
end
NodeVals = Y;

end
Now we need to alter the edge values of the graph so that we can map our input data to our desired target data. We do this with a modified Hebbian training function.
We need to set some tolerances before we do the Hebbian updates: scale is the usual multiplier we see in a Hebbian update, Tol is the tolerance that determines if an edge weight should be updated because of a correlation with the post nodal value and TargTol is the fraction of the error update we use. The new function is listed below. This function is a bit complicated as we are now handling updates differently depending on whether the neuron is inhibitory or excitatory and whether the neuron is an input or a target. We start by taking the given input and target data and use it to set up a data structure which holds this information. The ValueAndType structure holds three things: the neuron value (value); whether it is inhibitory or not (inhibitory = 1 or inhibitory = 0); and whether it is a target or not (target = 1 or target = 0). We loop through the neurons and initialize the ValueAndType data structure and then store it in a cell array Neuron for access later.
If we have a target node i, we have the usual evaluation x(i) = ∑_{j ∈ B(i)} E_{j→i} Y(j), and the target value is given by Y(i) = 0.5 (1 + tanh((x(i) − o(i))/g(i))) for the neuron's offset o(i) and gain g(i). The neuron node calculation function is thus given by

σ(x, o, g) = 0.5 (1 + tanh((x − o)/g)),

where y = σ(x, o, g). Now if the target value is 1 or 0, this will be unachievable as the inverse will return ∞ or −∞, respectively. So in this case we replace such targets by a suitable 1 − ε or ε for ease of computation. For a given target, we have σ⁻¹(Y(i)) = (x(i) − o(i))/g(i). This leads to x(i) = o(i) + g(i) σ⁻¹(Y(i)) and so we have

∑_{j ∈ B(i)} E_{j→i} Y(j) = o(i) + g(i) σ⁻¹(Y(i)).
We then apply Hebbian ideas and choose which of the summands in this sum to update. For example, if the product E_{j→i} Y(i) is sufficiently large, we note that resetting E_{j→i} to (o(i) + g(i) σ⁻¹(Y(i)))/Y(j) would push neuron i toward its target. But Y(j) could be zero and we probably don't want to use the full update. So we use E_{j→i} = TargTol (o(i) + g(i) σ⁻¹(Y(i)))/(Y(j) + 0.1) instead, and the code handles these target updates with slight adjustments for the inhibitory case.
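For reference, here is a hedged sketch of the inverse sigmoid used in these updates. The name SigInv matches the call in the listing below, but the body is an assumption based on the formula for σ; the offset and gain arguments are accepted to match the call site, and the returned value is the unscaled inverse atanh(2y − 1), so that o(i) + g(i)·SigInv(Y(i), o(i), g(i)) recovers the required input x(i).

function z = SigInv(y, o, g)
  % inverse of z -> 0.5*(1 + tanh(z)) applied to y; callers are expected
  % to have replaced targets of exactly 0 or 1 by eps and 1-eps first
  z = atanh(2*y - 1);
end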
if inhibitory == 1
  if hebb < -Tol
    weight = TargTol*targetinverse/(Y(pre)+.1);
  end
end
% Y post is excitatory
if inhibitory == 0
  if hebb > Tol
    weight = TargTol*targetinverse/(Y(pre)+.1);
  end
end
Wts(link) = weight;
end% backwards links loop for targets
end % i is a target
% get size
sizeV = length(Y);
sizeE = length(W);
sizeT = length(DesiredOutput);
sizeI = length(Inhibitory);
ValueAndType = struct();
Neuron = {};
for i = 1:sizeV
  ValueAndType.value = Y(i);
  ValueAndType.inhibitory = 0;
  ValueAndType.target = 0;
  for k = 1:sizeI
    if i == Inhibitory(k)
      ValueAndType.inhibitory = 1;
    end
  end
  for k = 1:sizeT
    if i == DesiredOutput{k}(1)
      ValueAndType.target = 1;
    end
  end
  v = ValueAndType.value;
  inh = ValueAndType.inhibitory;
  tar = ValueAndType.target;
  Neuron{i} = [v, inh, tar];
end

for i = 1:sizeV
  % get backward node information for neuron i
  BF = B{i};
  % get backward edge information for neuron i
  BEF = BE{i};
  lenB = length(BF);
  lenBEF = length(BEF);
  value = Neuron{i}(1);
  inhibitory = Neuron{i}(2);
  target = Neuron{i}(3);
  % see if we have a target neuron
  TargetMatch = 0;
  IsTarget = 0;
  if target == 1
    IsTarget = 1;
  end
  %not a target neuron
  if IsTarget == 0
    for j = 1:lenBEF
      link = BEF(j);
      pre = BF(j);
      post = i;
      hebb = Y(post)*W(link);
      weight = W(link);
      inhibitory = Neuron{post}(2);
      % Y post is inhibitory
      if inhibitory == 1
        if hebb < -Tol
          weight = -scale*abs(W(link));
        end
      end
      % Y post is excitatory
      if inhibitory == 0
        if hebb > Tol
          weight = scale*abs(W(link));
        end
      end
      Wts(link) = weight;
    end% backward links loop for non targets
  end
  % a target neuron
  if IsTarget == 1
    targetinverse = O(i) + G(i)*SigInv(value,O(i),G(i));
    for j = 1:lenBEF
      link = BEF(j);
      pre = BF(j);
      post = i;
      hebb = Y(post)*W(link);
      weight = W(link);
      inhibitory = Neuron{post}(2);
      % Y post is inhibitory
      if inhibitory == 1
        if hebb < -Tol
          weight = TargTol*targetinverse/(Y(pre)+.1);
        end
      end
      % Y post is excitatory
      if inhibitory == 0
        if hebb > Tol
          weight = TargTol*targetinverse/(Y(pre)+.1);
        end
      end
      Wts(link) = weight;
    end% backwards links loop for targets
  end % i is a target
end% neuron loop
end
Now if we wish to do these updates for multiple steps, we need a training function.
A simple one is shown in HebbianTrainingWithError whose code is given
below.
Here is how it works in a practice session. We will assume all the initialization from the previous sample session and just show the training. Here we use 151 steps: setting the iteration count to T = 152 actually does 151 steps. We are then close to achieving our targets: remember we do not use the target values of 0 and 1. However, we are making no effort at controlling the sizes of the edge weights, as you can see from the value of W(7)!
Chapter 20
Building Brain Models
We are interested in modeling the brains and other neural systems of a variety of
creatures. This text has given us the tools to begin to do this and in this chapter,
we will show how to build a human brain model using address based graphs in a
modular fashion. Some of our modules will have a lot of structure and some will
be just sketches. Our intent is to furnish you with a guide to how you could use
these ideas to build a model of any neural system of interest. The general process
is to look at the literature of the animal in question and try to find the neural circuit
diagrams from which a graph model can be built. This is not an easy task and with
the help of students, we have built preliminary models of honeybee, spider, pigeon
and squid/cuttlefish brains. It is a very interesting journey and in the next volume we
will explore it much further.
Now to build a small brain from component modules, we need code to build the
neural modules and link them together. We do this in several steps. The first one is
to assemble a cortex module. We have already discussed this quite a bit. The plan is
to build a cortical can out of OCOS, FFP and Two/Three circuits, assemble three or
more cans into a cortical column and then assemble multiple columns into a cortical
sheet. The code to do this is very modular.
A cortex module is built from cans assembled into columns which are then organized
into one or two dimensional sheets. So the process of building a cortex module is
pretty intense! However, we can use the resulting module as an isocortex building
block as we know all cortex prior to environmental imprinting is the same. Hence,
we can build a cortex module for visual cortex, auditory cortex, etc., and simply label their respective global vector addresses appropriately.
We will build a cortical can using the OCOS, FFP and Two/Three building blocks.
After we build the OCOS, FFP and Two/Three blocks we set their location addresses
as we discussed. OCOS is [0; 0; 0; 0; 1; ·], FFP is [0; 0; 0; 0; 2; ·] and Two/Three is
[0; 0; 0; 0; 3; ·]. We use the function buildcan.
First, we build the OCOS, FFP and Two/Three circuit blocks, seen in Figs. 20.1a,b
and 20.2a, and set their location masks as discussed above. We have simplified the
FFP so that it consists of only 2 nodes: we just specify there is node 1 in the can
above which connects to node 1 of the OCOS module. Again, note we are not using
the thalamus node in the OCOS circuit.
Listing 20.2: Build the OCOS, FFP and Two/Three circuit blocks
vOCOS = {[0;0;0;0;0;1],[0;0;0;0;0;2],[0;0;0;0;0;3],...
         [0;0;0;0;0;4],[0;0;0;0;0;5],[0;0;0;0;0;6],...
         [0;0;0;0;0;7]};
eOCOS = {[[0;0;0;0;0;1],[0;0;0;0;0;2]],...
         [[0;0;0;0;0;1],[0;0;0;0;0;3]],...
         [[0;0;0;0;0;1],[0;0;0;0;0;4]],...
         [[0;0;0;0;0;2],[0;0;0;0;0;5]],...
         [[0;0;0;0;0;3],[0;0;0;0;0;6]],...
         [[0;0;0;0;0;4],[0;0;0;0;0;7]],...
         [[0;0;0;0;0;1],[0;0;0;0;0;6]]};
VOCOS = vertices(vOCOS);
EOCOS = edges(eOCOS);
OCOS = graphs(VOCOS,EOCOS);
locationOCOS = [0;0;0;0;1;0];
OCOS = addlocationtonodes(OCOS,locationOCOS);
OCOS = addlocationtoedges(OCOS,locationOCOS);

vFFP = {[0;0;0;0;0;1],[0;0;0;0;0;2]};
eFFP = {[[0;0;0;0;0;1],[0;0;0;0;0;2]]};
VFFP = vertices(vFFP);
EFFP = edges(eFFP);
% (FFP graph and location, following the same pattern as OCOS)
FFP = graphs(VFFP,EFFP);
locationFFP = [0;0;0;0;2;0];
FFP = addlocationtonodes(FFP,locationFFP);
FFP = addlocationtoedges(FFP,locationFFP);

vTwoThree = {...
  [0;0;0;0;0;1],[0;0;0;0;0;2],[0;0;0;0;0;3],...
  [0;0;0;0;0;4],[0;0;0;0;0;5],[0;0;0;0;0;6]};
eTwoThree = {...
  [[0;0;0;0;0;5],[0;0;0;0;0;1]],[[0;0;0;0;0;6],[0;0;0;0;0;3]],...
  [[0;0;0;0;0;1],[0;0;0;0;0;2]],[[0;0;0;0;0;1],[0;0;0;0;0;4]],...
  [[0;0;0;0;0;3],[0;0;0;0;0;2]],[[0;0;0;0;0;3],[0;0;0;0;0;4]],...
  [[0;0;0;0;0;4],[0;0;0;0;0;2]]};
VTwoThree = vertices(vTwoThree);
ETwoThree = edges(eTwoThree);
TwoThree = graphs(VTwoThree,ETwoThree);
locationTwoThree = [0;0;0;0;3;0];
TwoThree = addlocationtonodes(TwoThree,locationTwoThree);
TwoThree = addlocationtoedges(TwoThree,locationTwoThree);
Then, we add all the individual nodes together into the graph CanOne.
Now that CanOne has all the nodes, we add the edges. CanOne already has the
OCOS edges, so we just need to add the FFP and Two/Three edges.
We can visualize the can easily now. We build the can, its incidence matrix and then generate the needed dot file. We can then see the assembled components in a can in Fig. 20.2b.
Once we can build a can, we can glue cans together to build columns. We will
default to a three can size for a column. The function is
First, we build three cans and set their location masks: CanOne is [0; 0; 0; 1; ·; ·], CanTwo is [0; 0; 0; 2; ·; ·] and CanThree is [0; 0; 0; 3; ·; ·].
Next, we add the cans to make a column. This uses the add a node at a time approach
which is inefficient, but makes it clearer what we are doing.
Now that Column has all the nodes of the three cans, we add in all the can edges.
Since Column contains all of CanOne, we only have to add the edges of CanTwo
and CanThree. This also uses the inefficient add an edge at a time approach.
We can see a typical column in Fig. 20.3. We first generate the dot file and then the
associated graphic as follows:
To build the model of cortex, we then glue together as many cortex columns as we
want. We use two choices: a type of ’single’ which means just one column is used
and a type of ’sheet’ which means we want a cortex model that is a rectangular
sheet of r rows and c columns. We do this with a switch statement. The function is
then
The ’single’ case is easy. We build a single column and set its location to
[0; 0; 1; ·; ·; ·].
We then build the cortex model by gluing the columns together. We start by adding
nodes.
Now add the edges. The cortex module already has the edges of the first column, so
we only add the edges from column two on. We use the more efficient add a list of
edges here.
Then, we add the inter column connections. We do this by building a list of edges to
add called links{}. The code shows the old single edge command commented
out above the addition to the list of edges.
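For concreteness, a small sketch of how such a link list can be built and added in one call is shown next; the particular addresses are illustrative only, following the [level; cortex; column; can; circuit; node] addressing pattern used throughout.

links = {};
% old style, one edge at a time:
% Cortex = addedge(Cortex,[0;0;1;1;3;2],[0;0;2;1;3;2]);
links{1} = [[0;0;1;1;3;2],[0;0;2;1;3;2]];
links{2} = [[0;0;2;1;3;2],[0;0;3;1;3;2]];
% ... further inter column connections ...
Cortex = addedgev(Cortex,links);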
We can graph a typical 2 × 2 cortex by generating its dot file. We can see this in
Fig. 20.4. We build the cortex, its incidence matrix and generate the dot file with
these lines:
Listing 20.19: The cortex dot file and the graphical image
Cortex = buildcortex(2,2,'sheet');
KCortex = incidence(Cortex);
incToDot(KCortex,6,6,1.0,'Cortex.dot');
In this sample code, we build a thalamus module by gluing together reversed OCOS
blocks. Ours will have just two, but the pattern is easily repeated. We can build much
better models of thalamus by following the ideas in (Sherman and Guillery 2006)
and their modification in (Sherman and Guillery 2013). But that will be for another
time. Right now we are building a simple placeholder for the thalamus functions in
the full brain model. Since these networks are reversed OCOS’s (ROCOS’s), they
form a new fundamental block. The function is as follows:
end
We then build the needed reverse OCOS blocks and assemble them into one module. We
construct individual names for the reverse OCOS graph objects and set their location
masks to be [0; 0; 0; 0; 1; ·] to [0; 0; 0; 0; 7; ·]. The case of thalamusSize == 1
is easy. But if there is more than one, we have to assemble more carefully.
In the more than one case, we do this. We create names for each thalamus module.
Then we create thalamus objects and set their addresses.
Listing 20.24: Building with more than one reversed OCOS block
base = 'ROCOS';
for i = 1:thalamusSize
  % create the name string for this block
  name{i} = [base, num2str(i)];
  % the cell is then reused to hold the reverse OCOS graph object itself
  name{i} = graphs(VROCOS,EROCOS);
  location = [0;0;0;0;i;0];
  name{i} = addlocationtonodes(name{i},location);
  name{i} = addlocationtoedges(name{i},location);
end
Then add the edges. Note our simple thalamus model consists of multiple reverse
OCOS blocks.
At this point, we don't have intermodule connections between the different ROCOS
modules, so it truly is a simple model! We then build a thalamus graph as usual and
construct its dot file to visualize it.
The midbrain module controls how neurotransmitters are used in our brain model.
Again, this model will be quite simple, just to illustrate the points. You can easily
imagine many ways to extend it, and that will be necessary to build useful models
for various purposes. We build a simple midbrain model by assembling NumDop
dopamine, NumSer serotonin and NumNor norepinephrine neurons into two neuron
sets. The function template is
end
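Only the closing end of the template survives above, so here is a hedged sketch of the shape that template presumably has, based on the call MidBrain = buildmidbrain(2,2,2) in Listing 20.32; the placeholder body is ours, not the author's code.

function MidBrain = buildmidbrain(NumDop, NumSer, NumNor)
  % build the dopamine, serotonin and norepinephrine node and edge lists,
  % turn them into graphs, set their location masks and glue the pieces
  % into the single MidBrain object, as in the listings that follow
  MidBrain = [];   % placeholder; the real body is developed in the text below
end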
First, we build the neurotransmitter node and edge lists. Each neurotransmitter is
modeled as two nodes with a simple connection between them. First, we build nodes
and edges for each neurotransmitter.
for i = 1:N
  vDopNT{i} = [0;0;0;0;0;i];
end
for i = 1:N
  vDopNT{N+i} = [0;0;0;0;0;N+i];
end
eDopNT = {};
for i = 1:N
  eDopNT{i} = [[0;0;0;0;0;i],[0;0;0;0;0;N+i]];
end

% Build Serotonin neurons
N = NumSer;
vSerNT = {};
for i = 1:N
  vSerNT{i} = [0;0;0;0;0;i];
end
for i = 1:N
  vSerNT{N+i} = [0;0;0;0;0;N+i];
end
eSerNT = {};
for i = 1:N
  eSerNT{i} = [[0;0;0;0;0;i],[0;0;0;0;0;N+i]];
end
Then, we build vertices and edges objects and then their corresponding neurotrans-
mitter objects.
VSerNT = vertices(vSerNT);
ESerNT = edges(eSerNT);
Serotonin = graphs(VSerNT,ESerNT);
location = [0;0;0;0;2;0];
Serotonin = addlocationtonodes(Serotonin,location);
Serotonin = addlocationtoedges(Serotonin,location);

VNorNT = vertices(vNorNT);
ENorNT = edges(eNorNT);
Norepinephrine = graphs(VNorNT,ENorNT);
location = [0;0;0;0;3;0];
Norepinephrine = addlocationtonodes(Norepinephrine,location);
Norepinephrine = addlocationtoedges(Norepinephrine,location);
Then, we finish by gluing the nodes together into a MidBrain object and add the
edges.
nodes = Norepinephrine.v;
[n,nodesize] = size(nodes);
for i = 1:nodesize
  MidBrain = addnode(MidBrain,nodes(:,i));
end

links = Serotonin.e;
disp('add serotonin edges');
MidBrain = addedgev(MidBrain,links);
links = Norepinephrine.e;
disp('add norepinephrine edges');
MidBrain = addedgev(MidBrain,links);
Listing 20.32: Generate the midbrain and its dot file
MidBrain = buildmidbrain(2,2,2);
KMidBrain = incidence(MidBrain);
incToDot(KMidBrain,6,6,1.0,'MidBrain.dot');
We then plot the image and we can see the midbrain object in Fig. 20.6.
Our simple brain model will consist of two sensory modules, an associative cortex,
a motor cortex, a thalamus model, a midbrain model and a cerebellum model. We
are not modeling memory or any motor output functions. We can add these later but
this will be a nice simple model with a fair bit of structure. Note the model is full
of feedback, and input and output nodes are scattered all through the model. So all
of our hard work on understanding how to build input to output maps using
derivative based tools and Hebbian techniques, as well as the conversion of a feedback
graph into a lagged feedforward graph, is very relevant. Later, we will briefly introduce
another training technique that we think will be useful in efficiently building models
of cognitive function and dysfunction. The generic model then looks like what we
show in Fig. 20.7. We are not showing input/output models here.
The template we use to build the brain model is given below. It will return how
many nodes and edges are in the model as well as the graph that represents our model
of this brain.
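The full template listing is not reproduced here; based on how buildbrain is described and called below, it presumably has a shape like the following sketch, where the argument order and the placeholder body are our assumptions.

function [Brain, NodeSizes, EdgeSizes] = buildbrain(NumDop, NumSer, NumNor)
  NodeSizes = zeros(7,1);
  EdgeSizes = zeros(7,1);
  Brain = [];   % placeholder; the seven modules are built and glued in below
end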
where in the call to buildbrain we can choose how many of each neurotransmitter
we want to use, and we return vectors of neuron and edge sizes so that we can
build a better visualization of this graph. There are seven modules here and each has its
own number of neurons and edges. So the global neuron and edge numbering scheme
can be broken up into pieces relevant to each module. The vectors NodeSizes and
EdgeSizes contain this information. In the original incToDot code, we simply drew
the graph using the global node and edge numbers. We will now automate the
construction of the dot file differently. We will make each module its own cluster in the
digraph and allow ourselves to set individual colors for each module and for both internal
links and intermodule links. Now let's discuss how we build our brain model. This
will still not have an input or output section. First, we initialize the size counters.
We then build the SensoryOne cortical module and set its mask to [0; 1; ·; ·; ·; ·].
This allows us to set the node and edge sizes of the first module in the counters.
N1 = length(SensoryOne.v);
E1 = length(SensoryOne.e);
NodeSizes(1) = N1;
EdgeSizes(1) = E1;
disp(['Neurons 1 to ',num2str(N1),' are Sensory One']);
disp(['Links 1 to ',num2str(E1),' are Sensory One']);
We then build the SensoryTwo cortical module, set its mask to [0; 2; ·; ·; ·; ·] and
set counters.
N2 = length(SensoryTwo.v);
E2 = length(SensoryTwo.e);
NodeSizes(2) = N2;
EdgeSizes(2) = E2;
I = N1+1;
disp(['Neurons ',num2str(I),' to ',num2str(N1+N2),' are Sensory Two']);
disp(['Links ',num2str(E1+1),' to ',num2str(E1+E2),' are Sensory Two']);
We will model the associative cortex as a 2 × 2 sheet and build the object
AssociativeCortex with mask address [0; 3; ·; ·; ·; ·].
N3 = length(AssociativeCortex.v);
E3 = length(AssociativeCortex.e);
NodeSizes(3) = N3;
EdgeSizes(3) = E3;
I = N1+N2+1;
disp(['Neurons ',num2str(I),' to ',num2str(N1+N2+N3),' are Associative Cortex']);
disp(['Links ',num2str(E1+E2+1),' to ',num2str(E1+E2+E3),' are Associative Cortex']);
We choose to model the motor cortex as a single cortical column and set its mask to
[0; 4; ·; ·; ·; ·].
N4 = length(MotorCortex.v);
E4 = length(MotorCortex.e);
NodeSizes(4) = N4;
EdgeSizes(4) = E4;
I = N1+N2+N3+1;
disp(['Neurons ',num2str(I),' to ',num2str(N1+N2+N3+N4),' are Motor Cortex']);
disp(['Links ',num2str(E1+E2+E3+1),' to ',num2str(E1+E2+E3+E4),' are Motor Cortex']);
We then build the thalamus model, Thalamus with mask [0; 5; ·; ·; ·; ·].
N5 = length(Thalamus.v);
E5 = length(Thalamus.e);
NodeSizes(5) = N5;
EdgeSizes(5) = E5;
I = N1+N2+N3+N4+1;
disp(['Neurons ',num2str(I),' to ',num2str(N1+N2+N3+N4+N5),' are Thalamus']);
disp(['Links ',num2str(E1+E2+E3+E4+1),' to ',num2str(E1+E2+E3+E4+E5),' are Thalamus']);
Next, we model the midbrain as we discussed with 25 each of three different neuro-
transmitters. We set the mask now to [0; 6; ·; ·; ·; ·].
N6 = length(MidBrain.v);
E6 = length(MidBrain.e);
NodeSizes(6) = N6;
EdgeSizes(6) = E6;
I = N1+N2+N3+N4+N5+1;
disp(['Neurons ',num2str(I),' to ',num2str(N1+N2+N3+N4+N5+N6),' are MidBrain']);
disp(['Links ',num2str(E1+E2+E3+E4+E5+1),' to ',num2str(E1+E2+E3+E4+E5+E6),' are MidBrain']);
Finally, we model the cerebellum as a single cortical column for now and set its mask
to [0; 7; ·; ·; ·; ·].
N7 = length(Cerebellum.v);
E7 = length(Cerebellum.e);
NodeSizes(7) = N7;
EdgeSizes(7) = E7;
I = N1+N2+N3+N4+N5+N6+1;
disp(['Neurons ',num2str(I),' to ',num2str(N1+N2+N3+N4+N5+N6+N7),' are Cerebellum']);
disp(['Links ',num2str(E1+E2+E3+E4+E5+E6+1),' to ',num2str(E1+E2+E3+E4+E5+E6+E7),' are Cerebellum']);
We can then build a simple brain model object, Brain. We begin by gluing together
all the module nodes.
%
% Brain Step 2
% add the nodes of Associative to Brain
%
nodes = AssociativeCortex.v;
[n,nodesize] = size(nodes);
for i = 1:nodesize
  Brain = addnode(Brain,nodes(:,i));
end
%
% Brain Step 3
% add the nodes of MotorCortex to Brain
%
nodes = MotorCortex.v;
[n,nodesize] = size(nodes);
for i = 1:nodesize
  Brain = addnode(Brain,nodes(:,i));
end
%
% Brain Step 4
% add the nodes of Thalamus to Brain
%
nodes = Thalamus.v;
[n,nodesize] = size(nodes);
for i = 1:nodesize
  Brain = addnode(Brain,nodes(:,i));
end
%
% Brain Step 5
% add the nodes of MidBrain to Brain
%
nodes = MidBrain.v;
[n,nodesize] = size(nodes);
for i = 1:nodesize
  Brain = addnode(Brain,nodes(:,i));
end
%
% Brain Step 6
% add the nodes of Cerebellum to Brain
%
nodes = Cerebellum.v;
[n,nodesize] = size(nodes);
for i = 1:nodesize
  Brain = addnode(Brain,nodes(:,i));
end
Then we add in the edges. We already have the edges from Sensory Cortex One.
end
%
% Now add all the edges from MotorCortex
%
links = MotorCortex.e;
[n,edgesize] = size(links);
for i = 1:edgesize
  Brain = addedge(Brain,links{i}(:,1),links{i}(:,2));
end
%
% Now add all the edges from Thalamus
%
links = Thalamus.e;
[n,edgesize] = size(links);
for i = 1:edgesize
  Brain = addedge(Brain,links{i}(:,1),links{i}(:,2));
end
%
% Now add all the edges from Midbrain
%
links = MidBrain.e;
[n,edgesize] = size(links);
for i = 1:edgesize
  Brain = addedge(Brain,links{i}(:,1),links{i}(:,2));
end
%
% Now add all the edges from Cerebellum
%
links = Cerebellum.e;
[n,edgesize] = size(links);
for i = 1:edgesize
  Brain = addedge(Brain,links{i}(:,1),links{i}(:,2));
end
The next part is harder to write down as we have to carefully add all the edges between
neurons in various components.
We are going to set up the connections now for all the modules as shown in
Fig. 20.7. There are a lot of them and, in general, this is pretty time consuming and
intellectually demanding to set up. In the text below, we go through all the steps. We
can show you an intermediate step in Fig. 20.8.
This figure only shows the $E_{S_I A}$ and $E_{S_{II} A}$ connections, which we set up below.
We are connecting the FFP circuits in all three cans of each sensory cortex to the
Two/Three circuits in the associative cortex model. Our associative cortex consists
of four columns each having three cans. We connect the sensory cortex to neuron
5 of the Two/Three circuit in each column and each can in the associative cortex.
So, for example, we connect [0; 1; 1; 3; 2; 1] to [0; 3; i; j; 3; 5] for i running from
1 to 4 (these are the associative cortex column indices) and for j running from 1 to
3 (these are the can indices). This gives 4 × 3 such connections. Now the address
[0; 1; 1; 3; 2; 1] is for sensory cortex one, column one (there is only one column
in each sensory cortex), can three, and node 1 in the FFP circuit (that is the number 2
in the address's fifth entry and the number 1 in the sixth entry). We have to do that
for the other two cans in each sensory cortex. So we have a total of 3 × 12 such
connections for each sensory cortex and two sensory cortices, giving a total of 72
connections to code. It is hard to say how best to do this. We have experimented with
graphical user interfaces which allow us to connect nodes by clicking on the pre node
and the post node to establish the link, but if you look at Fig. 20.8, you can see the
complexity of the drawing starts making this very hard to do. We still find it easier
to code this sort of stuff, but then our minds might be different from yours! So feel
free to experiment yourself and find a convenient way to set these links up that you can
live with!
Now, let's get started with all the setup. We start with the simplest interconnections.
First, we set up 9 links from sensory cortex one to associative cortex.
links{5} = [[0;1;1;2;2;1],[0;3;1;2;3;5]];
links{6} = [[0;1;1;2;2;1],[0;3;1;3;3;5]];
% neuron 1 of ffp can 1 in sensory one to right neurons
% of each 2/3 circuit in associative cortex
% can 1, can 2, can 3
links{7} = [[0;1;1;1;2;1],[0;3;1;1;3;5]];
links{8} = [[0;1;1;1;2;1],[0;3;1;2;3;5]];
links{9} = [[0;1;1;1;2;1],[0;3;1;3;3;5]];
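Since the connection pattern is regular, the full set of 36 links from sensory cortex one could also be generated in a loop rather than written out by hand. A sketch, using a fresh cell array (fflinks is an illustrative name and the loop bounds follow the description above), is

fflinks = {};
m = 0;
for can = 1:3           % cans in sensory cortex one
  for col = 1:4         % associative cortex columns
    for accan = 1:3     % cans in each associative column
      m = m + 1;
      fflinks{m} = [[0;1;1;can;2;1],[0;3;col;accan;3;5]];
    end
  end
end
Brain = addedgev(Brain,fflinks);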
Second, we encode the 9 links from sensory cortex two to associative cortex.
We then add 10 links from thalamus to sensory cortex one and two.
Next are 10 more links from thalamus to associative cortex and motor cortex.
%
% now add connections from cerebellum to motor cortex
%
% add cerebellum to motor cortex two/three can 3
links{51} = [[0;7;1;3;2;1],[0;4;1;3;3;5]];
links{52} = [[0;7;1;2;2;1],[0;4;1;3;3;5]];
links{53} = [[0;7;1;1;2;1],[0;4;1;3;3;5]];
Next, we add in the dopamine links. This is done in a loop and, for ease of understanding,
we set up all the dopamine links in the link list doplinks. First, we set
the neurotransmitter sizes.
doplinks{N+14} = [In,[0;3;1;1;3;6]];
doplinks{N+15} = [In,[0;3;1;2;3;5]];
doplinks{N+16} = [In,[0;3;1;2;3;6]];
doplinks{N+17} = [In,[0;3;1;3;3;5]];
doplinks{N+18} = [In,[0;3;1;3;3;6]];
% motor cortex
doplinks{N+19} = [In,[0;4;1;1;3;5]];
doplinks{N+20} = [In,[0;4;1;1;3;6]];
doplinks{N+21} = [In,[0;4;1;2;3;5]];
doplinks{N+22} = [In,[0;4;1;2;3;6]];
doplinks{N+23} = [In,[0;4;1;3;3;5]];
doplinks{N+24} = [In,[0;4;1;3;3;6]];
% thalamus
for j = 1:7
  u = (j-1)*3;
  Out1 = [0;5;0;0;j;1];
  Out2 = [0;5;0;0;j;2];
  Out3 = [0;5;0;0;j;3];
  doplinks{N+24+u+1} = [In,Out1];
  doplinks{N+24+u+2} = [In,Out2];
  doplinks{N+24+u+3} = [In,Out3];
end
end
Brain = addedgev(Brain,doplinks);
[a,b] = size(doplinks);
serlinks{N+23} = [In,[0;4;1;3;3;5]];
serlinks{N+24} = [In,[0;4;1;3;3;6]];
% thalamus
for j = 1:7
  u = (j-1)*3;
  Out1 = [0;5;0;0;j;1];
  Out2 = [0;5;0;0;j;2];
  Out3 = [0;5;0;0;j;3];
  serlinks{N+24+u+1} = [In,Out1];
  serlinks{N+24+u+2} = [In,Out2];
  serlinks{N+24+u+3} = [In,Out3];
end
end
Brain = addedgev(Brain,serlinks);
norlinks{N+24+u+1} = [In,Out1];
norlinks{N+24+u+2} = [In,Out2];
norlinks{N+24+u+3} = [In,Out3];
end
end
Brain = addedgev(Brain,norlinks);
Then get the incidence matrix and the graph’s dot file.
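These two steps follow the same pattern as the earlier modules; the file name Brain.dot below is an illustrative choice.

KBrain = incidence(Brain);
incToDot(KBrain,6,6,1.0,'Brain.dot');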
This generates the graph shown in Fig. 20.9. However, this visualization is less than
ideal. It would be better to be able to color portions of the graph differently as well
as organize modules into graph sub clusters. This can be done, although it is tedious.
We have written such code, but how we do it depends a great deal on what type of
brain model we are building, so it is very hard to automate. This kind of code is
available on request; we will be happy to share our pain with you! Also, there are
standard ways to generate the many files you can use to make a simple movie from
your simulation. The dot code is set up to allow color changes based on link weight
intensities and so forth. We have also not included that code here. Again, it is hard
to make it generic; we tend to write such code as needed for the project at hand. Still,
contact us if you want to talk about it!
References
S.M. Sherman, R. Guillery, Exploring the Thalamus and Its Role in Cortical Function (The MIT
Press, Cambridge, 2006)
S.M. Sherman, R. Guillery, Functional Connections of Cortical Areas: A New View of the Thalamus
(The MIT Press, Cambridge, 2013)
Part VII
Models of Cognition Dysfunction
Chapter 21
Models of Cognitive Dysfunction
To finish this text, let’s outline a blueprint for the creation of a model of cognitive
function and/or dysfunction which is relatively small computationally.
We are interested in doing this for several important reasons. First, in order to make
judicious statements about both cognitive dysfunction and policies that ameliorate
the problem, we require that there is a functioning model of normal brain function.
This is very hard to do, yet if we attempt it and develop a reasonable approximation,
we can then use the normal brain model to help us understand what happens if
neural modules and their interconnections change due to disease and trauma. Also,
the computational capability of stand alone robotic platforms is always less than
desired, so a model which can perhaps add higher level functions such as emotions
to decision making algorithms would be quite useful. Hence, we will deliberately
search out appropriate approximations to neural computation and deploy them in a
robust yet small scale, scalable architecture in the quest for our models. However,
the real reason is that in order to make progress on many different fronts in cognitive
research, we need an overarching model of neural computation and of how it engenders high
level things like associative learning and cognition. In our minds, this quest is quite
similar to understanding how to answer difficult questions such as “Which pathway
to cancer is dominant in a colon cancer model?” or “How do we understand the
spread of altruism throughout a population?” and so on. In our earlier sections, we
went through the arguments that can be brought to bear to bring insight into these
important questions and our intent is similar here. How can we phrase and answer
questions of this sort:
• “How is associative learning done?” Answering this sheds light on fundamental
problems in deep learning and how to build better algorithms to accomplish it. It
also is at the heart of understanding shape, handwriting and many other things.
Human volunteers were shown various images and their physiological responses
were recorded in two ways. One was a skin galvanic response and the other an fMRI
parameter. The data shows that the null responses are associated with images that
have no emotional tag. Further, the images cleanly map to distinct 2D locations in
skin response and fMRI space, the emotional grid, when the emotional contents of
the images differ. Hence, we will assume that if a database of music and paintings
were separated into states of anger, sadness, happiness and neutrality, we would
see a similar separation of response. We have designed emotionally labeled data
using a grammatical approach to composition, assembling the data into what is
known as a Würfelspiel matrix, which consists of P rows and three columns. The
nouns are placed in the first column, the verbs in the second and the objects in the
third. Each sentence fragment $S_{ijk} = \text{Noun}_i + \text{Verb}_j + \text{Object}_k$
constitutes a composed phrase and there are $P^3$ possible combinations.
$$
A = \begin{bmatrix}
\text{Noun}_0 & \text{Verb}_0 & \text{Object}_0 \\
\text{Noun}_1 & \text{Verb}_1 & \text{Object}_1 \\
\vdots & \vdots & \vdots \\
\text{Noun}_{P-1} & \text{Verb}_{P-1} & \text{Object}_{P-1}
\end{bmatrix} \tag{21.1}
$$
We have used this technique in our development of emotionally tagged music and
painting data in Chaps. 11 and 12. For auditory data, we use musical fragments where
the nouns become opening phrases, the verbs transitions and the objects, the closing.
For the visual data, we use painting compositions made from foreground (the noun),
midground (the verb) and background (the object). Hence, for neutral data plus three
emotionally labeled sets of grammars, we have $4P^3$ input sequences for our model of
visual and auditory cortex. We assume each musical and painting data presentation
corresponds to a two dimensional emotional grid location which we assume has nice
separation properties. We thus have a set of data which can be used to imprint a brain
model with correct or normal emotional responses.
Each of our directed graphs also has node and edge functions associated with it,
and these functions are time dependent, as what they do depends on first and second
messenger triggers, the hardware structure of the output neuron and so forth. We
therefore model the neural circuitry of a brain using a directed graph architecture
consisting of computational nodes N and edge functions E which mediate the transfer
of information between two nodes. Hence, if $N_i$ and $N_j$ are two computational
nodes, then $E_{i\to j}$ would be the corresponding edge function that handles
information transfer from node $N_i$ to node $N_j$. For our purposes, we will assume here that
the neural circuitry architecture we describe is fixed, although dynamic architectures
can be handled as a sequence of directed graphs. We organize the directed graph
using interactions between neural modules (visual cortex, thalamus etc.) which are
themselves subgraphs of the entire circuit. Once we have chosen a directed graph to
represent our neural circuitry, note the addition of a new neural module is easily
handled by adding it and its connections to other modules as a subgraph addition.
Hence, at a given level of complexity, if we have the graph $G(N, E)$ that encodes the
connectivity we wish to model, then the addition of a new module or modules simply
generates a new graph $G'(N', E')$ for which there are straightforward equations
explaining how $G'$ relates to $G$, and these are easy to implement. The update equations
for a given node are then given as an input/output pair. For the node $N_i$, let $y_i$ and $Y_i$
denote the input and output from the node, respectively. Then we have
$$y_i(t+1) = I_i + \sum_{j\in B(i)} E_{j\to i}(t)\, Y_j(t), \qquad
Y_i(t+1) = \sigma_i(t)\bigl(y_i(t)\bigr),$$
where Ii is a possible external input, B(i) is the list of nodes which connect to the
input side of node Ni and σi (t) is the function which processes the inputs to the
node into outputs. This processing function is mutable over time t because second
messenger systems are altering how information is processed at each time tick.
Hence, our model consists of a graph G which captures the connectivity or topology
of the brain model, on top of which are laid the instructions for information
processing via the time dependent node and edge processing functions. A simple
look at edge processing shows that the nodal output, which is perhaps an action
potential, is transferred without change to a synaptic connection, where it
initiates a spike in Ca++ ions which results in neurotransmitter release. The effi-
cacy of this release depends on many things, but we can focus on four: ru (i, j),
the rate of reuptake of neurotransmitter in the connection between node Ni and
node Nj ; the neurotransmitter is destroyed via an appropriate oxidase at the rate
rd (i, j); the rate of neurotransmitter release, rr (i, j) and the density of the neurotrans-
mitter receptor, nd (i, j). The triple (ru (i, j), rd (i, j), rr (i, j)) ≡ T (i, j) determines a
net increase or decrease of neurotransmitter concentration between the two nodes:
rr (i, j) − ru (i, j) − rd (i, j) ≡ rnet (i, j). The efficacy of a connection between nodes is
then proportional to the product rnet (i, j)×nd (i, j). Hence, each triple is a determining
signature for a given neurotransmitter and the effectiveness of the neurotransmitter
is proportional to the net neurotransmitter flow times the available receptor density.
A very simple version of this is to simply assign the value of the edge processing
function Ei→j to be the weight Wi,j as is standard in a simple connectionist archi-
tecture. Of course, it is more complicated as our graphs allow feedback easily by
simply defining the appropriate edge connections. All of these parameters are also
time dependent, so we can add a (t) to all of the above to indicate that, but we have
not done so as we do not want too much clutter. We have worked out how to use
approximations to nodal computation in Chaps. 6 and 9 and more details of how these
approximations can be used in asynchronous computation environments are given
in Peterson (2015).
Recall, we assume there is a vector function f which assigns an output to each node.
In the simplest case, this vector function takes the form of f (i) = σi and, of course,
these computations can be time dependent. Given an arbitrary input vector x, the DG
computes node outputs via an iterative process. We let U denote the nodes which
receive external inputs and B(i), the backward set of node Ni . The edge from node
Ni to node Nj is assigned an edge function Ei→j which could be implemented as
assigning a scalar value to this edge. Then, the evaluation is shown in Table 21.1a,
where N is the number of nodes in the DG and the ith node function is given by the
function σ i which processes the current node input yi .
We can also adjust parameters in the DG using the classic idea of a Hebbian
update, which would be implemented as shown in Table 21.1b: choose a tolerance $\epsilon$
and, at each node in the backward set of the post neuron $i$, update the edge value
by the factor $\zeta$ if the product of the post neuron node value $f_i$ and the edge value
$E_{j\to i}$ exceeds $\epsilon$. The update algorithm then consists of the paired operations: sweep
through the DG to do an evaluation and then use both Hebbian updates and the graph
flow equations to adjust the edge values.
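One possible MATLAB rendering of this evaluation sweep and Hebbian update is sketched below; the dense weight matrix representation, the variable names and the numerical values of the tolerance and the factor are our illustrative choices, not the author's code.

N = 4;
E = rand(N) .* (rand(N) > 0.5);   % E(j,i) is the weight on edge j -> i (0 means no edge)
I = rand(N,1);                    % external inputs
Y = zeros(N,1);                   % node outputs
y = zeros(N,1);                   % node inputs
sigma = cell(N,1);
for i = 1:N
  sigma{i} = @(x) 0.5*(1 + tanh(x));   % simple node processing function
end
eps0 = 0.25;                      % Hebbian tolerance
zeta = 1.05;                      % Hebbian strengthening factor
% one evaluation sweep over the directed graph
for i = 1:N
  y(i) = I(i);
  for j = find(E(:,i))'           % backward set B(i)
    y(i) = y(i) + E(j,i)*Y(j);
  end
  Y(i) = sigma{i}(y(i));
end
% Hebbian update of the edges feeding node i
for i = 1:N
  for j = find(E(:,i))'
    if Y(i)*E(j,i) > eps0
      E(j,i) = zeta*E(j,i);
    end
  end
end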
21.2 Information
$$\nabla^2 f - \alpha\,\frac{\partial f}{\partial t} - \beta\, f = -I. \tag{21.2}$$
$$K K^T f - \alpha\,\frac{\partial f}{\partial t} - \beta\, f = -I,$$
which gives
$$\bigl(K K^T - (\alpha+\beta)\,Id\bigr)\, f(t+1) = -\beta\, f(t) - I(t+1),$$
where $Id$ is the appropriate identity matrix. Let $H_{\alpha,\beta}$ denote the operator
$K K^T - (\alpha+\beta)\,Id$. We thus have the iterative equation
$$f(t+1) = -H_{\alpha,\beta}^{-1}\,\beta\, f(t) - H_{\alpha,\beta}^{-1}\, I(t+1).$$
$$f(t+1) = \xi(t+1).$$
Now let’s switch to a more typical nodal processing formulation. Recall, each node
N i has an input yi (t) which is processed by the node using the function σi as Y i (t) =
σi (yi (t)). The node processing could also depend on parameters, but here we will
only assume an offset and a gain is used in each computation. Thus, the node values
f are equivalent to the node outputs Y. This allows us to write the update directly in
terms of the node outputs.
For simplicity, let's assume a sigmoid nodal computation for each node:
$$\sigma_i(x) = 0.5\left(1 + \tanh\!\left(\frac{x-o}{g}\right)\right),$$
where $o$ and $g$ are traditional offset and gain parameters for the computation. It is
straightforward to find
$$\sigma_i^{-1}\bigl(\xi_i(t+1)\bigr) = O_i + \frac{g_i}{2}\,\ln\!\left(\frac{1+\xi_i(t+1)}{1-\xi_i(t+1)}\right),$$
where $O_i$ is the offset to node $i$ and $g_i$ is the gain. We also know the graph evaluation
equations for $y_i$ then give
$$y_i(t) = \sum_{j\in B(i)} E_{j\to i}(t)\, Y_j(t) = O_i + \frac{g_i}{2}\,\ln\!\left(\frac{1+\xi_i(t+1)}{1-\xi_i(t+1)}\right),$$
where we don't need to add a nodal input as that is taken care of in the external input
$I$. Pick the maximum $E_{j\to i}(t)\, Y_j(t)$ component and label its index $j_M$. Then set
$$E_{j_M\to i}(t)\, Y_{j_M}(t) = O_i + \frac{g_i}{2}\,\ln\!\left(\frac{1+\xi_i(t+1)}{1-\xi_i(t+1)}\right),$$
$$E_{j_M\to i}(t) = \frac{1}{Y_{j_M}(t)}\left(O_i + \frac{g_i}{2}\,\ln\!\left(\frac{1+\xi_i(t+1)}{1-\xi_i(t+1)}\right)\right).$$
We could also decide to do this update for a larger subset of the indices contributing to
the nodal output value in which case several edge weights could be updated for each
iteration. For example, this update could be undertaken for all synaptic edges that
meet a standard Hebbian update tolerance criterion. Implementing this update model
into Matlab is quite similar to what we have already shown in the Hebbian case.
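For instance, a minimal sketch of the single-edge version of this update for one node i, using the inverse formula as written in the text and with purely illustrative variable names and data, is

% illustrative data; xi plays the role of the target output xi_i(t+1)
i  = 1;
E  = rand(4,4);            % current edge weights, E(j,i) for edge j -> i
Y  = 0.2 + 0.6*rand(4,1);  % current node outputs Y_j(t)
xi = 0.6;                  % desired output of node i
Oi = 0.0;  gi = 1.0;       % offset and gain of node i
contrib = E(:,i) .* Y;                           % the terms E_{j->i}(t) Y_j(t)
[~, jM] = max(contrib);                          % index of the dominant contribution
target  = Oi + (gi/2)*log((1 + xi)/(1 - xi));    % the inverse formula from the text
E(jM,i) = target / Y(jM);                        % reset the dominant edge weight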
Given the update strategies above, it is worth mentioning that once we have a given
graph structure for our model, we will probably want to update it by adding module
interconnections, additional modules and so forth. We find this is our normal pathway
to building a better model as our understanding of the neural computation we need
where N is the number of intermodule edges we have added. The new incidence
matrix is thus 29 × (42 + N). The matrix $K^T$ is then
$$
K^T = \begin{bmatrix}
K_1^T \; (12\times 9) & O_2^T \; (12\times 20) \\[2pt]
O_1^T \; (30\times 9) & K_2^T \; (30\times 20) \\[2pt]
C^T \; (N\times 9) & D^T \; (N\times 20)
\end{bmatrix}
$$
After multiplying, we have (multiplies involving our various zero matrices O1 and
O2 vanish)
$$
K K^T = \begin{bmatrix}
K_1 K_1^T + C C^T \; (9\times 9) & C D^T \; (9\times 20) \\[2pt]
D C^T \; (20\times 9) & K_2 K_2^T + D D^T \; (20\times 20)
\end{bmatrix}
$$
where the zero sub matrices of appropriate sizes in each matrix are all labeled $O$. Recall the
Laplacian updating algorithm is given by
$$K K^T f - \alpha\,\frac{\partial f}{\partial t} - \beta\, f = -I,$$
which can be written in finite difference form using $\Delta f(t) = f(t) - f(t-1)$ for the change
in the node values at time $t$ (with $\Delta f(0) = 0$), where $I(t)$ is the external input at
time $t$. Now divide the node vectors $f$ and $I$ into the components $f_I$, $f_{II}$ and
$I_I$, $I_{II}$ to denote the nodes for the subgraphs $G_1$ and $G_2$, respectively. Then, we see
we can write the Laplacian for the full graph in terms of the components used to
assemble the graph from its modules and module interconnections.
$$
K K^T \begin{bmatrix} f_I \\ f_{II} \end{bmatrix}
= \begin{bmatrix} K_1 K_1^T & O \\ O & K_2 K_2^T \end{bmatrix}
  \begin{bmatrix} f_I \\ f_{II} \end{bmatrix}
+ \begin{bmatrix} C C^T & O \\ O & D D^T \end{bmatrix}
  \begin{bmatrix} f_I \\ f_{II} \end{bmatrix}
+ \begin{bmatrix} O & C D^T \\ D C^T & O \end{bmatrix}
  \begin{bmatrix} f_I \\ f_{II} \end{bmatrix}
$$
Hence, the terms $C C^T f_I$, $D D^T f_{II}$, $C D^T f_{II}$ and $D C^T f_I$ represent
the mixing of signals between modules. We see that if we have an existing model, we
can use these update equations to add a new graph to an existing graph. Thus, we
can take a trained brain module and add a new cortical sub module and so forth, and
to some extent retain the training effort we have already undertaken. Now add time
indices and expand the difference terms accordingly.
If we wanted to train a portion of the cortex to match sensory data, we can organize
our brain graph model as two modules. The first graph is all of the model except
the portion of the cortex we want to train. Hence, the two module update equations
derived in the previous section are applicable. Let’s assume the cortical submodule we
are interested in is the second graph and let EII be the orthonormal basis consisting of
the eigenvectors for the matrix LII with λII the corresponding vector of eigenvalues.
Then using the representation of vectors with respect to the basis EII , we have
$f_{II}^t = \sum_j f_{II}^{tj} E_{II}^j$ and $I_{II}^t = \sum_j I_{II}^{tj} E_{II}^j$ and so
$$\sum_j f_{II}^{tj}\, L_{II} E_{II}^j + \sum_j f_{II}^{tj}\, D D^T E_{II}^j + D C^T f_I^t
= \alpha \sum_j f_{II}^{t-1,j} E_{II}^j - \sum_j I_{II}^{tj} E_{II}^j.$$
However, we also know $L_{II} E_{II}^j = \lambda_{II}^j E_{II}^j$. Thus, we obtain
$$\sum_j f_{II}^{tj}\,\lambda_{II}^j E_{II}^j + \sum_j f_{II}^{tj}\, D D^T E_{II}^j + D C^T f_I^t
= \alpha \sum_j f_{II}^{t-1,j} E_{II}^j - \sum_j I_{II}^{tj} E_{II}^j.$$
Define the new vectors $F_{II}^j = D D^T E_{II}^j$ and $g_{II}^t = D C^T f_I^t$. Substituting these into
our expression, we find
$$\sum_j \bigl(f_{II}^{tj}\lambda_{II}^j - \alpha f_{II}^{t-1,j} + I_{II}^{tj}\bigr) E_{II}^j
+ \sum_j f_{II}^{tj} F_{II}^j + g_{II}^t = 0.$$
We can expand $F_{II}^j$ and $g_{II}^t$ also in the basis $E_{II}$ giving
$F_{II}^j = \sum_k F_{II}^{jk} E_{II}^k$ and $g_{II}^t = \sum_j g_{II}^{tj} E_{II}^j$. From these expansions, we find
$$\sum_j \bigl(f_{II}^{tj}\lambda_{II}^j - \alpha f_{II}^{t-1,j} + I_{II}^{tj} + g_{II}^{tj}\bigr) E_{II}^j
+ \sum_j \sum_k f_{II}^{tj} F_{II}^{jk} E_{II}^k = 0,$$
and reorganizing the double sum, for each index $j$,
$$f_{II}^{tj}\lambda_{II}^j - \alpha f_{II}^{t-1,j} + I_{II}^{tj} + g_{II}^{tj}
+ \sum_k f_{II}^{tk} F_{II}^{kj} = 0.$$
We now have a set of equations for the unknowns $f_{II}^{t,j}$. We can rewrite this as a matrix
equation. Let $\Lambda_{II}$ denote the diagonal matrix
$$
\Lambda_{II} = \begin{bmatrix}
\lambda_{II}^1 & 0 & \cdots & 0 \\
0 & \lambda_{II}^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_{II}^P
\end{bmatrix}
$$
where $P$ is the number of nodes in the second module. Further, let $F_{II}$ be the matrix
$(F_{II}^{jk})$, so that
$$
F_{II}^T = \begin{bmatrix}
F_{II}^{11} & F_{II}^{21} & \cdots & F_{II}^{P1} \\
F_{II}^{12} & F_{II}^{22} & \cdots & F_{II}^{P2} \\
\vdots & \vdots & & \vdots \\
F_{II}^{1P} & F_{II}^{2P} & \cdots & F_{II}^{PP}
\end{bmatrix}
$$
Finally, let $D_{II}^{t,t-1} = -\bigl(I_{II}^t + g_{II}^t - \alpha f_{II}^{t-1}\bigr)$. Then we have
$\bigl(\Lambda_{II} + F_{II}^T\bigr) f_{II}^t = D_{II}^{t,t-1}$. It then follows that we can update via
$$f_{II}^t = \bigl(\Lambda_{II} + F_{II}^T\bigr)^{-1} D_{II}^{t,t-1} \equiv H_{II}^t,$$
where we know the matrix inverse exists because $D D^T$ is invertible. This is the same
form as the calculations we did in Sect. 21.2.1. Let's assume we have a sigmoidal
nodal computation $\sigma_i(x) = 0.5\bigl(1 + \tanh\bigl(\frac{x-o}{g}\bigr)\bigr)$ and so
$f_{II}^{ti} = \sigma_i\bigl(y_{II}^{ti}\bigr) = \xi_{II}^{ti}$ or
$y_{II}^{ti} = \sigma_i^{-1}\bigl(\xi_{II}^{ti}\bigr)$, where $o$ and $g$ are traditional offset and gain
parameters for the computation. We know the graph evaluation equations for $y_{II}^{ti}$ then give
$$y_{II}^{ti} = \sum_{j\in B(i)} E_{j\to i}(t)\, f_{II}^{tj}
= O_i + \frac{g_i}{2}\,\ln\!\left(\frac{1+\xi_{II}^{ti}}{1-\xi_{II}^{ti}}\right).$$
Pick the maximum $E_{j\to i}(t)\, f_{II}^{tj}$ component and label its index $j_M$. Then set
$$E_{j_M\to i}(t)\, f_{II}^{t j_M} = O_i + \frac{g_i}{2}\,\ln\!\left(\frac{1+\xi_{II}^{ti}}{1-\xi_{II}^{ti}}\right),$$
$$E_{j_M\to i}(t) = \frac{1}{f_{II}^{t j_M}}\left(O_i + \frac{g_i}{2}\,\ln\!\left(\frac{1+\xi_{II}^{ti}}{1-\xi_{II}^{ti}}\right)\right).$$
Thus, we have a method for updating a portion of the parameters that influence each
nodal output. We combine this with the standard Hebbian updates with inhibition to
complete the training algorithm.

We can also turn this into a recursive equation by replacing critical components at
time $t$ with time $t-1$ to give
$$f_{II}^{tj}\lambda_{II}^j = \alpha f_{II}^{t-1,j} - I_{II}^{tj} - g_{II}^{t-1,j}
+ \sum_k f_{II}^{t-1,k} F_{II}^{kj},$$
$$f_{II}^{tj} = \frac{1}{\lambda_{II}^j}\left(\alpha f_{II}^{t-1,j} - I_{II}^{tj} - g_{II}^{t-1,j}
+ \sum_k f_{II}^{t-1,k} F_{II}^{kj}\right),$$
which allows us to alter the parameters of the $j$th nodal function using its inverse
$\bigl(f_{II}^j\bigr)^{-1}$ as we did earlier.
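A small MATLAB sketch of one such update step follows, assuming the submodule Laplacian is $L_{II} = K_2 K_2^T$ and using purely illustrative names and data for the interconnection matrices, the previous coefficients and the inputs; this is a sketch of the linear solve, not the author's code.

% illustrative sizes and data
P = 5;  m1 = 4;  nInter = 3;
K2 = randn(P,7);                  % incidence of the submodule, L2 = K2*K2'
C  = randn(m1,nInter);            % interconnection matrices as in the text
D  = randn(P,nInter);
alpha = 0.5;
fprev = randn(P,1);               % coefficients f_II^{t-1} in the eigenbasis
Ivec  = randn(P,1);               % external input to the submodule nodes
fI    = randn(m1,1);              % current node values of the first module
L2 = K2*K2';  L2 = (L2 + L2')/2;  % symmetrize against round-off
[E2, Lam] = eig(L2);              % orthonormal eigenbasis and eigenvalues
lambda = diag(Lam);
Fc = E2' * (D*D') * E2;           % coefficients F_II^{jk} of F_II^j = D*D'*E_II^j
g  = E2' * (D*C') * fI;           % coefficients of g_II^t = D*C'*f_I^t
Dvec  = alpha*fprev - E2'*Ivec - g;       % right hand side D_II^{t,t-1}
fcoef = (diag(lambda) + Fc) \ Dvec;       % solve (Lambda_II + F_II^T) f_II^t = D_II^{t,t-1}
fII   = E2 * fcoef;                       % submodule node values at time t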
To update the first graph in the pair, we use the analogous update equation derived below.
The arguments we present next are quite similar to the ones presented for submodule
two training; hence, we will be briefer in our exposition. The first graph is all of the
model except the portion of the cortex we want to train. Let EI be the orthonormal
basis consisting of the eigenvectors for the matrix LI with λI the corresponding
vector of eigenvalues. Then using the representation of vectors with respect to the
basis $E_I$, we have $f_I^t = \sum_j f_I^{tj} E_I^j$ and $I_I^t = \sum_j I_I^{tj} E_I^j$ and so
$$\sum_j f_I^{tj}\, L_I E_I^j + \sum_j f_I^{tj}\, C C^T E_I^j + C D^T f_{II}^t
= \alpha \sum_j f_I^{t-1,j} E_I^j - \sum_j I_I^{tj} E_I^j.$$
However, we also know $L_I E_I^j = \lambda_I^j E_I^j$. Thus, we obtain
$$\sum_j f_I^{tj}\,\lambda_I^j E_I^j + \sum_j f_I^{tj}\, C C^T E_I^j + C D^T f_{II}^t
= \alpha \sum_j f_I^{t-1,j} E_I^j - \sum_j I_I^{tj} E_I^j.$$
Define the new vectors $F_I^j = C C^T E_I^j$ and $g_I^t = C D^T f_{II}^t$. Substituting these into our
expression, we find
$$\sum_j \bigl(f_I^{tj}\lambda_I^j - \alpha f_I^{t-1,j} + I_I^{tj}\bigr) E_I^j
+ \sum_j f_I^{tj} F_I^j + g_I^t = 0.$$
We can expand $F_I^j$ and $g_I^t$ also in the basis $E_I$ giving $F_I^j = \sum_k F_I^{jk} E_I^k$ and
$g_I^t = \sum_j g_I^{tj} E_I^j$. From these expansions, we find
$$\sum_j \bigl(f_I^{tj}\lambda_I^j - \alpha f_I^{t-1,j} + I_I^{tj} + g_I^{tj}\bigr) E_I^j
+ \sum_j \sum_k f_I^{tj} F_I^{jk} E_I^k = 0.$$
Now reorganize the double sum to write
$$\sum_j \Bigl(f_I^{tj}\lambda_I^j - \alpha f_I^{t-1,j} + I_I^{tj} + g_I^{tj}
+ \sum_k f_I^{tk} F_I^{kj}\Bigr) E_I^j = 0.$$
This immediately implies that for all indices $j$, we must have
$f_I^{tj}\lambda_I^j - \alpha f_I^{t-1,j} + I_I^{tj} + g_I^{tj} + \sum_k f_I^{tk} F_I^{kj} = 0$.
Writing the last sum as the inner product $\langle f_I^t, F_I^j\rangle \equiv \sum_k f_I^{tk} F_I^{kj}$,
this can be rewritten as $f_I^{tj}\lambda_I^j - \alpha f_I^{t-1,j} + I_I^{tj} + g_I^{tj} + \langle f_I^t, F_I^j\rangle = 0$. We now have a
set of equations for the unknowns $f_I^{t,j}$. We can rewrite this as a matrix equation. Let
$\Lambda_I$ denote the diagonal matrix $\mathrm{diag}(\lambda_I)$, similar to what we did before. Further, let
$F_I$ be the matrix $(F_I^{jk})$. Then, we can rewrite the update equation more succinctly.
Finally, let $D_I^{t,t-1} = -\bigl(I_I^t + g_I^t - \alpha f_I^{t-1}\bigr)$. Then we have
$\bigl(\Lambda_I + F_I^T\bigr) f_I^t = D_I^{t,t-1}$. It then follows that we can update via
$$f_I^t = \bigl(\Lambda_I + F_I^T\bigr)^{-1} D_I^{t,t-1} \equiv H_I^t,$$
where we know the matrix inverse exists because $C C^T$ is invertible. Again, assuming
we have a sigmoidal nodal computation, we find $f_I^{ti} = \sigma_i\bigl(y_I^{ti}\bigr) = \xi_I^{ti}$ or
$y_I^{ti} = \sigma_i^{-1}\bigl(\xi_I^{ti}\bigr)$, where $o$ and $g$ are traditional offset and gain parameters for the
computation. We know the graph evaluation equations for $y_i$ then give
$$y_I^{ti} = \sum_{j\in B(i)} E_{j\to i}(t)\, f_I^{tj}
= O_i + \frac{g_i}{2}\,\ln\!\left(\frac{1+\xi_I^{ti}}{1-\xi_I^{ti}}\right).$$
Pick the maximum $E_{j\to i}(t)\, f_I^{tj}$ component and label its index $j_M$. Then set
$$E_{j_M\to i}(t)\, f_I^{t j_M} = O_i + \frac{g_i}{2}\,\ln\!\left(\frac{1+\xi_I^{ti}}{1-\xi_I^{ti}}\right),$$
$$E_{j_M\to i}(t) = \frac{1}{f_I^{t j_M}}\left(O_i + \frac{g_i}{2}\,\ln\!\left(\frac{1+\xi_I^{ti}}{1-\xi_I^{ti}}\right)\right).$$
Thus, we have a method for updating a portion of the parameters that influence each
nodal output. We combine this with the standard Hebbian updates with inhibition to
complete the training algorithm.

We can also turn this into a recursive equation by replacing critical components at
time $t$ with time $t-1$ to give
$$f_I^{tj}\lambda_I^j = \alpha f_I^{t-1,j} - I_I^{tj} - g_I^{t-1,j} + \sum_k f_I^{t-1,k} F_I^{kj}.$$
This gives the update
$$f_I^{tj} = \frac{1}{\lambda_I^j}\left(\alpha f_I^{t-1,j} - I_I^{tj} - g_I^{t-1,j}
+ \sum_k f_I^{t-1,k} F_I^{kj}\right),$$
which allows us to alter the parameters of the $j$th nodal function using its inverse
$\bigl(f_I^j\bigr)^{-1}$ as we did earlier. Thus we have a mechanism to update the submodule of the
neural model we are interested in, using the orthonormal basis its subgraph provides.
Let’s address the issue of developing a normal brain model for lesion studies. Indeed,
we can also look at the difference between a normal and a dysfunctional brain model
as a way to determine a palliative strategy that might restore normalcy. Let’s focus
on three neurotransmitters here: dopamine, serotonin and norepinephrine. It is well
known that disturbances in neurotransmitter signalling and processing are related to
mood disorders. A good review of this is given in Russo and Nestler (2013). In fact,
reward—seeking behavior circuitry is probably common across many animals as
described in Barron et al. (2010). Our model here will be a first attempt at quan-
tifying these interactions; simplified but with some explanatory insight. Each has
an associated triple $(r_u, r_d, r_r) \equiv T$ which we label with a superscript for each
neurotransmitter. Hence, $T^D$, $T^S$ and $T^N$ are the respective triples for our three
neurotransmitters dopamine, serotonin and norepinephrine. Consider a simple graph
model $G(N, E)$ as shown in Fig. 21.1.
The auditory and visual training data are given in four emotional labellings: neutral,
sad, happy and angry. We assume that each emotional state corresponds to
neurotransmitter triples $(T_N^D, T_N^S, T_N^N)$ (neutral), $(T_S^D, T_S^S, T_S^N)$ (sad),
$(T_H^D, T_H^S, T_H^N)$ (happy) and $(T_A^D, T_A^S, T_A^N)$ (angry). In addition, we assume that
these triple states can be different in the visual, auditory and associative cortex and hence,
we add an additional subscript label to denote that. We let
$(T_{\alpha,\beta}^D, T_{\alpha,\beta}^S, T_{\alpha,\beta}^N)$ be the triple for cortex
$\beta$ ($\beta = 0$ is visual cortex, $\beta = 1$ is auditory cortex and $\beta = 2$ is associative cortex)
and emotional state $\alpha$ ($\alpha = 0$ is neutral, $\alpha = 1$ is sad, $\alpha = 2$ is happy and $\alpha = 3$ is
angry). Hence, we are using our emotionally labeled data as a way of constructing
a simplistic normal brain model. Of course, it is much more complicated than this.
Emotion in general is something that is hard to quantify and even somewhat difficult
to find good measures for what state a person is in. A good review of that problem is
in Mauss and Robinson (2009). Cortical activity that is asymmetric probably plays
a role here, see Harmon-Jones et al. (2010), and what is nice about our modeling
choices is that we have the tools to model such asymmetry using the graph models.
However, our first model is much simpler.
The graph encodes the information about emotional states in both the nodal
processing and the edge processing functions. Let’s start with auditory neutral data
and consider the graph shown in Fig. 21.2.
The neurotransmitter triples can exist in four states: neutral, sad, happy and
angry, which are indicated by the toggle $\alpha$ in the triples $T_{\alpha,\beta}^{\gamma}$, where $\gamma$ denotes the
neurotransmitter choice ($\gamma = 0$ is dopamine, $\gamma = 1$ is serotonin and $\gamma = 2$ is
norepinephrine). For auditory neutral data, there is a choice of triple $T_{0,1}^{\gamma}$ and $T_{0,2}^{\gamma}$
for each neurotransmitter which corresponds to how the brain labels this data as
emotionally neutral. Hence, we are identifying a collection of six triples or eighteen
numbers with neutral auditory data. This is a vector in $\mathbb{R}^{18}$ which we will call $V^{0,1,2}$.
We need to do this for the other emotional states which gives us three additional
vectors $V^{1,1,2}$, $V^{2,1,2}$ and $V^{3,1,2}$. Thus, emotional processing for emotionally labeled
auditory data requires us to set four vectors in $\mathbb{R}^{18}$. We need to do the same thing for
the processing of emotionally labeled visual data which will give us the four vectors
$V^{0,0,2}$, $V^{1,0,2}$, $V^{2,0,2}$ and $V^{3,0,2}$. To process the data for both types of sensory cortex
thus requires eight vectors in $\mathbb{R}^{18}$. Choose 8 orthonormal vectors in $\mathbb{R}^{18}$ to correspond
to these states. To train the full graph $G(N, E)$ to understand neutral auditory data, it is
a question of what internal outputs of the graph should be fixed or clamped and which
should be allowed to alter. For neutral auditory data, we clamp the neurotransmitter
triples in the auditory cortex and associative cortex using the vector $V^{0,1,2}$ and we
force the output of the associative cortex to be the same as the incoming auditory data.

(Fig. 21.1: A simple cognitive processing map focusing on a few salient modules. The edge connections are of the form $E_{\alpha,\beta}$ where $\alpha$ and $\beta$ can be a selection from v (Visual), a (Auditory), A (Associative), T (Thalamus), M (Midbrain) and I (Input).)

(Fig. 21.2: panels labeled Neutral and Sad, showing the Input, Auditory, Associative, Thalamus and Midbrain modules with the edges EaA, EMa, EAT, EMA and ETM.)

The midbrain module makes the neurotransmitter connections to each cortex
module, but how these connections are used is determined by the triples and the
usual nodal processing. In a sense, we are identifying the neutral auditory emotional
state with the vector $V^{0,1,2}$. We assume the Thalamus module can shape processing
in the midbrain. Hence, we will let $V^{0,1,2}$ be the desired output for the Thalamus
module for neutral auditory data. The Midbrain module accepts the input $V^{0,1,2}$ and
uses it to set the triple states in auditory and associative cortex. In effect, this is an
abstraction of second messenger systems that affect the neural processing in these
cortical modules. We can do this for each emotional state. For example, Fig. 21.3
shows the requisite processing for the cortical submodules.

Processing the sad auditory data clamps the triples in auditory and associative
cortex using $V^{1,1,2}$ and clamps the associative cortex output to the incoming sad
auditory data. We then train the graph to have a Thalamus output of $V^{1,1,2}$. We do
the same thing for the other emotional states for auditory and for all the emotional
states for the visual cortex data.

(Fig. 21.4: the graph model accepting new music (MusicInput to Auditory) and painting (VisualInput to Visual) data, connected to the Associative, Thalamus and Midbrain modules via the edges EMa, EaA, EMA, EAT, ETM and EvA.)
Now consider the graph model accepting new music and painting data. We see
the requisite graph in Fig. 21.4.
The music data and painting data will generate an output vector W from the Thala-
mus module in addition to generating associative cortex outputs corresponding to a
music and painting fragment on the basis of the encoded edge and nodal processing
that has been set by the training. Hence, we can classify the new combined audi-
tory and visual data as corresponding to an emotional labeling of the form shown in
Eq. 21.3.
$$W = \sum_{i=1}^{4} \langle W, V^{i,1,2}\rangle\, V^{i,1,2}
+ \sum_{i=1}^{4} \langle W, V^{i,0,2}\rangle\, V^{i,0,2} \tag{21.3}$$
Let’s summarize what we have discussed. For a given brain model G(V, E) which
consists of auditory sensory cortex, SCI, visual cortex, SCII, Thalamus, Th, Asso-
ciative Cortex, AC and Midbrain, MB, we can now build its normal emotional state.
We assign neurotransmitter triples $T_{\alpha,\beta}^{\gamma}$ for the four emotional states of neutral, sad,
happy and angry for each neurotransmitter. These eighteen values for eight possible
states are chosen as orthonormal vectors in $\mathbb{R}^{18}$, but our choice of values for
these triples can also be guided by the biopsychological literature; we will not
discuss that here. The auditory and visual input data can be mapped into exclusive
regions of skin galvanic response and an fMRI parameter, so part of our modeling is
to create a suitable abstraction of this mapping from the inputs into the two dimensional
skin galvanic response and fMRI space, which is $\mathbb{R}^2$. There are papers that
discuss how to model the fMRI response we can use for this; see Friston (2005) and
Stephan et al. (2004).
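One quick way to generate such a set of vectors, purely as an illustration of the idea rather than a principled choice of triple values, is to orthonormalize a random 18 × 8 matrix:

V = orth(randn(18,8));        % columns are 8 orthonormal vectors in R^18
disp(norm(V'*V - eye(8)));    % should be essentially zero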
To construct the normal model, we do the following:
• For neutral music data, we train the graph to imprint the emotional states as follows:
for neutral music inputs to SCI, the clamped AC output is the same neutral music
input and the clamped Th to MB output are the neutral triples $T_{0,1}^{\gamma}$ and $T_{0,2}^{\gamma}$ encoded
as $V^{0,1,2}$. When training is finished, the neutral music samples are recognized as
neutral music with the neutral neurotransmitter triples engaged.
• For neutral painting data, use the combined evaluation/Hebbian update loop to
imprint the emotional states by clamping the AC output to the neutral painting input to
SCII and clamping the Th to MB output to the neutral triples $T_{0,1}^{\gamma}$ and $T_{0,2}^{\gamma}$ encoded
as $V^{0,0,2}$. When training is finished, the neutral painting samples are recognized
as neutral paintings with the neutral neurotransmitter triples engaged.
• Do the same training loop for the sad, happy and angry music and painting data
samples using the sad, happy and angry neurotransmitter triples. The AC output
is clamped to the appropriate data input values and the Th to MB output is the
required neurotransmitter triples.
At this point, the model G(V, E) assigns known emotionally labeled data in two
sensory modalities to their correct outputs in associative cortex. A new music and
painting input would then be processed by the model and the Th to MB connections
would assign the inputs to a set of neurotransmitter triple states W and generate
a set of AC outputs for both the music and painting inputs. These outputs would
be interpreted as a new musical sequence and a new painting. We can think of this
trained model G(V, E) as defining the normal state. Also, as shown in Eq. 21.3, we
can assign an emotional labeling to the new data. We can model dysfunction at this
point by allowing changes in midbrain processing.
• Alter $T^D$, $T^S$ and $T^N$ in the auditory, visual and associative cortex to create a new
vector $W$ and therefore a new model $G_{new}(N, E)$. The nodes and edges do not
change but the way computations are processed does change. The change in model
can be labeled $\delta G$. Each triple change $(\delta T^D, \delta T^S, \delta T^N)$ gives a potential mapping
to cognitive dysfunction.
• Using the normal model $G(N, E)$, map all neutral music and painting data to sad
data. Hence, we clamp AC outputs to sad states but do not clamp the Th to MB
mapping. This will generate new neurotransmitter triple states in MB which we can
label $W_{sad}$. The exhibited changes $(\delta T^D, \delta T^S, \delta T^N)$ give us a quantitative model
of a cognitive dysfunction having some of the characteristics of depression. Note
all 18 parameters from all 3 neurotransmitters are potentially involved. Note also that
standard drugs such as Abilify only make adjustments to 2 or 3 of these parameters,
which suggests our current pharmacological treatments are too narrow in scope. If
we know the normal state, this model suggests that, to restore normalcy, we apply
drugs which alter neurotransmitter activity by $(-\delta T^D, -\delta T^S, -\delta T^N)$.
• Lesions in a brain can be simulated by the removal of nodes and edges from
the graph model G(N, E). This allows us to study when the normal emotional
responses alter and lead to dysfunction.
• If we construct a model $G_L(N_L, E_L)$ for the left half of the brain, a model
$G_R(N_R, E_R)$ for the right half and a model of the corpus callosum, $G_C(N_C, E_C)$, we can combine
these modules using the ideas from Sect. 21.2.2 to create a brain model $G(N, E)$
which can be used to model right and left half miscommunication problems
such as schizophrenia, which could be very useful. The interconnections
between the right and left half of the brain through the corpus callosum modules
and neurotransmitter connections can be altered, and the responses of the altered
model can be compared to those of a normal brain model for possible insight.
We think the most important problem in understanding cognition and the working
of an full brain model is the issue of how features are extracted from environmen-
tal input. There are various brain structures that do this and it appears that cortical
structures evolved to handle this task quite some time ago. In Wang et al. (2010),
careful studies of the chicken brain have found a region in the telencephalon that is
similar to the mammalian auditory cortex and it is also comprised of laminated layers
of cells linked by narrow radial columns. Evidence is therefore mounting that the
complex circuitry needed to parse environmental input and do feature extraction and
associative learning probably evolved in an ancestor common to both mammals and
birds at least 300 million years ago. Further, in Sanes and Zipursky (2010), there is a
fascinating discussion about the similarities and differences between the mammalian
visual cortex and the fly visual cortex. For example, both visual systems use a small
number of neuron types divided into many subtypes for specialized computation.
They use multiple synaptic contacts to a single presynaptic terminal which connects
to multiple postsynaptic sites. Both use multiple cellular layers with regular arrange-
ments of neurons in the layers with an ordered way of mapping computation between
layers. Both use lateral interactions through parallel relays to heavily process the raw
environmental input received by the photoreceptors in the eyes. They use repeated
local modules to handle the global environmental input field and different visual
functions are helped by pathways that act in parallel. The main structure for the
parallel processing is the organization of the many synaptic connections into parallel
laminar regions. The results of this processing are then passed to other portions of the
cortex where further feature extraction and associative learning is completed. It may
be that the organization of the fly visual system and the mammalian visual system do
not share a common evolutionary origin, in contrast to what is now suspected
about the common origins of these systems in mammals and birds. Instead, this
appears to be an example of convergent evolution at work: the problems of handling
the environmental input shaped the evolution of this type of circuitry. However, we
also know that the vertebrate transcription factor Pax6 and its fly orthologue eyeless
is crucial for eye development, so it is still possible that there is a shared evolutionary
origin.
There is much work to be done to develop our cognitive dysfunction model and
this will be the main topic of the next volume. We will use these tools to build both
models of human cognitive dysfunction and models of small brained animals.
References
A. Barron, E. Søvik, J. Cornish, The roles of dopamine and related compounds in reward-seeking
behavior across animal phyla. Front. Behav. Neurosci. 4(163), 1–9 (2010) (Article 163)
K. Friston, Models of brain function in neuroimaging. Annu. Rev. Psychol. 56, 57–87 (2005)
(Annual Reviews)
E. Harmon-Jones, P. Gable, C. Peterson, The role of asymmetric frontal cortical activity in emotion-
related phenomena: a review and update. Biol. Psychol. 84, 451–462 (2010)
P. Lang, M. Bradley, J. Fitzimmons, B. Cuthbert, J. Scott, B. Moulder, V. Nangia, Emotional arousal
and activation of the visual cortex: an fMRI analysis. Psychophysiology 35, 199–210 (1998)
I. Mauss, M. Robinson, Measures of emotion: a review. Cognit. Emot. 23, 209–237 (2009)
J. Peterson, Nodal computation approximations in asynchronous cognitive models. Comput. Cognit.
Sci. (2015) (In Press)
S. Russo, E. Nestler, The brain reward circuitry in mood disorders. Nat. Rev.: Neurosci. 14, 609–625
(2013)
J. Sanes, S. Zipursky, Design principles of insect and vertebrate visual systems. Neuron 66(1),
15–36 (2010)
K. Stephan, L. Harrison, W. Penny, K. Friston, Biophysical models of fMRI responses. Curr. Opin.
Neurobiol. 14, 629–635 (2004)
Y. Wang, A. Brzozowska-Prechtl, H. Karten, Laminar and columnar auditory cortex in avian brain.
Proc. Natl. Acad. Sci. 107(28), 12676–12681 (2010)
Part VIII
Conclusions
Chapter 22
Conclusions
This text has introduced the salient issues that arise in cognitive modeling. We have
used relatively simple models of neural computation to construct graph models of
brain circuitry at essentially any level of detail we wish. In Chap. 21, we have outlined
a simple model of a normal brain and shown how, in principle, we could develop
a model of cognitive dysfunction from that. To finish out this book, let’s look at
some really hard problems that we have been building various tools to address. Let’s
consider a model of schizophrenia. Using the notations of Chap. 21, a first look could
be this. If we were to model schizophrenia with a left and right brain and a connecting
corpus callosum, the two modules would be the left and right brain subgraph and
the corpus callosum subgraph. Let’s look at this two graph system where the first
graph contains the left and the right brain and the second graph contains the corpus
callosum. Assume we have trained on T samples using the module training equations
$$\bigl(\Lambda_I + F_I^T\bigr) f_I^t + I_I^t + g_I^t - \alpha f_I^{t-1} = 0$$
$$\bigl(\Lambda_{II} + F_{II}^T\bigr) f_{II}^t + I_{II}^t + g_{II}^t - \alpha f_{II}^{t-1} = 0$$
At convergence, we obtain stable values for $f_I^t$ and $f_{II}^t$ which we will label as $f_I^{\infty}$ and $f_{II}^{\infty}$.
Then we must have
$$\bigl(\Lambda_I + F_I^T\bigr) f_I^{\infty} + I_I^t + g_I^{\infty} - \alpha f_I^{\infty} = 0$$
$$\bigl(\Lambda_{II} + F_{II}^T\bigr) f_{II}^{\infty} + I_{II}^t + g_{II}^{\infty} - \alpha f_{II}^{\infty} = 0$$
A change in the connectivity from the Corpus Callosum to the left and right brains
and vice-versa implies a change $C \to C'$ and $D \to D'$. This changes terms which
are built from the matrices $C$ and $D$; for example, $F_{II}^j = D D^T E_{II}^j$ changes to
$D' (D')^T E_{II}^j$. After the connectivity changes are made, the graph can evaluate all
$T$ inputs to generate a collection of new outputs $S_I' = \bigl((f_I^1)', \ldots, (f_I^T)'\bigr)$ and
$S_{II}' = \bigl((f_{II}^1)', \ldots, (f_{II}^T)'\bigr)$. The original outputs due to the trained graph are $S_I$ and $S_{II}$.
Hence, any measure of the entries of these matrices gives us a way to compare the
performance of the graph model of the brain with a correct set of corpus callosum
connections to that of one whose connections have been altered. For our brain models,
the results in Zhang et al. (2012) help us understand how to set up the graph for a
schizophrenia model.
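Even before the full Chap. 21 machinery is in place, the comparison step itself is easy to prototype. Here is a minimal MatLab sketch of it, assuming the trained outputs and the outputs after the connectivity change have already been collected column by column into matrices; the sizes, the random data and the choice of the Frobenius norm as the measure are placeholder assumptions, not values from the text.

% Compare the trained outputs S_I with the outputs S_I' obtained after the
% corpus callosum connectivity change C -> C', D -> D'.  The data here are
% random placeholders standing in for the real graph evaluations.
T   = 25;                          % number of training samples (assumed)
nI  = 40;                          % number of nodes reported by module I (assumed)
SI  = rand(nI,T);                  % columns are f_I^1, ..., f_I^T
SIp = SI + 0.1*randn(nI,T);        % columns are (f_I^1)', ..., (f_I^T)'
% One possible global measure: the relative Frobenius norm of the difference.
relChange = norm(SI - SIp,'fro')/norm(SI,'fro');
% A per-sample measure shows which inputs are most affected by the change.
perSample = sqrt(sum((SI - SIp).^2,1))./sqrt(sum(SI.^2,1));
fprintf('relative change in module I outputs: %g\n', relChange);

Any other matrix measure could be substituted here; the point is only that the trained and perturbed output collections can be compared entry by entry.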
There have been recent and exciting results in treating the depressive states and
suicidal impulses that are often associated with schizophrenia, based on the
N-methyl-D-aspartate (NMDA) receptor antagonist ketamine. A complete review of the
use of ketamine for major depressive problems and bipolar depression is given in
Lee et al. (2015). How it is used in chronic stress, attention disorders and depressive
states is discussed in Li et al. (2011), Knott et al. (2011) and Murrough et al. (2013).
Hence, the tools and background given to you in this text will help you on the next
journey: developing interesting neural models of cognitive function and dysfunction
such as we see in depression and schizophrenia.
Finally, since we know hormones play a major role in brain plasticity, as evidenced
by Garcia-Segura (2009), we know that our graph models can also be tuned or altered
by external, hormone-based second messenger triggers. We know that in parasitic
wasps, the wasp injects a large neurotoxin cocktail into a cockroach's brain which
then initiates large scale changes in behavior. In Chap. 9, we discuss toxins that
can alter the action potential of a neuron in various ways. The cocktail the wasp
injects is a very large scale input, not a small scale perturbation of the neural model
of the cockroach at all. A general outline of parasitic manipulation of a host is given in
Weinersmith and Faulkes (2014) and more specific information about how the wasp
does this with the cockroach is discussed in Libersat and Gal (2014). Note this cocktail
effectively reorganizes the interacting modules of the cockroach brain to perform
altered behavior. Thus, there is proof of concept for such a large reorganization of
the neural modules of any brain via an appropriate input package. With the graph
modeling tools we have developed and suitable approximations, we can hope to build
a model of this process as well, something that in the past has been very hard both
to conceptualize and to build.
In the next volume, all of these ideas will be used to build real models.
References
Y. Goto, C. Yang, S. Otani, Functional and dysfunctional synaptic plasticity in prefrontal cortex:
roles in psychiatric disorders. Biol. Psychiatr. 67, 199–207 (2010)
K. Hashimoto, Emerging role of glutamate in the pathophysiology of major depressive disorder.
Brain Res. Rev. 61, 105–123 (2009)
V. Knott, A. Millar, J. McIntosh, D. Shah, D. Fisher, C. Blais, V. Ilivitsky, E. Horn, Separate
and combined effects of low dose ketamine and nicotine on behavioral and neural correlates of
sustained attention. Biol. Psychol. 88, 83–93 (2011)
E. Lee, M. Della-Selva, A. Liu, S. Himelhoch, Ketamine as a novel treatment for major depressive
disorder and bipolar depression: a systematic review and quantitative meta-analysis. Gener. Hosp.
Psychiatr. 37, 178–184 (2015)
N. Li, R. Liu, J. Dwyer, M. Banasr, B. Lee, H. Son, X. Li, G. Aghajanian, R. Duman, Glutamate N-
methyl-D-aspartate receptor antagonists rapidly reverse behavioral and synaptic deficits caused
by chronic stress exposure. Biol. Psychiatr. 69, 754–761 (2011)
F. Libersat, R. Gal, Wasp voodoo rituals, venom-cocktails, and the zombification of cockroach
hosts. Integr. Comp. Biol. 54, 129–142 (2014)
J. Murrough, D. Iosifescu, L. Chang, R. Al Jurdi, C. Green, A. Perez, S. Iqbal, S. Pillemer, A. Foulkes,
A. Shah, D. Charney, S. Mathew, Antidepressant efficacy of ketamine in treatment-resistant major
depression: a two-site randomized controlled trial. Am. J. Psychiatr. 170, 1134–1142 (2013)
M. Piper, M. Beneyto, T. Burne, D. Eyles, D. Lewis, J. McGrath, The neurodevelopmental hypothesis
of schizophrenia: convergent clues from epidemiology and neuropathology. Psychiatr. Clin. N.
Am. 35, 571–584 (2012)
K. Weinersmith, Z. Faulkes, Parasitic manipulation of host's phenotype, or how to make a zombie:
an introduction to the symposium. Integr. Comp. Biol. 54, 93–100 (2014)
S. Wood, A. Yung, P. McGorry, C. Pantelis, Neuroimaging and treatment evidence for clinical staging
in psychotic disorders: from the at-risk mental state to chronic schizophrenia. Biol. Psychiatr. 70,
619–625 (2011)
Y. Zhang, L. Lin, C. Lin, Y. Zhou, K. Chou, C. Lo, T. Su, T. Jiang, Abnormal topological organization
of structural brain networks in schizophrenia. Schizophr. Res. 141, 109–118 (2012)
Part IX
Background Reading
Chapter 23
Background Reading
We have been inspired by many attempts by people in disparate fields to find meaning
and order in the vast compilations of knowledge that they must assimilate. Like
them, we have done a fair bit of reading and study to prepare ourselves for necessary
abstractions we need to make in our journey. We have learned a lot from various
studies of theoretical biology and computation. You will need to make this journey
too, so to help you, here are some specific comments about the sources we have
learned from. In addition to the books we mentioned in Peterson
(2015a, b, c), we have some further recommendations.
• Biochemical Messengers in Hardie (1991). This is the first book we read to learn
about information processing in the nervous system on a technical level. It has
useful insights. All in all, we use this one as a technical reference to dip into from
time to time. This book explains a lot of the technical detail behind the messaging systems
we encounter in our studies.
• Information in the brain in Black (1991). While dated now, this has been an impor-
tant source as it tries hard to develop a unified theory of information processing.
It is an absolutely stunning attempt to understand the how and why of informa-
tion processing in the large scale nervous system. As you will see in the body of
this report, we have quoted from this work extensively. There is so much inter-
esting speculation in here. It is also extremely technical and unapologetically
so! We used Black's ideas extensively in our previous attempt at developing a
good abstraction of biological detail for software design. This is laid out in
Peterson (2001).
• The Evolution of Vertebrate Design in Radinsky (1987). This is a wonderful book
that attempts to look at the evolution of the vertebrate body plan in very abstract
terms.
• Large Scale Neuronal Theories of the Brain in Koch and Davis (1994). This is
another useful collection of attempts to model biological complexity and see the
big picture.
• Neural organization in Arbib et al. (1998). Here Arbib and his coauthors spend a lot
of time discussing their approach to seeing the hidden structure in brain circuitry.
There are a lot of good ideas in here.
• The paper (Baslow 2011) discusses the modular organization of the vertebrate
brain. Pay attention to how these ideas can influence how we build our models
using the graph tools in MatLab.
• Two layman’s books on how our notions of morality may have evolved and a
general introduction to ideas in evolutionary psychology are Buller (2006) and
Hauser (2006). These are less technical but provide needed overview.
We believe there is a lot to learn from studying how organisms have evolved nervous
systems. The primitive systems contain a minimal set of structures to allow control,
movement and environmental response. We can see in the evolutionary record how
as additional structure is added, we not only see a gain in capability but also more
constraints on the organism in terms of energy requirements and so forth. Some of
the very useful papers in this area are listed below:
• Catania (2000) analyzes the organization of cortex in the insectivores. The connection
between the development of sensory modules to understand the environment and
the fusion required to build higher level percepts for enhanced survival can be
studied in many animals. However, there are advantages to studying these things
in animals with smaller cognitive systems.
• In Deacon (1990), there is a very interesting review of the state of the art in thinking
about brain evolution as of 1990.
• In Miklos (1993), there is even more of an attempt to use the ideas from the
evolution of nervous systems to help guide the development of artificial nervous
systems for use in agents, artificial life and so forth. Miklos has strong opinions
and there is much to think about in this paper.
• In Allman (1999), we can read the most up to date discussion of brain and nervous
system evolution. We have been inspired by this book and there are good pointers
to additional literature in its reference list.
• Potts (2004) discusses how paleoenvironments may have shaped the evolution of
cognition in the great apes while the great ape cognitive systems themselves are
outlined in Russon (2004). Further articles can be found in the collection (Russon
and Begun 2004).
• Johnson-Frey (2004) studies the neural circuitry that may subserve complex tool
use in humans.
• The possible connection between modularity of function and the rise of cognition
is explored in Coltheart (1999) and in evolutionary terms in Redies and Puelles
(2001).
• The appearance of artistic expression in the cave paintings of France ca 25,000
BC (and other clues) has suggested to some that neural reorganization occurred
around that time. Other scientists are not so sure. Nevertheless, it is a clue to the
relationship between brain structure and higher level processing. Brain endocasts
give us clues about the location of sulci on the surface of the brain which allows
us to compare the sulci of fossils to those of primates and modern man and look
for differences. An interesting article about paleolithic art and its connection to
modern humans is presented in Conkey (1999). Holloway (1999) analyzes such
sulci location differences in his review of human brain evolution.
Trying to learn about how cognition might have arisen is made even more difficult
when the nervous system studied is as complex as the human brain. There is growing
evidence that simpler nervous systems appear to be capable of tasks that would easily
be considered as cognitive if seen in a higher vertebrate or primate. These simpler
systems occur in several insects and spiders. There appears to be some evidence
that parsing and fusing disparate sensory signals into higher level percepts begins
to build a cognitive system that at some point crosses a threshold to become aware.
This advanced behavior appears to develop primarily in predators that must assess
feelings in animals. Also, the whole idea of learning in animals is laid out in Zentall
et al. (2008). For specific animals, we can turn to the following references:
– A complete treatment of arthropod brains is given in Strausfeld (2012). Navi-
gational issues are explored in Webb et al. (2004).
– The octopus brain is analyzed in Young (1971), which is out of print but a wonder-
ful reference from which we can pull out neural circuitry for cephalopod models.
The squid brain’s development is outlined in Shigeno et al. (2001) and follow-
ing this developmental chain helps us contrast it with other neural development
pathways.
– Herrold et al. (2011) examine the nidopallium caudolaterale in birds, which
may be an avian analogue of the mammalian prefrontal cortex. Hence, associations built
from environmental signals can be done in multiple ways, which gives insight into
general cognitive architectures. A review of tool use in birds given in Lefebvre
et al. (2002) is also very helpful. In Petkov and Jarvis (2012), a very helpful
comparison of bird and primate language origins is presented which is also a way
to study how different systems are used to solve similar problems. This paper
should be read in conjunction with another survey on human language evolution
given in Berwick et al. (2012). Finally, the evolution of avian intelligence is
studied in Emery (2006).
– Honeybee neural architectures are studied in Menzel (2012) as a model of cogni-
tion. The information processing the honeybee mushroom body does is probed
in the papers (Haehnel and Menzel 2010b), (Szyszka et al. 2008) and (Sjöholm
et al. 2005). By studying these papers, we can see different ways to implement
cortical structure. A more general treatment of insect mushroom bodies is given
in Farris and Sinakevitch (2003). Honeybee memory formation is discussed in
the two papers (Hadar and Menzel 2010a) and (Roussel et al. 2010).
– The generation of behavior in a tadpole is worked out in detail in Roberts et al.
(2010) which helps us understand how low level neuronal information can give
rise to high level behavior.
– The fruit fly is capable of very complex cognitive functions which we are just
beginning to understand better. An overview of what the fruit fly is capable of is
found in Greenspan and van Swinderen (2004) and it makes for good reading.
Again, think about generalizations! A nice survey of social learning in insects
in general and what that might mean about their neural organizations is given
in Giurfa (2012) which is also very interesting.
– The ctenophore Mnemiopsis leidyi has recently had its genome sequenced and
we now know its neural architecture developed in a different way from that of
other animals. Hence, it also gives us new design principles to build functional
small brain architectures. Read Ryan et al. (2013) and Moroz et al. (2014)
carefully and take notes.
– Commonalities in the development of the nerve cord in crustaceans are discussed
in Harzsch (2003). Comparisons to what happens in arthropods are made which
allows generalization.
Some general ideas about brain evolution can also be illuminated by looking at
other animals. We note that these issues are explained carefully in Williams and
Holland (1998), Murakami et al. (2005) and Sprecher and Reichert (2003).
23.7 Software
Out of the many books that are available for self-study in all of the areas above,
some have proved to be invaluable, while others have been much less helpful. The
following annotated list consists of the real gems. To learn to program effectively in
an object oriented way in Python, it is helpful to know how to program in a procedural
language such as C. Then, learning how to program objects within the constraints of
the class syntax of C++ is very useful. This is the route we took in learning how to
program in an object oriented way. The final step is to learn how to use a scripting
glue language such as Python to build application software. Finally, don't be put off
by the publication date of these resources! Many of them are timeless.
C++: The following books need to be on your shelf. Lippman (1991) will get you started,
but you'll also need Deitel and Deitel (1994) and Olshevsky and Ponomarev (1994) for nuance.
1. C++ Primer, Lippman (1991). This book is the most basic resource for this
area. While very complete, it has shortcomings; for example, its discussion
of call by reference is very unclear and its treatment of dynamic binding in its
chapters on OOD is also murky. Nevertheless, it is a good basic introduction.
Its biggest problem for us is that all of its examples are so simple (yes, even
the zoo class is just too simple to give us much insight).
2. C++: How to Program, Deitel and Deitel (1994). We have found this book
to be of great value. It intermingles excellent C++ coverage with ongoing
object oriented design (OOD) material. It is full of practical advice on the
software engineering aspects of OOD.
3. The Revolutionary Guide to OOP Using C++, Olshevsky and Ponomarev
(1994). This book has a wonderful discussion of call by reference and equally
good material on dynamic binding.
4. Compiler Design, Wilhelm and Maurer (1995). This book has already been
mentioned in the text as the source of technical information on how an object-
oriented compiler is built. This is an essential resource.
Python: There are two wonderful resources for using Python in computation which
are Langtangen (2010, 2012). You can start learning about this now as the tran-
sition from using MatLab to using Python is fairly easy. The comments below on
object oriented programming are also relevant here, so even though many of the
object oriented texts are focusing on C++, don’t let that put you off.
Erlang: Erlang is a great choice for our eventual neural simulations as it will allow
us to do lesion studies. Of course, nodal computation in Erlang requires we find
good approximations to realistic neural computation, but that can be done as we
have discussed in this book. Two good references to this language are Armstrong
(2013) and Logan et al. (2011). However, the best way to get started is to read
Hébert (2013). We encourage you to start thinking about this way of coding as
well, since we will start using it in later volumes.
Haskell: Haskell is another language like Erlang which has great potential for writ-
ing lesion simulations. You should look at Lipovača (2011) to get started here.
• In the Ph.D. thesis of Vogt (2000), we see studies and experiments on how to
get collections of robots to come up with a language all on their own and then to
communicate using this language. This is extremely fascinating and the
mechanisms that Vogt and his colleagues use to implement both software and
hardware so as to gain behavioral plasticity have great promise.
• The view of situated cognition in Clancey (1997). In this book, Clancey discusses
at great length what he considers to be appropriate ways to develop robots that will
have the plasticity we need to perform interesting things. There is a lot of food for
thought here.
References
M. Conkey, A history of the interpretation of European ‘paleolithic art’: magic, mythogram, and
metaphors for modernity, in Handbook of Human Symbolic Evolution, ed. by A. Lock, C. Peters
(Blackwell Publishers, Massachusetts, 1999)
E. Davidson, Genomic Regulatory Systems: Development and Evolution (Academic, San Diego,
2001a)
E. Davidson, Inside the cis-regulatory module: control logic, and how regulatory environment is
transduced into spatial patterns of gene expression, Genomic Regulatory Systems: Development
and Evolution (Academic, San Diego, 2001b), pp. 26–63
T. Deacon, Rethinking mammalian brain evolution. Am. Zool. 30, 629–705 (1990)
H. Deitel, P. Deitel, C++: How to Program (Prentice Hall, Upper Saddle River, 1994)
M. Diamond, A. Scheibel, L. Elson, The Human Brain Coloring Book (Barnes and Noble Books,
New York, 1985)
D. Donaldson, Parsing brain activity with fMRI and mixed designs: what kind of a state is neu-
roimaging in? Trends Neurosci. 27, 442–444 (2004)
D. Edelman, B. Baars, A. Seth, Identifying hallmarks of consciousness in non-mammalian species.
Conscious. Cogn. 14, 169–187 (2005)
N. Emery, Cognitive ornithology: the evolution of avian intelligence. Philos. Trans. R. Soc. B 361,
23–43 (2006)
J. Ewert, Motion perception shapes the visual world of the amphibians, in Complex Worlds from
Simpler Nervous Systems, ed. by F. Prete (A Bradford Book, MIT Press, Cambridge, 2004), pp.
117–160
S. Farris, I. Sinakevitch, Development and evolution of the insect mushroom bodies: towards the
understanding of conserved developmental mechanisms in a higher brain center. Arthropod Struct.
Dev. 32, 79–101 (2003)
E. Gamma, R. Helm, R. Johnson, J. Vlissides, Design Patterns: Elements of Reusable Object-
Oriented Software (Addison-Wesley, Reading, 1995)
M. Giurfa, Social learning in insects: a higher-order capacity? Front. Behav. Neurosci. 6, 1–3 (2012).
(Article 57)
B. Gomperts, P. Tatham, I. Kramer, Signal Transduction (Academic, San Diego, 2002a)
B. Gomperts, P. Tatham, I. Kramer, GTP-binding proteins and signal transduction, Signal Trans-
duction (Academic, San Diego, 2002b), pp. 71–106
R. Greenspan, B. van Swinderen, Cognitive consonance: complex brain functions in the fruit fly
and its relatives. Trends Neurosci. 27(12), 707–711 (2004)
R. Hadar, R. Menzel, Memory formation in reversal learning of the honeybee. Front. Behav. Neu-
rosci. 4, 1–11 (2010a). (Article 4)
M. Haehnel, R. Menzel, Sensory representation and learning-related plasticity in mushroom body
extrinsic feedback neurons of the protocerebral tract. Front. Syst. Neurosci. 4, 1–13 (2010b).
(Article 161)
D. Hardie, Biochemical Messengers: Hormones, Neurotransmitters and Growth Factors (Chapman
& Hall, London, 1991)
D. Harland, R. Jackson, Portia perceptions: the umwelt of an araneophagic jumping spider, in
Complex Worlds from Simpler Nervous Systems, ed. by F. Prete (A Bradford Book, MIT Press,
Cambridge, 2004), pp. 5–40
S. Harzsch, Ontogeny of the ventral nerve cord in malacostracan crustaceans: a common plan for
neuronal development in Crustacea, Hexapoda and other Arthropoda? Arthropod Struct. Dev. 32,
17–37 (2003)
M. Hauser, The Evolution of Communication (A Bradford Book, MIT Press, Cambridge, 1996)
M. Hauser, Moral Minds: How Nature Designed Our Universal Sense of Right and Wrong (Harper-
Collins, New York, 2006)
F. Hébert, Learn You Some Erlang for Great Good (No Starch Press, San Francisco, 2013)
C. Herrold, N. Palomero-Gallagher, B. Hellman, S. Kröner, C. Theiss, O. Güntürkün, K. Zilles,
The receptor architecture of the pigeons’ nidopallium caudolaterale: an avian analogue to the
mammalian prefrontal cortex. Brain Struct. Funct. 216, 239–254 (2011)
R. Holloway, Evolution of the human brain, in Handbook of Human Symbolic Evolution, ed. by A.
Lock, C. Peters (Blackwell Publishers, Massachusetts, 1999), pp. 74–116
S. Johnson-Frey, The neural basis of complex tool use in humans. Trends Cogn. Sci. 8(2), 71–78
(2004)
K. Kral, F. Prete, In the mind of a hunter: the visual world of the praying mantis, in Complex Worlds
from Simpler Nervous Systems, ed. by F. Prete (A Bradford Book, MIT Press, Cambridge, 2004),
pp. 75–116
C. Koch, J. Davis (eds.), Large Scale Neuronal Theories of the Brain (A Bradford Book, MIT Press,
Cambridge, 1994)
D. Kornack, Neurogenesis and the evolution of cortical diversity: mode, tempo, and partitioning
during development and persistence in adulthood. Brain, Behav. Evol. 55(6), 336–344 (2000)
L. Krubitzer, K. Huffman, A realization of the neocortex in mammals: genetic and epigenetic
contributions to the phenotype. Brain, Behav. Evol. 55(6), 322–335 (2000)
H. Langtangen, Python Scripting for Computational Science (Springer, New York, 2010)
H. Langtangen, A Primer of Scientific Programming with Python (Springer, New York, 2012)
L. Lefebvre, N. Nicolakakis, D. Boire, Tools and brains in birds. Behaviour 139, 939–973 (2002)
M. Lipovača, Learn You a Haskell for Great Good (No Starch Press, San Francisco, 2011)
S. Lippman, C++ Primer, 2nd edn. (Addison-Wesley, Reading, 1991)
M. Logan, E. Merritt, R. Carlsson, Erlang and OTP in Action (Manning, Stamford, 2011)
Y. Murakami, K. Uchida, F. Rijli, S. Kuratani, Evolution of the brain developmental plan: insights
from agnathans. Dev. Biol. 280, 249–259 (2005)
R. Martin, Designing Object-Oriented C ++ Applications Using the Booch Method (Prentice Hall,
Englewood Cliffs, 1995)
R. Menzel, The honeybee as a model for understanding the basis of cognition. Nat. Rev.: Neurosci.
13, 758–768 (2012)
G. Miklos, Molecules and cognition: the latterday lessons of levels, language and lac: evolutionary
overview of brain structure and function in some vertebrates and invertebrates. J. Neurobiol.
24(6), 842–890 (1993)
L. Moroz, K. Kocot, M. Citrarella, S. Dosung, T. Norekian, I. Povolotskaya, A. Grigorenko, C.
Dailey, E. Berezikov, K. Buckely, A. Ptitsyn, D. Reshetov, K. Mukherjee, T. Moroz, Y. Bobkova,
F. Yu, V. Kapitonov, J. Jurka, Y. Bobkov, J. Swore, D. Girado, A. Fodor, F. Gusev, R. Sanford,
R. Bruders, E. Kittler, C. Mills, J. Rast, R. Derelle, V. Solovyev, F. Kondrashov, B. Swalla, J.
Sweedler, E. Rogaev, K. Halancych, A. Kohn, The ctenophore genome and the evolutionary
origins of neural systems. Nature 510, 109–120 (2014)
J. Nolte, The Human Brain: An Introduction to Its Functional Anatomy (Mosby, A Division of
Elsevier Science, St. Louis, 2002)
V. Olshevsky, A. Ponomarev, The Revolutionary Guide to OOP Using C++ (WROX Publishers,
Birmingham, 1994)
J. Panksepp, Affective consciousness: core emotional feelings in animals and humans. Conscious.
Cogn. 14, 30–80 (2005)
J. Peterson, A White Paper on Neural Object Design: Preliminaries. Department of Mathemati-
cal Sciences (1995), Revised 1998, Revised 1999, Revised 2001. http://www.ces.clemson.edu/
~petersj/NeuralCodes/NeuralObjects3
J. Peterson, Calculus for Cognitive Scientists: Derivatives, Integration and Modeling, Springer
Series on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte
Ltd., Singapore, 2015a in press)
J. Peterson, Calculus for Cognitive Scientists: Higher Order Models and Their Analysis, Springer
Series on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte
Ltd., Singapore, 2015b in press)
J. Peterson, Calculus for Cognitive Scientists: Partial Differential Equation Models, Springer Series
on Cognitive Science and Technology (Springer Science+Business Media Singapore Pte Ltd.,
Singapore, 2015c in press)
C. Petkov, E. Jarvis, Birds, primates, and spoken language origins: behavioral phenotypes and
neurobiological substrates. Front. Behav. Neurosci. 4, 1–24 (2012). (Article 12)
J. Pinel, M. Edwards, A Colorful Introduction to the Anatomy of the Human Brain: A Brain and
Psychology Coloring Book (Pearson, New York, 2008)
R. Potts, Paleoenvironments and the evolution of adaptability in great apes, in The Evolution of
Thought: Evolutionary Origins of Great Ape Intelligence, ed. by A. Russon, D. Begun (Cambridge
University Press, Cambridge, 2004), pp. 237–259
W. Pree, Design Patterns for Object-Oriented Software Development (ACM Press Books, Addison-
Wesley, New York, 1995)
F. Prete (ed.), Complex Worlds from Simpler Nervous Systems (A Bradford Book, MIT Press,
Cambridge, 2004)
F. Pulvermüller, Words in the brain’s language. Behav. Brain Sci. 22, 253–336 (1999)
L. Radinsky, The Evolution of Vertebrate Design (The University of Chicago Press, Chicago, 1987)
C. Redies, L. Puelles, Modularity in vertebrate brain development and evolution. Bioessays 23,
1100–1111 (2001)
C. Redies, L. Puelles, Central nervous system development: from embryonic modules to functional
modules, in Modularity in Development and Evolution, ed. by G. Schlosser, G. Wagner (University
of Chicago Press, Chicago, 2004), pp. 154–186
Z. Reznikova, Animal Intelligence: From Individual to Social Cognition (Cambridge University
Press, Cambridge, 2007)
A. Roberts, W. Li, S. Soffe, How neurons generate behavior in a hatchling amphibian tadpole: an
outline. Front. Behav. Neurosci. 4, 1–11 (2010). Article 16
E. Roussel, J. Sandoz, M. Giurfa, Searching for learning-dependent changes in the antennal lobe:
simultaneous recording of neural activity and aversive olfactory learning in honeybees. Front.
Behav. Neurosci. 4, 1–11 (2010). (Article 155)
A. Russon, Great ape cognitive systems, in The Evolution of Thought: Evolutionary Origins of
Great Ape Intelligence, ed. by A. Russon, D. Begun (Cambridge University Press, Cambridge,
2004), pp. 76–100
A. Russon, D. Begun (eds.), The Evolution of Thought: Evolutionary Origins of Great Ape Intelli-
gence (Cambridge University Press, Cambridge, 2004)
J. Ryan, K. Pang, C. Schnitzler, A. Nguyen, R. Moreland, D. Simmons, B. Koch, W. Francis, P.
Havlak, S. Smith, N. Putnam, S. Haddock, C. Dunn, T. Wolfsberg, J. Mullikin, M. Martindale, A.
Baxevanis, The genome of the ctenophore Mnemiopsis leidyi and its implications for cell type
evolution. Science 342(6164), 1242592-1–1242592-8 (2013)
R. Saxe, S. Carey, N. Kanwisher, Understanding other minds: linking developmental psychology
and functional neuroimaging. Annu. Rev. Psychol. 55, 87–124 (2004). (Annual Reviews)
G. Schlosser, G. Wagner (eds.), Modularity in Development and Evolution (University of Chicago
Press, Chicago, 2004)
A.K. Seth, Criteria for consciousness in humans and other mammals. Conscious. Cogn. 14, 119–139
(2005)
S. Shettleworth, Cognition, Evolution, and Behavior (Oxford University Press, Oxford, 2010)
S. Shigeno, H. Kidokoro, K. Tsuchiya, S. Segawa, Development of the brain in the oegopsid squid,
Todarodes pacificus: an atlas from hatchling to juvenile. Zool. Sci. 18, 1081–1096 (2001)
M. Sjöholm, I. Sinakevitch, R. Ignell, N. Strausfeld, B. Hansson, Organization of Kenyon cells in
subdivisions of the mushroom bodies of a lepidopteran insect. J. Comp. Neurol. 491, 290–304
(2005)
J. Soukup, Taming C ++ : Pattern Classes and Persistence for Large Projects (Addison-Wesley,
Reading, 1994)
S. Sprecher, H. Reichert, The urbilaterian brain: developmental insights into the evolutionary origin
of the brain in insects and vertebrates. Arthropod Struct. Dev. 32, 141–156 (2003)
N. Strausfeld, Arthropod Brains: Evolution, Functional Elegance, and Historical Significance (The
Belknap Press of Harvard University Press, Massachusetts, 2012)
P. Szyszka, A. Galkin, R. Menzel, Associative and non-associative plasticity in Kenyon cells of the
honeybee mushroom body. Front. Syst. Neurosci. 2, 1–10 (2008). (Article 3)
P. Vogt, Lexicon Grounding on Mobile Robots. Ph.D. thesis, Vrije Universiteit Brussel, Laboratorium
voor Artificiële Intelligentie (2000)
E. Wasserman, T. Zentall (eds.), Comparative Cognition: Experimental Explorations of Animal
Intelligence (Oxford University Press, Oxford, 2006)
B. Webb, R. Harrison, M. Willis, Sensorimotor control of navigation in arthropod and artificial
systems. Arthropod Struct. Dev. 33, 301–329 (2004)
R. Wilhelm, D. Maurer, Compiler Design (Addison-Wesley, Reading, 1995)
A. Wilkins, The Evolution of Developmental Pathways (Sinauer Associates, Sunderland, 2002a)
A. Wilkins, Genetic pathways and networks in development, The Evolution of Developmental
Pathways (Sinauer Associates, Sunderland, 2002b), pp. 99–126
N. Williams, P. Holland, Molecular evolution of the brain of chordates. Brain, Behav. Evol. 52,
177–185 (1998)
J. Young, The Anatomy of the Nervous System of Octopus Vulgaris (Oxford at the Clarendon Press,
Oxford, 1971)
T. Zentall, E. Wasserman, O. Lazareva, R. Thompson, M. Rattermann, Concept learning in animals.
Comp. Cogn. Behav. Rev. 3, 13–45 (2008)
S. Zhang, M. Srinivasan, Exploration of cognitive capacity in honeybees: higher functions emerge
from small brain, in Complex Worlds from Simpler Nervous Systems, ed. by F. Prete (A Bradford
Book, MIT Press, Cambridge, 2004), pp. 41–74
Glossary
Address based graph models The basic idea here is to take the usual graphs,
edges and vertices classes and add additional information to them. Each node
now has a global node number and a six dimensional address. The usual methods
for manipulating these classes are then available. Using these tools, it is easier to
build complicated graphs one subgraph at a time. For example, it is straightforward
to build cortical cans out of OCOS, FFP and Two/Three subcircuits. The OCOS
will have addresses [0; 0; 0; 0; 0; 1−7] as there are 7 neurons. The FFP is simpler
with addresses [0; 0; 0; 0; 0; 1−2] as there are just 2 neurons. The six neuron
Two/Three circuit will have 6 addresses, [0; 0; 0; 0; 0; 1−6]. To distinguish
these neurons in different circuits from one another, we add a unique integer in
the fifth component. The addresses are now OCOS ([0; 0; 0; 0; 1; 1−7]), FFP
([0; 0; 0; 0; 2; 1−2]) and Two/Three ([0; 0; 0; 0; 3; 1−6]), so a single can consists
of these three addressed subcircuits stacked together.
We are not showing the edges here, for convenience. We can then assemble cans
into columns by stacking them vertically. There are many details and you should
read Chap. 19 carefully. A typical small brain model could then consist of modules
with the following addresses, p. 417:
Input [1; ·; ·; ·; ·; ·]
Left Brain [2; ·; ·; ·; ·; ·]
Corpus Callosum [3; ·; ·; ·; ·; ·]
Right Brain [4; ·; ·; ·; ·; ·]
Thalamus [5; ·; ·; ·; ·; ·]
MidBrain [6; ·; ·; ·; ·; ·]
Cerebellum [7; ·; ·; ·; ·; ·]
Output [8; ·; ·; ·; ·; ·]
where the model of the tail of the action potential is of the form $V_m(t) =
V_3 + (V_4 - V_3)\tanh(g(t - t_3))$. Note that $V_m'(t_3) = (V_4 - V_3)\, g$ and so if we
were using real voltage data, we would approximate $V_m'(t_3)$ by a standard finite
difference. In Sect. 9.3, we derive equations telling us how the various attributes
of an action potential are altered when there are changes in these 11 parameters.
This information can then be used to link second messenger effects directly to
BFV changes, p. 141.
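As a quick illustration of the finite difference approximation just mentioned, here is a small MatLab check of the tail slope; the parameter values are placeholders, not fitted action potential data.

% Tail model V_m(t) = V3 + (V4 - V3) tanh(g (t - t3)) and a central
% difference estimate of its slope at t3, which should match (V4 - V3) g.
V3 = -70; V4 = -75; g = 0.8; t3 = 12;     % assumed values (mV, ms)
Vm = @(t) V3 + (V4 - V3).*tanh(g.*(t - t3));
h  = 1e-3;                                % small time step
slopeFD    = (Vm(t3 + h) - Vm(t3 - h))/(2*h);
slopeExact = (V4 - V3)*g;
fprintf('finite difference slope %g versus exact slope %g\n', slopeFD, slopeExact);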
Biological Information Processing Biological systems process information using
a computational node paradigm. Inputs are collected in a network of incoming
edges called the dendritic arbor, combined in nonlinear ways in the computational
unit called the node and then an output is generated. This processing can be
modeled in many ways and in the text, we focus on some of them such as matrix
feed forward networks and graphs of nodes which we sometimes call chained
networks, p. 4.
Brain model For our purposes, we will consider a brain model to consist of the
cerebrum, the cerebellum and the brain stem. Finer subdivisions are then shown in
the table below where some structures are labeled with a corresponding number
for later reference in figures shown in Chap. 5. The numbering scheme shown
below helps us locate brain structures deep in the brain that can only be seen
by taking slices. These numbers thus correspond to the brain structures shown
in Fig. 5.4 (modules that can be seen on the surface) and the brain slices of
Figs. 5.5a, b, and 5.6a, b. A useful model of the processing necessary to combine
disparate sensory information into higher level concepts is clearly built on models
of cortical processing.
There are many more details you should look over in Chap. 5 including more infor-
mation about cortical models, p. 67.
Calcium trigger event Some triggers initiate a spike in calcium current which
enters the cell and alters the equilibrium between free calcium in the cell and
calcium stored in a variety of buffering complexes. This provides useful ways of
regulating many biological processes. A second messenger system often involves
Ca++ ion movement in and out of the cell. The amount of free Ca++ ion in
the cell is controlled by complicated mechanisms, but some is stored in buffer
complexes. The release of calcium ion from these buffers plays a big role in
cellular regulatory processes and which protein P(T1 ) is actually created from a
trigger T0 . Calcium is bound to many proteins and other molecules in the cytosol.
This binding is fast compared to calcium release and diffusion. Let’s assume
there are M different calcium binding sites; in effect, there are different calcium
species. We label these sites with the index j. Each has a binding rate constant
kj+ and a disassociation rate constant kj− . We assume each of these binding sites
is homogeneously distributed throughout the cytosol with concentration Bj . We
let the concentration of free binding sites for species j be denoted by bj (t, x) and
the concentration of occupied binding sites is cj (t, x); where t is our time variable
and x is the spatial variable. We let u(t, x) be the concentration of free calcium
ion in the cytosol at (t, x). The amount of release/uptake could be a nonlinear
function of the concentration of calcium ion. Hence, we model this effect with
the nonlinear mapping f0 (u). The diffusion dynamics are then
$\frac{\partial u}{\partial t} = f_0(u) + \sum_{j=1}^{M} \Bigl( k_j^- c_j - k_j^+ (B_j - c_j)\, u \Bigr) + D_0\, \frac{\partial^2 u}{\partial x^2}$
$\frac{\partial c_j}{\partial t} = -k_j^- c_j + k_j^+ (B_j - c_j)\, u + D_j\, \frac{\partial^2 c_j}{\partial x^2}$
where $D_0$ is the diffusion coefficient for free calcium ion. There are also boundary
conditions which we will let you read about in the discussions in Chap. 7. The
concentration of total calcium is the sum of the free and the bound. We denote
this by $w(t, x)$ and we can show
$\frac{\partial w}{\partial t} = f_0\Bigl( w - \sum_{j=1}^{M} c_j \Bigr) + D_0\, \frac{\partial^2 w}{\partial x^2} + \sum_{j=1}^{M} (D_j - D_0)\, \frac{\partial^2 c_j}{\partial x^2}$
Using several assumptions about how the different rates of calcium binding compare,
we can then reduce this to a new dynamic model for u given by
$\frac{\partial u}{\partial t} = \frac{f_0(u)}{1 + \sum_{j=1}^{M}\gamma_j} + \frac{D_0 + \sum_{j=1}^{M} D_j\,\gamma_j}{1 + \sum_{j=1}^{M}\gamma_j}\, \frac{\partial^2 u}{\partial x^2} = \frac{f_0(u)}{1 + \sum_{j=1}^{M}\gamma_j} + \hat{D}\, \frac{\partial^2 u}{\partial x^2}$
where the $\gamma_j$ are the buffering factors that come out of the fast binding assumption
in Chap. 7. This is the model that allows us to estimate the effect of a calcium current
spike initiated by a second messenger event, which is important in understanding how
to approximate nodal computations in simulations, p. 107.
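A minimal explicit finite difference sketch of this reduced model is easy to write in MatLab; the release/uptake function, the effective diffusion constant, the no-flux boundary treatment and all sizes below are placeholder assumptions rather than the values worked out in Chap. 7.

% Explicit finite differences for u_t = F(u) + Dhat u_xx on [0,L].
L = 50; Nx = 201; dx = L/(Nx-1); x = linspace(0,L,Nx)';
Dhat = 0.2;                           % effective diffusion constant (assumed)
F    = @(u) -0.05*u;                  % simple linear uptake term (assumed)
dt   = 0.4*dx^2/Dhat;                 % respects the explicit stability limit
u    = exp(-(x - L/2).^2);            % initial calcium spike in the middle
for n = 1:2000
  % second difference with mirrored end points as a crude no-flux condition
  uxx = ([u(2:end); u(end)] - 2*u + [u(1); u(1:end-1)])/dx^2;
  u   = u + dt*(F(u) + Dhat*uxx);
end
plot(x,u); xlabel('x'); ylabel('free calcium u');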
Cellular trigger The full event chain for a cellular trigger (you will have to look
up the meaning of these terms in Chap. 6) runs from the trigger $T_0$ to the protein
$P(T_1)$ whose construction it initiates. Without the trigger, we see there are a variety
of ways transcription can be stopped:
• T1 does not exist in a free state; instead, it is always bound into the complex
T1 /T1∼ and hence can’t be activated until the T1∼ is removed.
• Any of the steps required to remove T1∼ can be blocked effectively killing
transcription:
– phosphorylation of T1∼ into T1∼ P is needed so that tagging can occur. So
anything that blocks the phosphorylation step will also block transcription.
– Anything that blocks the tagging of the phosphorylated T1∼ P will thus block
transcription.
– Anything that stops the removal mechanism fSQVn will also block transcrip-
tion.
The steps above can therefore be used to further regulate the transcription of $T_1$
into the protein $P(T_1)$. Let $T_0'$, $T_0''$ and $T_0'''$ be inhibitors of the steps above. These
inhibitory proteins can themselves be regulated via triggers through mechanisms
just like the ones we are discussing. In fact, $P(T_1)$ could itself serve as an
inhibitory trigger, i.e. as any one of the inhibitors $T_0'$, $T_0''$ and $T_0'''$. Our theoretical
pathway is then the event chain above with steps i, ii and iii each subject to inhibition.
These event chains can then be analyzed dynamically and, using equilibrium
analysis, approximations to the effects of a trigger can be derived. The full
details of that discussion are complex and are given carefully in Chap. 6. The
protein $P(T_1)$ can be an important one for the generation of action potentials,
such as sodium gates. Hence, this is a model of second messenger activity that
can alter the hardware of the cell, p. 85.
CFFN back propagation The CFFN energy function is minimized using gradient
descent just as we do in the MFFN case, but the derivation is different as the
structure of the problem is in a graph form rather than matrices and vectors. The
discussion is technical and we refer you to Sect. 17.2 for the details of how to find
the required partial derivatives recursively, p. 319.
CFFN training problem The output of the CFFN is therefore a vector in $\mathbb{R}^{n_O}$
defined by
$H(x) = \bigl( Y^i \mid i \in V \bigr)$
and we see that $H : \mathbb{R}^{n_I} \to \mathbb{R}^{n_O}$ is a highly nonlinear function that is built out
of chains of feedforward nonlinearities. The parameters that control the value
of $H(x)$ are the forward links, the offsets and the gains for each neuron. The
cardinality of these parameter sets are given by
$n_s = \sum_{i=0}^{N-1} | F(i) |, \qquad n_o = N, \qquad n_g = N$
where $|F(i)|$ denotes the size of the forward link set for the ith neuron; $n_s$ denotes
the number of synaptic links; $n_o$, the number of offsets; and $n_g$, the number of
gains. Now let $I = \{ x_\alpha \in \mathbb{R}^{n_I} : 0 \le \alpha \le S-1 \}$ and $D = \{ D_\alpha \in \mathbb{R}^{n_O} : 0 \le \alpha \le S-1 \}$
be two given sets of data of size $S > 0$. The set $I$ is the set of input
exemplars and the set $D$ is the set of outputs that are associated with exemplars.
Together, the sets $I$ and $D$ comprise what is known as the training set. Also,
from now on, as before in the MFFN case, the subscript notation $\alpha$ indicates the
dependence of various variables on the $\alpha$th exemplar in the sets $I$ and $D$. The
training problem is then to choose the CFFN parameters to minimize an energy
function, $E$, given by
$E = 0.5 \sum_{\alpha=0}^{S-1} \sum_{i=0}^{n_O - 1} f\bigl( Y_\alpha^{v_i} - D_{\alpha i} \bigr) = 0.5 \sum_{\alpha=0}^{S-1} \sum_{i \in V} f\bigl( Y_\alpha^{i} - D_\alpha^{d_i} \bigr)$
where $f$ is a nonnegative function of the error term $Y_\alpha^i - D_\alpha^{d_i}$; e.g. using the function
$f(x) = x^2$ gives the standard $L^2$ or least squares energy function, p. 318.
We will let nI and nO denote the cardinality of U and V respectively. The remaining
neurons in the chain which have no external role will be called hidden neurons
with dimension $n_H$. Note that $n_H + | U \cup V | = N$. Note that it is possible for an input
neuron to be an output neuron; hence U and V need not be disjoint sets. The chain
is thus divided by function into three possibly overlapping types of processing
elements: nI input neurons, nO output neurons and nH internal or hidden neurons.
We will let the set of postsynaptic neurons for neuron i be denoted by F(i), the
set of forward links for neuron i. Note also that each neuron can be viewed as a
postsynaptic neuron with a set of presynaptic neurons feeding into it: thus, each
neuron i has associated with it a set of backward links which will be denoted by
B(i). The input to a typical postsynaptic neuron is therefore found by summing the
weighted outputs of the neurons in its backward link set B(i) and adding an external
input term x, which is only used if the post neuron
is an input neuron. Note the CFFN notation is much cleaner than the MFFN notation and
is more general as we can have links in the CFFN model we just cannot handle
in the MFFN, p. 315.
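The evaluation code itself is developed in Sect. 18.1 and Chap. 17; as a stand-alone illustration of the backward link sum, here is a minimal MatLab sketch in which the edge values are stored in a matrix E with E(j,i) the value on the link j → i. The toy chain, the random weights and the zero offsets and unit gains are placeholder assumptions.

% Evaluate a small chain by sweeping the neurons in order, summing each
% neuron's backward link set B(i) and adding the external input x(i).
N = 6;
E = triu(rand(N),1);                  % feedforward links only in this toy chain
Y = zeros(N,1);                       % neuron outputs Y^i
x = zeros(N,1); x(1) = 1.0;           % neuron 1 is the only input neuron here
o = zeros(N,1); g = ones(N,1);        % offsets and gains
sigma = @(y) 0.5*(1 + tanh(y));
for i = 1:N
  B   = find(E(:,i) ~= 0);            % backward link set B(i)
  yin = x(i) + sum(E(B,i).*Y(B));     % external input plus weighted link sum
  Y(i) = sigma((yin - o(i))/g(i));
end
disp(Y');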
Fourier Transform Given a function g defined on the y axis, we define the Fourier
Transform of g to be
$\mathcal{F}(g)(\xi) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(y)\, e^{-j \xi y}\, dy$
where j denotes the square root of minus 1 and the exponential term is, as usual,
$e^{-j\xi y} = \cos(\xi y) - j \sin(\xi y)$. This integral is well defined if g is what is called
square integrable, which roughly means we can get a finite value for the integral
of $g^2$ over the y axis, p. 41.
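A quick numeric sanity check of this definition can be done in MatLab; the Gaussian test function and the truncated integration limits below are just convenient choices.

% For g(y) = exp(-y^2/2) this convention gives F(g)(xi) = exp(-xi^2/2).
g  = @(y) exp(-y.^2/2);
Fg = @(xi) integral(@(y) g(y).*exp(-1i*xi*y), -20, 20)/sqrt(2*pi);
xi = 1.5;
fprintf('numeric %.6f versus exact %.6f\n', real(Fg(xi)), exp(-xi^2/2));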
Graph model A model consisting of nodes called vertices and edges. In MatLab,
there are three classes we define: the vertices class, the edges class and the graphs
class using the particular syntax MatLab requires. This is fairly technical, so you
need to look carefully at the details in Sect. 18.1. Roughly speaking, we first create
edges and vertices objects and use them to build the graphs objects. To build an
OCOS graph, the commands

V = [1;2;3;4;5;6;7;8];
v = vertices(V);

define the OCOS nodes; the edges are defined by

E = [[1;2],[2;3],[2;4],[2;5],[3;6],[4;7],[5;8],[2;7],[1;7]];
e = edges(E);

and OCOS = graphs(v,e); then defines the OCOS graph. There are many other
methods associated with these classes such as adding vertices and edges, adding
two graphs together and so forth. The details are in Sect. 18.1. The big thing here
is that when two graphs are merged, their individual node and edge numbers are
redone and all information about where they originally came from is lost.
Hebbian update We can also adjust parameters in a graph model using the classic
idea of a Hebbian update, which would be implemented as shown below: choose
a tolerance ε and, at each node in the backward set of the post neuron i, update
the edge value by the factor ζ if the product of the post neuron node value $f_i$ and
the edge value $E_{j \to i}$ exceeds ε. The update algorithm then consists of the paired
operations: sweep through the DG to do an evaluation and then use both Hebbian
updates and the graph flow equations to adjust the edge values, p. 499.

for (i = 0; i < N; i++) {
  for (j ∈ B(i)) {
    yp = f_i E_{j→i}
    if (yp > ε)
      E_{j→i} = ζ E_{j→i}
  }
}
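In MatLab, with the edge values held in a matrix E where E(j,i) is the value on the link j → i, the same update can be written in a few lines; the tolerance, gain and random data below are placeholders.

% One Hebbian sweep over all post neurons.
N = 8;
E = double(rand(N) > 0.6).*rand(N);   % a random sparse edge value matrix (assumed)
f = rand(N,1);                        % current node values from the last evaluation
eps0 = 0.25; zeta = 1.05;             % tolerance and update factor (assumed)
for i = 1:N
  B = find(E(:,i) ~= 0);              % backward set B(i) of post neuron i
  for k = 1:numel(B)
    j = B(k);
    if f(i)*E(j,i) > eps0             % post activity times edge value exceeds tolerance
      E(j,i) = zeta*E(j,i);           % strengthen that edge
    end
  end
end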
Laplace Transform Given a function x defined for nonnegative time, the Laplace
transform of x is $\mathcal{L}(x)(\beta) = \int_0^\infty x(t)\, e^{-\beta t}\, dt$. The new function $\mathcal{L}(x)$ is defined for
some domain of the new variable $\beta$. The variable $\beta$'s domain is called the frequency
domain and, in general, the values of $\beta$ where the transform is defined depend on the
function x we are transforming. Also, in order for the Laplace transform of the function
x to work, x must not grow too fast; roughly, the product $x(t)\, e^{-\beta t}$ must decay like an
exponential function with a negative coefficient, p. 39.
Laplacian graph based information flow The standard cable equation for the
voltage v across a membrane is
$\lambda^2\, \nabla^2 v - \tau\, \frac{\partial v}{\partial t} - v = -r\, \lambda^2\, k$
where the constant $\lambda$ is the space constant, $\tau$ is the time constant and r is a
geometry independent constant. The variable k is an input source. We assume the
flow of information through a graph model G is given by the graph based partial
differential equation
$\nabla_G^2 f - \frac{\tau_G}{\lambda_G^2}\, \frac{\partial f}{\partial t} - \frac{1}{\lambda_G^2}\, f = -r\, k$
where f is the vector of node values for the graph and $\nabla_G^2 f = K K^T f$, where K
is the incidence matrix for the graph. Relabel the fraction $\tau_G/\lambda_G^2$ by $\mu_1$ and the
constant $1/\lambda_G^2$ by $\mu_2$, where we drop the G label as it is understood from context.
Each computational graph model will have constants $\mu_1$ and $\mu_2$ associated with it
and if it is important to distinguish them, we can always add the labellings at that
time. This equation gives us the Laplacian graph based information flow model,
p. 279.
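Here is a minimal MatLab sketch of the Laplacian term for the OCOS edge list given in the Graph model entry; the sign convention chosen for the incidence matrix and the values of μ1, μ2, r and k are assumptions made only for illustration.

% Build K with K(node,edge) = +1 at the edge source and -1 at the target,
% form the Laplacian term K*K', and read off one form of the flow equation.
edgeList = [1 2; 2 3; 2 4; 2 5; 3 6; 4 7; 5 8; 2 7; 1 7];
N = 8; Ne = size(edgeList,1);
K = zeros(N,Ne);
for e = 1:Ne
  K(edgeList(e,1),e) =  1;
  K(edgeList(e,2),e) = -1;
end
LG  = K*K';                        % so the Laplacian term applied to f is LG*f
f   = rand(N,1);                   % node values
mu1 = 0.5; mu2 = 0.1; r = 1.0; k = zeros(N,1);
dfdt = (LG*f - mu2*f + r*k)/mu1;   % one way to rearrange the flow equation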
MFFN The matrix feed forward network (MFFN) implements a mapping
$F : \mathbb{R}^{n_0} \to \mathbb{R}^{n_{M+1}}$
that has a special nonlinear structure. The nonlinearities in the MFFN are contained
in the neurons, which are typically modeled by sigmoid functions evaluated at
$y = \frac{x - o}{g},$
where x is the input, o is the offset and g is the gain, respectively, of each neuron.
We can also write the transfer functions in a more general fashion as follows:
$\sigma(x, o, g) = 0.5\left( 1.0 + \tanh\left( \frac{x - o}{g} \right) \right)$
The layer processing is then
$y_i^0 = x_i, \qquad X_i^0 = (y_i^0 - O_i^0)/g_i^0, \qquad Y_i^0 = \sigma_i^0(X_i^0)$
$y_i^{\ell+1} = \sum_{j=0}^{n_\ell - 1} T_{ji}^{\ell}\, Y_j^{\ell}$
So the output from the MFFN at the output layer M + 1 is given by
$y_i^{M+1} = \sum_{j=0}^{n_M - 1} T_{ji}^{M}\, Y_j^{M}$
p. 287.
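A minimal MatLab forward pass for a small MFFN, written directly from the layer equations above, looks like this; the layer sizes, random weights, zero offsets and unit gains are placeholder assumptions.

% Layer sizes n0 = 2, n1 = 3, n2 = 1, so M = 1.
n = [2 3 1];
M = numel(n) - 2;
T = {rand(n(1),n(2)), rand(n(2),n(3))};              % T{ell+1}(j,i) links layer ell to ell+1
O = {zeros(n(1),1), zeros(n(2),1), zeros(n(3),1)};   % offsets per layer
g = {ones(n(1),1),  ones(n(2),1),  ones(n(3),1)};    % gains per layer
sigma = @(X) 0.5*(1 + tanh(X));
x = [0.3; -0.7];                          % an input exemplar
y = x;                                    % layer 0 inputs y_i^0 = x_i
for ell = 0:M+1
  Y = sigma((y - O{ell+1})./g{ell+1});    % layer ell outputs Y^ell
  if ell <= M
    y = T{ell+1}'*Y;                      % inputs to layer ell+1
  end
end
disp(Y');                                 % the MFFN output Y^{M+1}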
MFFN back propagation The backpropagation equations for a MFFN are simply
a recursive algorithm for computing the partial derivatives of the MFFN error or
energy function with respect to each of the tunable parameters. The indicial details
for a given training set are fairly intense, so we just refer you to the lengthy and
detailed discussion in Sect. 16.3, p. 290.
MFFN training problem With the usual MFFN notation, for any $I \in \mathbb{R}^{n_0}$, we
know how to calculate the output from the MFFN. It is clearly a process that
flows forward from the input layer and it is very nonlinear in general even though
it is built out of relatively simple layers of nonlinearities. The parameters that
control the value of $F(I)$ are the $n_i\, n_{i+1}$ coefficients of $T^i$, $0 \le i \le M$, and the offsets
and gains of each neuron, so the total number of parameters is
$N = \sum_{i=0}^{M} n_i\, n_{i+1} + 2 \sum_{i=0}^{M+1} n_i.$
Now let
$I = \{ I_\alpha \in \mathbb{R}^{n_0} : 0 \le \alpha \le S-1 \}$
and
$D = \{ D_\alpha \in \mathbb{R}^{n_{M+1}} : 0 \le \alpha \le S-1 \}$
be two given sets of data of size $S > 0$. The set $I$ is referred to as the set of
exemplars and the set $D$ is the set of outputs that are associated with exemplars.
Together, the sets $I$ and $D$ comprise what is known as the training set. The training
problem is to choose the N network parameters, $T^0, \ldots, T^M$, $O^0, \ldots, O^{M+1}$,
$g^0, \ldots, g^{M+1}$, such that we minimize
$E = 0.5 \sum_{\alpha=0}^{S-1} \| F(I_\alpha) - D_\alpha \|_2^2 = 0.5 \sum_{\alpha=0}^{S-1} \sum_{i=0}^{n_{M+1}-1} \bigl( Y_{\alpha i}^{M+1} - D_{\alpha i} \bigr)^2$
where the subscript notation $\alpha$ indicates that the terms correspond to the $\alpha$th
exemplar in the sets $I$ and $D$, p. 290.
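Given the layer M + 1 outputs collected over the whole training set, the energy itself is a one-liner in MatLab; the placeholder matrices below stand in for real network outputs and targets.

% Least squares training energy over S exemplars with n_{M+1} outputs each.
S = 10; nOut = 3;
Yout = rand(S,nOut);                  % rows are Y^{M+1}_{alpha,:} (placeholder)
D    = rand(S,nOut);                  % rows are the targets D_{alpha,:} (placeholder)
E = 0.5*sum(sum((Yout - D).^2));
fprintf('training energy E = %g\n', E);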
Sigma Three Transition Second messenger events have consequences that can
be written in terms of concatenated sigmoid transitions. In many cases of interest,
there are three such transformations and this is then called a $\sigma_3$ transition; the
meanings of the various terms, if not self explanatory, can be found in Chap. 8.
Thus, the typical sodium maximum conductance $g_{Na}$ computation can be written as
$g_{Na}(t, V) = g_{Na}^{max}\,\Bigl( 1 + e_{\delta_{Na}}\, h_3\bigl([T_0], [T_0]_b, g_p;\; \tfrac{r\,[T_1]_n}{2};\; g_e;\; s;\; [T_1]_n;\; 0, g_{Na}\bigr) \Bigr)\, M_{Na}^{p}(t, V)\, H_{Na}^{q}(t, V)$
p. 121.
Time dependent cable solution The time dependent cable equation is
$\lambda_c^2\, \frac{\partial^2 v_m}{\partial z^2} = v_m + \tau_m\, \frac{\partial v_m}{\partial t} - r_o\, \lambda_c^2\, k_e$
Recall that $k_e$ is current per unit length. Next convert this equation into a diffusion
model by making the change of variables $y = z/\lambda_c$ and $s = t/\tau_m$. With these changes,
space will be measured in units of space constants and time in units of time
constants. We then define a new voltage variable w by
$w(s, y) = v_m(\tau_m s, \lambda_c y)$
so that
$\frac{\partial^2 w}{\partial y^2} = w + \frac{\partial w}{\partial s} - r_o\, \lambda_c^2\, k_e(\tau_m s, \lambda_c y)$
Letting
$\phi(s, y) = w(s, y)\, e^{s}$
we find
$\frac{\partial^2 \phi}{\partial y^2} = \frac{\partial \phi}{\partial s} - r_0\, \lambda_c^2\, \tau_m\, k_e(\tau_m s, \lambda_c y)\, e^{s}$
Applying the Laplace transform in the time variable s gives
$\frac{\partial^2 \mathcal{L}(\phi)}{\partial y^2} = \beta\, \mathcal{L}(\phi) - \phi(0, y) - r_0\, \lambda_c^2\, \tau_m\, \mathcal{L}\bigl( k_e(\tau_m s, \lambda_c y)\, e^{s} \bigr)$
Since $v_m(0, z) = 0$, we have $\phi(0, y) = 0$ and so
$\frac{\partial^2 \mathcal{L}(\phi)}{\partial y^2} = \beta\, \mathcal{L}(\phi) - r_0\, \lambda_c^2\, \tau_m\, \mathcal{L}\bigl( k_e(\tau_m s, \lambda_c y)\, e^{s} \bigr)$
Applying the Fourier transform in the space variable y and writing the combined
transform as $\mathcal{T}(\phi) = \mathcal{F}(\mathcal{L}(\phi))$, we see we have an algebraic equation for $\mathcal{T}(\phi)$.
The rest of the argument is messy but straightforward. The current $k_e$ is replaced
by a sequence of current pulses $P_n^m$ and limit arguments are used to find the value
of $\mathcal{T}\bigl( k_e(\tau_m s, \lambda_c y)\, e^{s} \bigr)$. The limiting solution for an impulse current $I_0$ then satisfies
$-\xi^2\, \mathcal{T}(\phi) = \beta\, \mathcal{T}(\phi) - \frac{r_0\, \lambda_c\, I_0}{\sqrt{2\pi}} \;\Longrightarrow\; (\xi^2 + \beta)\, \mathcal{T}(\phi) = \frac{r_0\, \lambda_c\, I_0}{\sqrt{2\pi}}$
We then apply the Laplace and Fourier transform inverses in the right order to
find the solution
$v_m(t, z) = r_0\, \lambda_c\, I_0\, \frac{1}{\sqrt{4\pi\, (t - t_0)/\tau_m}}\; e^{-\frac{((z - z_0)/\lambda_c)^2}{4\,(t - t_0)/\tau_m}}\; e^{-(t - t_0)/\tau_m}$
Note this solution does not have time and space separated! p. 45.
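The impulse solution is simple to evaluate on a grid in MatLab; the cable constants and the charge I0 below are placeholders.

% Evaluate v_m(t,z) for the idealized impulse at (t0, z0).
r0 = 1.0; lamc = 1.0; taum = 1.0; I0 = 1.0; t0 = 0; z0 = 0;   % assumed constants
t = 0.5;                                   % one fixed time after the impulse
z = linspace(-4,4,201);
T = (t - t0)/taum;                         % time in units of time constants
Z = (z - z0)/lamc;                         % space in units of space constants
vm = r0*lamc*I0 .* exp(-Z.^2/(4*T)) .* exp(-T) ./ sqrt(4*pi*T);
plot(z,vm); xlabel('z'); ylabel('v_m(t,z)');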
Würfelspiel data matrices There is an 18th century historical idea called the
Musikalisches Würfelspiel. In the 1700s, fragments of music could be rapidly
prototyped by using a matrix A of possibilities. It consists of P rows and three
columns. In the first column are placed the opening phrases or nouns; in the third
column, are placed the closing phrases or objects; and in the second column, are
placed the transitional phrases or verbs. Each phrase consisted of L notes and the
composer’s duty was to make sure that any opening, transitional and closing (or
noun, verb and object) was both viable and pleasing for the musical style that
the composer was attempting to achieve. The Würfelspiel matrix used in musical
composition has the general form below.
$A = \begin{bmatrix} \text{Opening 0} & \text{Transition 0} & \text{Closing 0} \\ \text{Opening 1} & \text{Transition 1} & \text{Closing 1} \\ \vdots & \vdots & \vdots \\ \text{Opening P-1} & \text{Transition P-1} & \text{Closing P-1} \end{bmatrix}$
and the Würfelspiel matrix used in painting composition has the form
$A = \begin{bmatrix} \text{Background 0} & \text{Midground 0} & \text{Foreground 0} \\ \text{Background 1} & \text{Midground 1} & \text{Foreground 1} \\ \vdots & \vdots & \vdots \\ \text{Background P-1} & \text{Midground P-1} & \text{Foreground P-1} \end{bmatrix}$
Note that there are $P^3$ possible musical sentences that can be formed in this manner.
If each opening, transition and closing fragment is four beats long, we can build
$P^3$ different twelve beat sentences. In a similar way, we can assemble $P^3$ painting
compositions, p. 209.
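Drawing one of the $P^3$ possible sentences is a three line exercise in MatLab; the labels below are placeholders for the actual four beat phrases or image layers.

% Build a small Wurfelspiel matrix of labels and roll once per column.
P = 4;
A = cell(P,3);
for p = 1:P
  A{p,1} = sprintf('Opening %d',    p-1);
  A{p,2} = sprintf('Transition %d', p-1);
  A{p,3} = sprintf('Closing %d',    p-1);
end
pick = randi(P,1,3);                              % one independent roll per column
fprintf('%s | %s | %s\n', A{pick(1),1}, A{pick(2),2}, A{pick(3),3});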
Index
the inverse Laplace transform definition, Address based edges subsref, 423
40 address based graph constructor
graph, 429
address based graph constructor
M graph subsref, 429
MatLab address based incidence, 438
2-3-1 CFFN address based laplacian, 441
Evaluation, 330 Address based vertices, 420
Initialization, 329 Address based vertices subsref, 421
2-3-1 MFFN Address to global utility function:
Evaluation, 299 code Address2String, 434
Initializing and finding the error, 299 Address to Node lookup, 432
Initializing the nodefunctions, 298 Address2Global, 434
The error loop, 298 Address2Global New, 437
2-3-1 MFFN Example Address2String, 435
initializing the nodal functions, 298 Another way to convert an address
The energy or error loop, 298 to a global node number: code
Address Based Graphs Address2Global: A form of hash-
A Evaluation Loop With Offsets and ing, 434
Gains, 446 Backward Sets need both global and
A new incidence matrix calculation, address information, 441
436 BFsets, 441
A Simple Evaluation Loop, 445 Building the incidence matrix with
Accessing .l information, 428 Straight linear searches which are
Accessing .n information, 427 expensive, 433
Accessing .v information, 426 calling Address2Global, 437
add a node list address based vertices converting addresses to unique hash
addv, 422 values, 436
add a single edge address based edge Evaluation With Input Data, 451
add, 424 Find OCOS global node numbers,
add a single node with address based 427
vertices add, 422 Find the maximum number of nodes
add an edge list address based edges on each level: code GetAddress-
addv, 424 Maximums, 435
Adding a list of nodes to an existing FindGlobalAddresses, 435
graph, 431 Finding the maximum number of
Adding a mask or location to graph nodes per level, 435
edges, 430 General discussion, 417
Adding a mask or location to graph GetAddressMaximums, 435
nodes, 430 getting address slot maximums, 436
Adding a single edge to an existing Hebbian Training With an Error Sig-
graph, 431 nal, 452
Adding a single node to an existing HebbianUpdateErrorSignal, 455
graph, 431 Incidence Matrix, Backward and For-
address based addedge, 431 ward Data and Initialization, 450
address based addedgev, 431 Initialized the Neuron Data, 452
address based addlocationtoedges, Inside .n information, 427
430 Inside the .l information; node data,
address based addlocationtonodes, 428
430 Inside the .l information; the edge
address based addnode, 431 data for a node, 428
address based addnodev, 431 Inside the .l information; the in and
Address based edges, 423 out edge data for a node, 428
Find the location of the absolute max- The updated energy code: energy2.m,
imum of a vector, 366 373
Finding backward and forward sets: vertices, 340
BFsets0, 356 vertices constructor, 340
finding eigenvalues and eigenvectors vertices example, 341
of the OCOS and FFP Laplacians, vertices subsref overloading, 341
350 Lagged CFFN
finding the incidence matrix for the Classification code: GetRecognition,
OCOS and FFP graph, 349 400
finding the Laplacians for the OCOS Converting the incidence matrix for a
and FFP graphs, 350 CFFN with feedback into a dot file,
Finding the scaled gradient, 406 385
generating OCOS.pdf via dot, 355 Example: 10 training steps, 399
generating the dot file from the inci- Example: 10 training steps gradient
dence matrix, 354 information, 399
Gradient Update Code, 366 Example: A simple evaluation and
Graph evaluation code: evalua- energy calculation, 398
tion2.m, 361 Example: Further setup details and
graphs constructor, 344 input and target sets, 397
graphs subsref overloading, 344 Example: Setting up the lagged
graphs: building the picture using dot, graph, 397
344 Example: Setting up the original
Laplacian for global graph, 347 graph with feedback, 396
Node function Initialization, 371 Example: Setup parameters and get
backward and forward information,
Returning forward edge information:
398
Changed BFsets code, 365
Finding the feedback edges, 387
Set up gradient vectors, 406
General comments on training the
sigmoid, 357
lagged architecture, 389
Simple incidence matrix for global
Incidence matrix for CFFN with feed-
graph, 347
back, 385
The altered training loop code:
New Gradient Update Code for
chainffntrain2.m, 376
Lagged Networks: Activating Line
the edge subsref overloading, 342 Search and Gradient Scaling, 402
The energy calculation: energy.m, New GradientUpdateLag skeleton
363 code, 402
the evaluation, 357 New Lagged Gradient Update Code:
the generated dot file for the OCOS GradientUpdateLag.m, 393
graph, 355 sample classification results, 401
the graphs class, 344 Setting up graph for the OCOS/FFP
The Hebbian code: HebbianUp- with column to column feedback,
dateSynapticValue, 358 387
the incidence to dot code: incToDot, Subtract the feedback edge and con-
354 struct the double copy, 389
The new evaluation function: evalua- testing the incidence matrix for a
tion2.m, 373 CFFN with feedback and its dot file,
the new training code, 411 386
The new update code, 409 The edges object subtract edges func-
The new update code: GradientUp- tion, 388
date2.m, 374 The graphs object subtract edges
The regular gradient descent step, function, 388
408 The lagged training code, 396
The scaled gradient descent step, 407 Thresholding code: Recognition, 400
  add the ROCOS nodes into the thalamus, 472
  building the reverse OCOS module, 471
  building with more than one ROCOS block, 472
  Construct the ROCOS blocks, 472
  generating the dot file, 473
Matrix FFN
  A Simple sigmoid nodal processing function, 287
  A Simple sigmoid nodal processing function with offsets and gains, 288
  Definition, 288
  General Discussion, 287
  Inner layer processing, 288
  Input layer processing, 288
  Minimizing the MFFN energy uses gradient descent, 290
  Output layer processing, 289
  partial computations in the layers before the input layer, 292
  The ξ^{M+1} computations and the M+1 layer offset and gain partials, 292
  The full partial computations for all layers, 295
  The gain and offset partials for layer M, 294
  the offset and gain partials for the input layer, 295
  The output layer partial computations: weighting terms T^{M}_{pq}, 291
  The recursive ξ^{M} computations, 294
  The sum of squared errors energy function, 290
  The training problem, 290
  Three Layer Example
    Hidden layer partial computations, 297
    Input layer partial computations, 297
    Output layer partial computations, 296
  Variables needed to describe MFFN computation, 288
Modeling Compositional Design
  basic ground rules, 227
    a music or painting initiation generates fMRI and skin conductance information and a musical or painting composition, 229
    an abstract model of limbic system in biological information processing to follow, 228
    music and painting data provide data for validation of a brain model, 227
    musical and painting data constrain the associative cortex and limbic system model, 229
    musical and painting data constrain the associative cortex model, 228
    musical and painting data contain information about good compositional design, 229
    the cortex model, 228
  Connectionist based Design
    alphabet discussions for the musical and painting data, 231
    generating streams of sentences, 237
    mapping process noun N feature vectors into verb V feature vectors: the map g_{NV}, 233
    mapping process verb V feature vectors into object O feature vectors: the map g_{VO}, 233
    Noun to Verb processing, 234
    Noun to Verb processing: acceptable letter choices in a noun, 234
    Noun to Verb processing: building a map for acceptable noun letter choices using Hebbian training, 235
    Noun to Verb processing: random assignment of next noun letter choice from the possibilities, 236
    Noun to Verb processing: the idea of a noun generating map, NGM, 234
    Noun to Verb processing: the procedure to create a valid noun sequence, 235
    Noun to Verb processing: the procedure to create a valid noun sequence generates the NGM, 236
    Noun to Verb processing: the procedure to create a valid object sequence generates the OGM, 236
    Noun to Verb processing: the procedure to create a valid verb sequence generates the VGM, 236
    preprocessing a noun vector n into a noun feature vector N, 232
    preprocessing a verb vector v into a verb feature vector V, 233
    preprocessing an object vector o into an object feature vector O, 233
    raw sentence data into feature vectors, 233
    sentence construction, 236
  the value of a dendrite - axon interaction with pools of local axons contributing, 263
Neurotransmitters
  catecholamine class
    dopamine, 252
    epinephrine, 252
    norepinephrine, 252
    share common core biochemical structure, 251
    the catechol group, 251
    the catechol molecule with an ethyl side chain, 252
    the catecholamine synthesis pathway, 253
Non separable Time dependent cable equation solutions
  Using the idea of diffusion to give insight, 19

S

Second Messenger Diffusion Pathways
  Ca++ ion movement, 107
    a trigger event initiates an increase in a calcium buffer, 114
    calcium diffusion in the cytosol, 107
    calcium dynamics boundary conditions when calcium binding is fast, 111
    calcium dynamics boundary conditions when the binding rate is smaller than the dissociation rate, 113
    calcium dynamics has new diffusion constant when the binding rate is smaller than the dissociation rate, 113
    calcium dynamics when calcium binding is fast, 111
    calcium dynamics when the binding rate is smaller than the dissociation rate, 112
    concentration of occupied binding sites depends on calcium concentration, 111
    first order approximations for a trigger event initiates an increase in a calcium buffer, 115
    free calcium ion dynamics, 107
    new calcium dynamics equation when the binding rate is smaller than the dissociation rate, 113
    redefined diffusion constant for calcium dynamics when calcium binding is fast, 111
    storage pool binding details, 108
    total calcium equation boundary conditions, 110
    total calcium, bound and unbound, dynamics, 109
    total calcium: the calcium binding is fast assumption, 110
Second Messengers
  Design Principles, 117
    conversion of [P(T1)] into a change in g^{Na}_{max} as a sigmoidal process, 120
    general pharmacological inputs, 130
    general pharmacological inputs: allosteric modulation, 133
    general pharmacological inputs: four transmembrane regions, 132
    general pharmacological inputs: general receptor families, 131
    general pharmacological inputs: seven transmembrane regions, 131
    general pharmacological inputs: seven transmembrane regions and first messenger reshaping of biological structure of post cell, 131
    general pharmacological inputs: the agonist spectrum, 132
    modeling neurotransmitter modulators, 133
    modeling neurotransmitter multiplier equations, 134
    modeling neurotransmitters could increase calcium currents, 135
    modeling port activity with sigmoidal transitions, 119
    second messenger Ca++ activity alters maximum ion conductances, 129
    second messenger Ca++ activity with feedback, 129
    the Ca++ triggers, 125
    the Ca++ triggers and their feedback term estimates, 128
    the Ca++ triggers as second messengers, 128
    the Ca++ triggers effect on the sodium conductance pathway, 125
    the Ca++ triggers with spatial dependence, 126
    the dendrite - soma model and its inputs, 135
    the full Ca++ triggers with spatial dependence equations, 126
    the pathway to control changes in maximum conductance, 118
    the second and third level sigmoidal transformations as computational graphs, 123
    the sodium conductance sigma3 transitions, 121
    the sodium conductance modification equation in compact form and sigmoid processing concatenation, 121
    the sodium conductance modification equation in compact form with feedback, 123
    the third order sigmoidal transformations as computational graphs, 122
    the trigger initiates a change in maximum potassium conductance, 120
    The trigger initiates production of sodium gates, 117
Software Implementation Ideas
  a core object could be a neurotransmitter, 254
  a possible software architecture blueprint, 258
  coding synaptic interaction, 259
  dendrite - axon interactions via an intermediary agent, 254
  discussions of self modifying objects: communication alters structure, 257
  how is CA plasticity managed?, 254
  neurotransmitters have both local and global scope issues, 255
  processing by the architecture involves combinatorial strategies, 258
  some molecules have multiple functions and hence, some software objects must play that role also, 256
  the architecture should be event based, i.e., asynchronous in nature, 257

T

The design of emotionally labeled musical data
  cadences are punctuation marks, 184
  connections to vocal melody and poetry, 185
  emotional Würfelspiel matrix design
    adding octaves to the emotional musical alphabet, 214
    attributes attached to music for various emotional states, 191
    choosing the emotional musical alphabet, 201
    comments on neutral data, 195
    comments on angry data, 196
    comments on happy data, 196
    comments on sad data, 196
    deviations from pure tones, perfect harmony convey emotion, 192
    emotional communication in performance, 193
    generating sad, angry and happy data, 191
    how guitarists express emotional attributes, 194
    is the emotional content of music universal? Can someone from another culture identify emotion in an unfamiliar culture’s music?, 192
    phrase encodings using the alphabet, 202
    some angry phrases, 200
    some happy phrases, 196
    some sad phrases, 198
    the angry data, 199
    the design of the data for opening, transition and closing phrases, 195
    the effects of emotion on pitch and rhythm, 191
    the happy data, 196
    the punctuation marks in the musical grammar, 201
    the sad data, 197
    timing patterns in the encoding of emotion in music, 194
  grammatical clauses and phrases with adjectives, 183
  music in terms of smaller blocks and their role in overall function, 183
  musical nouns, verbs and objects, 184
  neutral Würfelspiel matrix design, 187
    correct neutral nouns are not random choices but require a composer’s skill, 187
    generated neutral musical fragments of 12 notes each, 189
    neutral nouns, 187
    the closing phrases, 188
    the complete Musicalisches Würfelspiel matrix, 189
    the construction of the neutral grammar gives clues about the order in which notes appear, 189
    the middle phrases, 188
  the idea of a musical grammar, 183
  the Würfelspiel matrix allows different noun, verb and object combinations creating many different musical sentences, 186
  the Würfelspiel matrix uses openings, transitions and closings as the nouns, verbs and objects in a musical sentence, 186
The design of emotionally labeled painting data
  data sets to train the visual cortex, 205
  developing a painting model: background first, then middle layer and finally the foreground details, 205
  painting design Würfelspiel matrices
    simplified paintings are assembled using background, then midground and finally foreground, 208
  the emotional Würfelspiel matrix
    artists have written about their use of emotion, 218
    capturing emotion in a painting, 216
    curve quality and emotional attributes, 217
    distinguishing beauty and ugliness, 217
    emotion is evoked by differing qualities of the force perceived in a pen or brush stroke, 217
    portraying face and body language, 218
    the use of expression by Matisse, 219
    the use of light and color, 217
  the happy Würfelspiel matrix
    a typical happy painting assembled from a given background, midground and foreground, 220
    the assembled happy paintings, 220
    the happy data matrix, 220
  The mellow painting
    background and foreground discussion, 208
  the neutral Würfelspiel matrix
    a painting encoding, 216
    an assembled neutral painting, 210
    connections between the musical alphabet and the painting edge encoding, 215
    encoding the neutral data, 212
    painting edges, 213
    The assembled neutral paintings, 211
    the neutral data matrix, 210
  the Qait painting
    abstract design details, 207
  the Qait painting from background to foreground, 207
  the sad Würfelspiel matrix
    a typical sad painting assembled from background, midground and foreground images, 222
    the assembled sad paintings, 222
  the seadragon and Qait paintings
    the three element abstract design, 207
  the seadragon painting
    abstract design details, 206
  the seadragon painting from background to foreground, 206
The music data matrix, 209
The painting data matrix, 209
The Fourier Transform
  Applied to a derivative, 41
  Definition, 41
  the inverse Fourier Transform definition, 42
  using the inverse Fourier transform, 42
The graph subsref overloading, 344
The Time Dependent Cable Solution
  Solving directly using transforms, 45
    applying the T transform to the pulse family, 50
    applying the Fourier transform in space, 50
    applying the Laplace transform in time, 49
    choosing the current family normalization constant, 47
    Converting the cable model into a diffusion model with a change of variables, 48
    Defining the T transform, 50
    induced voltage attenuates proportional to the square root of the cable radius, 58
    inverting the T transform, 51
    limiting value of the T transform of the pulse family, 51
    modeling an idealized current pulse, 46
    reinterpreting the cable model results in terms of families of charges, 54
    the family of current impulses, 46
    the idealized solution to the T transform model, 51
    The initial condition assumption, 49
    the input is a constant current, 55
    the new model in terms of the T transform, 50
    the new scaled model to solve, 49
    the solution for the constant applied current, 56
    the solution to the cable equation that was scaled using a variable transformation, 53
    the solution to the original unscaled cable equation, 53