SOFT COMPUTING
Neural Networks
Introduction to ANS - Adaline - BPN - Hopfield network - Boltzmann machine - Self Organizing Maps
1.1 Introduction to ANS
Neural Networks
Neural networks consist of highly interconnected processing elements, or neurons. Artificial Neural Networks
(ANNs) are inspired by biological neurons. ANNs are used to estimate or approximate functions
that depend on a large number of inputs and are generally unknown.
Advantages of ANNs:
- Can perform tasks that a linear program cannot
- The network continues to function even if an individual element fails
- There is no need for reprogramming
Drawbacks of ANNs:
- Need effective training before they can operate
- Large neural networks require long processing times
Applications of Neural Networks
1. Signal Processing: In telephone networks, ANNs are used to process signals and
suppress noise
2. Pattern Recognition: Handwritten character recognition, radar signal classification and analysis, speech
recognition, fingerprint recognition, handwriting analysis
3. Medicine: ECG signal analysis and understanding, diagnosis of diseases, medical image
processing
4. Image Processing: Image matching, pre-processing, image compression
5. Military Systems: Sea mine detection, radar clutter classification
6. Power Systems: State estimation, transient detection, fault detection
Biological Neural Networks:
The human brain has on the order of 10^10 to 10^11 interconnected neurons. Each neuron, or nerve cell, uses biochemical reactions
to receive, process and transmit information.
Fig. Structure of a Biological Neuron (labelled parts: dendrites, nucleus, soma or cell body, axon, axon hillock)
Components of a biological neuron
The major components are the dendrites, the soma and the axon.
Dendrites: Receive signals from other neurons. Signals are electrical impulses transmitted across the synaptic
gap by chemical processes.
Soma: The cell body (also called the perikaryon), which contains the nucleus. It sums the incoming signals; when sufficient input is received,
the cell fires.
Axon: The neuron sends spikes of electrical activity along a long, thin strand called the axon, which splits into
branches.
Artificial Neural Networks (ANNs)
- An ANN is an information processing system that shares characteristics with biological neural networks
- The basic signal processing elements of a neural network are neurons
- Signals are passed between neurons over connection links
- Each connection link has an associated weight
- Each neuron applies an activation function to its net input to determine its output signal
Characteristics of an ANN
- Architecture
- Learning algorithm
- Activation function
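Representation of Artificial Neuron
For a neuron Y with inputs x1, x2, x3 and weights w1, w2, w3, the net input is
Yin = w1x1 + w2x2 + w3x3
The output y of the neuron is obtained by applying the activation function ƒ to the net input: y = ƒ(Yin)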
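The net input and output of a single neuron can be sketched in a few lines of Python. The function names `net_input` and `neuron_output`, and the numeric weights and inputs, are illustrative assumptions and do not come from the notes.

```python
def net_input(weights, inputs):
    """Weighted sum Yin = w1*x1 + w2*x2 + ... of the inputs."""
    return sum(w * x for w, x in zip(weights, inputs))


def neuron_output(weights, inputs, activation):
    """Apply the activation function f to the net input: y = f(Yin)."""
    return activation(net_input(weights, inputs))


# Hypothetical values, chosen only for illustration.
weights = [0.5, -1.0, 2.0]
inputs = [1.0, 2.0, 0.5]

y_in = net_input(weights, inputs)                  # 0.5 - 2.0 + 1.0
y = neuron_output(weights, inputs, lambda v: v)    # identity activation
print(y_in, y)
```

Passing the activation as a function keeps the weighted sum separate from the choice of activation.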
Comparison between artificial neurons and biological neurons
Let PE denote a processing element of an ANN. PEs share the following properties with biological neurons:
• A PE receives many signals
• Signals are modified by a weight at the receiving synapse
• A PE sums its weighted inputs
• The output from a particular neuron may go to many other neurons
The two differ in the following ways:
• Biological neurons use synaptic strength as the weight that inhibits or excites an adjacent neuron. ANNs use
numerical weights, which are commonly initialized to random values.
• Biological neurons rely on electrochemical reactions, whereas ANNs use mathematical
learning equations to introduce learning
• Biological neurons learn and adapt quickly, whereas ANNs are slow to learn when compared to
biological neurons.
ANN functioning
The functioning of an ANN is determined by the arrangement of neurons into layers and the connection patterns
between the layers.
Types:
1. Feed Forward networks (Non recurrent)
a. Single Layer net
b. Multi Layer net
2. Feedback or Recurrent networks
Non recurrent networks:
Single layer representation: The single layer representation comprises only the input layer and the
output layer. No hidden layers are included.
Fig. Single Layer Representation
Multilayer network representation: The multilayer representation consists of an input layer, one or more
hidden layers and an output layer.
Feedback Networks or Recurrent Networks or Counter Propagation Networks
Activation Functions (AF)
An AF is used to calculate the output response of a neuron.
For example, let us consider the following ANN
Inputs: x1,x2
Weights: w11,w12,w21, w22
Output: y1, y2
AF for the above example is as follows:
AF Types:
1. Linear function / Identity function
2. Step function
3. Sigmoidal function
Types of outputs for AFs
1. Binary output (1 or 0)
2. Bipolar output (+1 or -1)
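The three AF types and the two output conventions can be written out as a minimal Python sketch. The function names and the default threshold `theta=0.0` are assumptions made for illustration. The bipolar sigmoid is taken here to be the binary sigmoid rescaled to the range (-1, +1).

```python
import math


def identity(net):
    """Linear / identity function: the output equals the net input."""
    return net


def binary_step(net, theta=0.0):
    """Step function with binary output (1 or 0)."""
    return 1 if net >= theta else 0


def bipolar_step(net, theta=0.0):
    """Step function with bipolar output (+1 or -1)."""
    return 1 if net >= theta else -1


def binary_sigmoid(net):
    """Sigmoidal function with output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))


def bipolar_sigmoid(net):
    """Sigmoidal function rescaled to (-1, +1); equal to tanh(net / 2)."""
    return 2.0 * binary_sigmoid(net) - 1.0


for net in (-2.0, 0.0, 2.0):
    print(net, binary_step(net), bipolar_step(net),
          round(binary_sigmoid(net), 4), round(bipolar_sigmoid(net), 4))
```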
Learning or training
Types : 1. Supervised 2. Unsupervised 3. Fixed weight networks
Classification of Learning Algorithms
1. Supervised Learning:
a. Hebb rule: $\Delta w_i = \eta x_i y$
The Hebb rule determines the change $\Delta w_i$ in the ith synaptic weight of a node.
η: Learning rate
xi: ith input
y: Postsynaptic response of the node, $y = \sum_j w_j x_j$
b. Delta rule:
The delta rule is the gradient descent learning rule for updating the input weights in a single layer ANN.
$\Delta w_{ji} = \eta x_i (d_j - y_j)$
η: Learning rate
xi: ith input
dj: Desired (target) output of node j
yj: Actual output of node j, $y_j = \sum_i w_{ji} x_i$
c. Back Propagation Network (BPN)
d. Perceptron Learning rule
The perceptron learning rule was developed by Frank Rosenblatt in the late 1950s. Training patterns are
presented to the network's inputs and the output is computed. Each connection weight $w_{ji}$ is then
modified by an amount proportional to the product of the input $x_i$ and the difference between the desired
output $d_j$ and the actual output $y_j$.
$w_{ji}(t + 1) = w_{ji}(t) + \eta x_i (d_j - y_j)$
η: Learning rate
xi: ith input
dj: Desired (target) output of node j
yj: Actual output of node j
t: Iteration number
Note: In supervised learning the error information is used to improve the network behaviour
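As an illustration of error-driven weight updates, the following sketch trains a single perceptron on the AND function with the update w(t+1) = w(t) + η x (d - y). It makes several assumptions that are not in the notes: a binary step output with threshold 0, a bias weight `w[0]` fed by a constant input of 1, a learning rate of 1 and a fixed number of epochs. Replacing `step` with the identity function would turn the same loop into the delta rule for a linear unit.

```python
def step(net):
    """Binary step activation with threshold 0 (an assumption)."""
    return 1 if net >= 0 else 0


def train_perceptron(samples, eta=1.0, epochs=10):
    """samples: list of (inputs, desired_output) pairs for one output node."""
    n = len(samples[0][0])
    w = [0.0] * (n + 1)          # w[0] is the bias weight, its input is fixed at 1
    for _ in range(epochs):
        for x, d in samples:
            xb = [1.0] + list(x)
            y = step(sum(wi * xi for wi, xi in zip(w, xb)))
            # w_i(t+1) = w_i(t) + eta * x_i * (d - y)
            w = [wi + eta * xi * (d - y) for wi, xi in zip(w, xb)]
    return w


def predict(w, x):
    """Output of the trained perceptron for the input tuple x."""
    return step(sum(wi * xi for wi, xi in zip(w, [1.0] + list(x))))


AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_perceptron(AND_SAMPLES)
print(w, [predict(w, x) for x, _ in AND_SAMPLES])
```

The weights change only when the actual output differs from the desired output, so training stops changing them once every pattern is classified correctly.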
Associative memory: A neural network that stores associations between input vectors and
output vectors.
Auto-associative memory: If the desired output vector is the same as the input vector, the network is called an auto-associative
memory.
Hetero-associative memory: If the desired output vector is different from the input vector, the network is called a hetero-associative memory.
2. Unsupervised Learning
Error information is not used to improve the network behaviour. The network is self organizing.
Examples of unsupervised learning:
• Kohonen Self Organizing Map
• ART (Adaptive Resonance Theory)
3. Fixed weight networks
The weights are fixed in advance and are not changed while the network operates.
Examples:
• Hopfield net
• Max net
Neuron Modelling
McCulloch Pitts Neuron model
• It is one of the earliest artificial neuron models
• Developed by Warren McCulloch and Walter Pitts in 1943
• It is also called a linear threshold gate
Components:
Set of inputs, xi
Set of weights, wi
Threshold, θ
Activation function, f
Single neuron output, y
Architecture
Yin: Net input signal
θ: Threshold
Activation Function:
$f(Y_{in}) = \begin{cases} 1 & \text{if } Y_{in} \ge \theta \\ 0 & \text{if } Y_{in} < \theta \end{cases}$
Example 1: McCulloch-Pitts neuron for the AND function
X1 X2 Y
1 1 1
1 0 0
0 1 0
0 0 0
Example 2: McCulloch-Pitts neuron for the OR function
X1 X2 Y
1 1 1
1 0 1
0 1 1
0 0 0
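The AND and OR truth tables can be checked with a small threshold-neuron sketch. The weights (1, 1) and the thresholds 2 for AND and 1 for OR are assumptions: the figures that carried the original values are not reproduced in these notes, and other choices work equally well.

```python
def mp_neuron(inputs, weights, theta):
    """McCulloch-Pitts neuron: fires (1) when the net input reaches the threshold."""
    y_in = sum(w * x for w, x in zip(weights, inputs))
    return 1 if y_in >= theta else 0


def logic_and(x1, x2):
    # Assumed weights (1, 1) and threshold 2: both inputs must be on.
    return mp_neuron((x1, x2), (1, 1), theta=2)


def logic_or(x1, x2):
    # Assumed weights (1, 1) and threshold 1: one active input is enough.
    return mp_neuron((x1, x2), (1, 1), theta=1)


for x1, x2 in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(x1, x2, logic_and(x1, x2), logic_or(x1, x2))
```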
Example 3: McCulloch-Pitts neuron for the XOR function
X1 X2 Y
1 1 0
1 0 1
0 1 1
0 0 0
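Example 4: McCulloch-Pitts neurons performing XOR with the following neural model, which uses two hidden neurons Z1 and Z2 and one output neuron Y:
$Z_{1net} = x_1 w_{11} + x_2 w_{21}$ and $Z_{2net} = x_1 w_{12} + x_2 w_{22}$
$Z_1 = \begin{cases} 1 & \text{if } Z_{1net} \ge 2 \\ 0 & \text{if } Z_{1net} < 2 \end{cases}$
$Z_2 = \begin{cases} 1 & \text{if } Z_{2net} \ge 2 \\ 0 & \text{if } Z_{2net} < 2 \end{cases}$
$Y_{net} = Z_1 V_1 + Z_2 V_2$
$Y = \begin{cases} 1 & \text{if } Y_{net} \ge 1 \\ 0 & \text{if } Y_{net} < 1 \end{cases}$
Weights and thresholds (the same for every input pattern):
W11 = 2, W12 = -1, W21 = -1, W22 = 2, V1 = 1, V2 = 1, Θ1 = 2, Θ2 = 2, Θ3 = 1
X1  X2  T   Znet1  Znet2  Z1  Z2  Ynet  Y
1   1   0   1      1      0   0   0     0
1   0   1   2      -1     1   0   1     1
0   1   1   -1     2      0   1   1     1
0   0   0   0      0      0   0   0     0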
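The table can be reproduced with a direct transcription of the network into Python. The weights and thresholds are the ones listed in the table, including the output threshold Θ3 = 1. The function names `fire` and `xor_net` are illustrative.

```python
def fire(net, theta):
    """Threshold unit: 1 if the net input reaches theta, else 0."""
    return 1 if net >= theta else 0


def xor_net(x1, x2):
    """Two hidden McCulloch-Pitts neurons (Z1, Z2) feeding one output neuron Y."""
    w11, w12, w21, w22 = 2, -1, -1, 2    # input-to-hidden weights from the table
    v1, v2 = 1, 1                        # hidden-to-output weights from the table
    z1 = fire(x1 * w11 + x2 * w21, theta=2)   # fires only for (1, 0)
    z2 = fire(x1 * w12 + x2 * w22, theta=2)   # fires only for (0, 1)
    return fire(z1 * v1 + z2 * v2, theta=1)


for x1, x2 in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(x1, x2, xor_net(x1, x2))
```

Each hidden neuron detects one of the two input patterns for which XOR is 1, and the output neuron takes the OR of the two detectors.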
Simple Neural network for pattern classification
Fig. Single Layer for pattern classification
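Let b be the bias, that is, the weight on an extra input whose value is always 1. For bipolar input, the output function y is as follows:
$y = f(net) = \begin{cases} 1 & \text{if } net \ge 0 \\ -1 & \text{if } net < 0 \end{cases}$
For binary input, the output function y is as follows:
$y = f(net) = \begin{cases} 1 & \text{if } net \ge 0 \\ 0 & \text{if } net < 0 \end{cases}$
where the net input is
$net = b + \sum_i w_i x_i$
With two inputs, the decision boundary between the two classes is the line on which the net input is zero:
$b + x_1 w_1 + x_2 w_2 = 0$, that is, $x_2 = -\frac{w_1}{w_2} x_1 - \frac{b}{w_2}$
Without a bias, the boundary passes through the origin:
$x_1 w_1 + x_2 w_2 = 0$, that is, $x_2 = -\frac{w_1}{w_2} x_1$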
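A short sketch shows how the boundary line and the bipolar output function relate. The weights w1 = w2 = 1 and bias b = -1 are hypothetical values chosen for illustration (they realise a bipolar AND), and the function names are assumptions.

```python
def boundary_x2(x1, w1, w2, b=0.0):
    """Point on the decision boundary: x2 = -(w1 / w2) * x1 - b / w2."""
    return -(w1 / w2) * x1 - b / w2


def classify(x1, x2, w1, w2, b=0.0):
    """Bipolar output: +1 if net >= 0, otherwise -1."""
    net = b + w1 * x1 + w2 * x2
    return 1 if net >= 0 else -1


w1, w2, b = 1.0, 1.0, -1.0      # hypothetical weights and bias
print(boundary_x2(0.0, w1, w2, b))          # boundary crosses the x2 axis here
for x1, x2 in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
    print(x1, x2, classify(x1, x2, w1, w2, b))
```

Points on one side of the line give a non-negative net input and are assigned +1, while points on the other side are assigned -1.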
1.2 Adaline: Adaptive Linear Neuron
Rule: The difference between the actual output and the desired output is the basis for error correction.
Learning: Learning means changing the weights of the ANN. The size of the correction is proportional to the signal at the element's
input.
- Has a single neuron of the McCulloch-Pitts type
- Weights are determined by the LMS (Least Mean Square) learning rule
- The LMS rule is also called the Delta rule
- It is a well established supervised learning method
Structure of Adaline:
The basic structure is a simple neuron with a linear activation function and a feedback loop.
ALC: Adaptive Linear Combiner
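$y = \begin{cases} +1 & \text{if the ALC output is positive} \\ -1 & \text{if the ALC output is negative} \end{cases}$
The ALC output is
$y = w_0 + \sum_{j=1}^{n} w_j x_j$
where $w_0$ is the bias weight.
If $x_0 = 1$, then
$y = \sum_{j=0}^{n} w_j x_j$
$y = w^t x$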
Adaline Training Methodology
- The structure resembles a simple neuron with an extra feedback loop
- During training, the input vector xk and the desired output dk are presented to the network
- Weights are adjusted based on the Delta rule
- After training, the weights are fixed and each input vector produces a scalar output
- The network maps an n-dimensional input to a scalar value
- The activation function is not used during the training phase
- Training and generalization are two important aspects of the Adaline network
Applications of Adaline
- Making binary decisions
- Realization of AND, OR and NOT gates
- Only linearly separable functions are recognized
Linear Separability: The idea behind hidden layers
Two sets are linearly separable if there exists at least one line in the plane with all of the positive
values on one side of the line and all the negative values on the other side.
Fig. Linear Separability
The XOR function cannot be separated using a single line. Two lines are needed to segregate the positive
and negative values. Hence XOR is not supported by Adaline.
Fig. XOR is not linearly separable
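The difference between AND and XOR can be illustrated with a brute-force search for a separating line b + w1 x1 + w2 x2 = 0. This is only a sketch under stated assumptions: the search covers small integer weights in the range -3 to 3 (an arbitrary choice), so finding nothing for XOR is consistent with, but does not by itself prove, the fact that no separating line exists. The function names are illustrative.

```python
from itertools import product


def separates(w1, w2, b, table):
    """True if the line b + w1*x1 + w2*x2 = 0 puts every pattern on its correct side."""
    return all((1 if b + w1 * x1 + w2 * x2 >= 0 else 0) == target
               for (x1, x2), target in table)


def find_line(table, values=range(-3, 4)):
    """Search a small integer grid for (w1, w2, b); return None if nothing is found."""
    for w1, w2, b in product(values, repeat=3):
        if separates(w1, w2, b, table):
            return (w1, w2, b)
    return None


AND_TABLE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_TABLE = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print("AND:", find_line(AND_TABLE))
print("XOR:", find_line(XOR_TABLE))
```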
LMS Learning Rule:
The LMS rule is used to train the Adaline to perform a given processing function.
Let x be the input vector
w: weights
y: output values
dk: Desired or correct output value
Manual Calculation of w*( weight adjustment)
Given the set of input, desired output pairs {(x1,d1), (x2,d2)}, the best value of w* needs to be
calculated.
Let yk be the actual output for kth input vector.
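Error term: $\varepsilon_k = d_k - y_k$ (Equation 1)
The mean squared error $\langle \varepsilon_k^2 \rangle$ is defined as follows:
$\langle \varepsilon_k^2 \rangle = \frac{1}{L} \sum_{k=1}^{L} \varepsilon_k^2$ (Equation 2), where L is the number of input vectors in the training set
$y_k = w^t x_k$ (Equation 3)
Substituting Equation 3 into Equation 1, and the result into Equation 2:
$\langle \varepsilon_k^2 \rangle = \frac{1}{L} \sum_{k=1}^{L} (d_k - w^t x_k)^2$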
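For a small training set, the best weight vector w* can be computed directly. Minimising the mean squared error of a linear unit y = w^t x leads to the linear system R w* = p, where R is the average of x x^t and p is the average of d x. The symbols R and p, the function names and the three training pairs below are assumptions introduced for this sketch, which handles two-dimensional inputs only.

```python
def lms_optimal_weights(samples):
    """samples: list of ((x1, x2), d). Returns w* = (w1, w2) solving R w = p."""
    L = len(samples)
    # R = <x x^t> (2x2 input correlation matrix), p = <d x>
    r11 = sum(x[0] * x[0] for x, _ in samples) / L
    r12 = sum(x[0] * x[1] for x, _ in samples) / L
    r22 = sum(x[1] * x[1] for x, _ in samples) / L
    p1 = sum(d * x[0] for x, d in samples) / L
    p2 = sum(d * x[1] for x, d in samples) / L
    det = r11 * r22 - r12 * r12
    if det == 0:
        raise ValueError("R is singular; the inputs do not determine w* uniquely")
    # Cramer's rule for the symmetric 2x2 system
    return ((p1 * r22 - r12 * p2) / det, (r11 * p2 - r12 * p1) / det)


def mean_squared_error(w, samples):
    """Average of (d_k - w^t x_k)^2 over the training set."""
    return sum((d - (w[0] * x[0] + w[1] * x[1])) ** 2 for x, d in samples) / len(samples)


# Hypothetical training pairs generated from d = 2*x1 - x2.
samples = [((1, 0), 2), ((0, 1), -1), ((1, 1), 1)]
w_star = lms_optimal_weights(samples)
print(w_star, mean_squared_error(w_star, samples))
```

The iterative LMS (delta) rule approaches the same w* step by step without forming R and p explicitly.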
1.3 Back Propagation Network
- Supervised learning
- Feed forward network
- Multilayer perceptron
BPN Rule: The error at the output is propagated backwards, and the weights of the preceding layers are adjusted to reduce it. This generalizes the Delta learning
rule.
Fig. BPN topology
BPN algorithm
Read n, the number of input nodes
Read h, the number of hidden nodes
Read m, the number of output nodes
Step 1: Read the input vector x[i]
For i=1 to n
  Read x[i]
Step 2: Read the desired output vector o[k]
For k=1 to m
  Read o[k]
Step 3: Read the input-to-hidden weights wh[i][j]
For i=1 to n
  For j=1 to h
    Read wh[i][j]
Step 4: Read the hidden-to-output weights wo[j][k]
For j=1 to h
  For k=1 to m
    Read wo[j][k]
Step 5: Calculate neth[j] (net value in the hidden layer)
For j=1 to h
  neth[j] = sum over i=1..n of x[i] * wh[i][j]
Step 6: Calculate f(net), the hidden layer output oh[j] (sigmoidal function)
For j=1 to h
  oh[j] = 1 / (1 + exp(-neth[j]))
Step 7: Calculate the net output neto[k]
For k=1 to m
  neto[k] = 0
For j=1 to h
  For k=1 to m
    neto[k] += oh[j] * wo[j][k]
Step 8: Calculate oo[k] (actual output)
For k=1 to m
  oo[k] = 1 / (1 + exp(-neto[k]))
Step 9: Calculate the error in the output layer eo[k] (desired output - actual output)
For k=1 to m
  eo[k] = o[k] - oo[k]
Step 10: Calculate the error in the hidden layer eh[j]
For j=1 to h
  eh[j] = 0
  For k=1 to m
    eh[j] += eo[k] * wo[j][k]
Step 11: Calculate the new hidden-to-output weights nwo[j][k]
For j=1 to h
  For k=1 to m
    nwo[j][k] = wo[j][k] + (η * eo[k] * oh[j])
Step 12: Calculate the new input-to-hidden weights nwh[i][j]
For i=1 to n
  For j=1 to h
    nwh[i][j] = wh[i][j] + (η * eh[j] * x[i])
The new weights obtained for the hidden-to-output layer are nwo[j][k] and the new weights obtained for the input-to-hidden
layer are nwh[i][j].
Step 13: Replace the old weights in the hidden layer and the output layer with the new weights nwh[i][j] and nwo[j][k]
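Steps 5 to 13 can be sketched as one Python function that performs a single training step. The sketch follows the steps as listed, with the hidden error computed from the hidden-to-output weights `wo`. The listed steps omit the sigmoid derivative factors, such as oo[k] * (1 - oo[k]), that standard back-propagation multiplies into the error terms, and the sketch omits them too. The function name `bpn_step`, the learning rate and the numeric starting weights are assumptions for illustration.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def bpn_step(x, o, wh, wo, eta=0.5):
    """One pass of Steps 5-13. Returns (new_wh, new_wo, actual_output)."""
    n, h, m = len(x), len(wo), len(o)
    # Steps 5-6: hidden layer net input and sigmoidal output
    oh = [sigmoid(sum(x[i] * wh[i][j] for i in range(n))) for j in range(h)]
    # Steps 7-8: output layer net input and actual output
    oo = [sigmoid(sum(oh[j] * wo[j][k] for j in range(h))) for k in range(m)]
    # Step 9: output error (desired - actual)
    eo = [o[k] - oo[k] for k in range(m)]
    # Step 10: hidden error, propagated back through the hidden-to-output weights
    eh = [sum(eo[k] * wo[j][k] for k in range(m)) for j in range(h)]
    # Steps 11-12: new weights for both layers
    nwo = [[wo[j][k] + eta * eo[k] * oh[j] for k in range(m)] for j in range(h)]
    nwh = [[wh[i][j] + eta * eh[j] * x[i] for j in range(h)] for i in range(n)]
    # Step 13: the caller replaces the old weights with the returned ones
    return nwh, nwo, oo


# Hypothetical network: n = 2 inputs, h = 2 hidden nodes, m = 1 output node.
x, o = [1.0, 0.0], [1.0]
wh = [[0.1, -0.2], [0.3, 0.4]]
wo = [[0.5], [-0.5]]
errors = []
for _ in range(50):
    wh, wo, oo = bpn_step(x, o, wh, wo)
    errors.append(abs(o[0] - oo[0]))
print(errors[0], errors[-1])
```

Repeating the step on the same pattern shrinks the output error, which is the behaviour the weight updates are meant to produce.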
1.4 Hopfield Network
- Fully connected network
- Symmetric weights
- Output: Step function
- Inputs: Bipolar inputs (+1 and -1)
- Single layer recurrent network
$NET_j = \sum_{i \ne j} \omega_{ij} OUT_i + IN_j$
$OUT_j = 1$ if $NET_j > T_j$
$OUT_j = 0$ if $NET_j < T_j$
$OUT_j$ is unchanged if $NET_j = T_j$
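Hopfield Algorithm
1. Assign the connection weights from the M stored patterns $x^0, \dots, x^{M-1}$:
$\omega_{ij} = \begin{cases} \sum_{s=0}^{M-1} x_i^s x_j^s & \text{if } i \ne j \\ 0 & \text{if } i = j \end{cases}$
2. Initialize with the unknown pattern x:
$\mu_i(0) = x_i$, $0 \le i \le N-1$
3. $\mu_i(t)$ is the output of node i at time t.
4. Iterate until convergence:
$\mu_j(t+1) = f_n\left[\sum_{i=0}^{N-1} \omega_{ij} \mu_i(t)\right]$, $0 \le j \le N-1$
where $f_n$ gives 1 if net > θ, 0 if net < θ, and leaves the output unchanged if net = θ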
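The storage and recall steps can be sketched as follows. The sketch assumes bipolar states (+1 and -1) with a threshold of 0, so a node takes the sign of its net input and keeps its value when the net input is exactly 0. This differs from the 1/0 output written in the update rule but matches the bipolar inputs listed for the network. Nodes are updated one at a time in a fixed order, and the stored pattern is a hypothetical example.

```python
def hopfield_weights(patterns):
    """w[i][j] = sum over stored patterns of x_i * x_j, with a zero diagonal."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def recall(w, pattern, max_sweeps=10):
    """Start from the unknown pattern and update nodes until nothing changes."""
    mu = list(pattern)
    n = len(mu)
    for _ in range(max_sweeps):
        changed = False
        for j in range(n):
            net = sum(w[i][j] * mu[i] for i in range(n))
            new = 1 if net > 0 else -1 if net < 0 else mu[j]
            if new != mu[j]:
                mu[j] = new
                changed = True
        if not changed:      # converged: a full sweep left every node unchanged
            break
    return mu


stored = [[1, -1, 1, -1, 1, -1]]          # hypothetical stored pattern
w = hopfield_weights(stored)
noisy = [1, -1, 1, -1, 1, 1]              # the last bit is flipped
print(recall(w, noisy))
```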
Energy Landscape in Hopfield
Hopfield nets have a scalar value associated with each state of the network, referred to as the "energy",
E, of the network, where:
$E = -\frac{1}{2} \sum_i \sum_j w_{ij} x_i x_j + \sum_i T_i x_i$
E: Energy
Ti: Threshold of unit i
wij: Weight between units i and j
xi: Input (state) of unit i
Function of the energy landscape: Storage and retrieval. The units are chosen at random for updating. Each time
a unit is updated, the energy E either decreases or stays the same. Repeated updating therefore leads to
convergence to a local minimum of the energy function (a Lyapunov function). The network is stable when
its state is a local minimum of E.
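The energy of a state can be computed directly. The sketch assumes the standard Hopfield form E = -1/2 Σi Σj wij xi xj + Σi Ti xi with bipolar states and, unless given, zero thresholds. The stored pattern and the function name `energy` are illustrative assumptions.

```python
def energy(w, x, thresholds=None):
    """E = -1/2 * sum_ij w[i][j]*x[i]*x[j] + sum_i T[i]*x[i]."""
    n = len(x)
    t = thresholds if thresholds is not None else [0] * n
    pair_term = sum(w[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return -0.5 * pair_term + sum(t[i] * x[i] for i in range(n))


# Weights storing one hypothetical bipolar pattern p: w[i][j] = p[i]*p[j], zero diagonal.
p = [1, -1, 1, -1]
w = [[0 if i == j else p[i] * p[j] for j in range(4)] for i in range(4)]

corrupted = [1, -1, 1, 1]                 # one bit flipped
print(energy(w, p), energy(w, corrupted))
```

The stored pattern sits at a lower energy than the corrupted one, so updates that never increase E move the corrupted state towards the stored pattern.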
Learning:
Local learning: A learning rule is local if each weight is updated using information available to neurons
on either side of the connection that is associated with that particular weight.
Incremental learning: New patterns can be learned without using information from the old patterns
that have already been used for training. That is, when a new pattern is used for training, the new values
of the weights depend only on the old values and on the new pattern.
Hebbian Learning rule for Hopfield networks:
The Hebbian rule is both local and incremental. For Hopfield networks, it is implemented in
the following manner when learning n binary patterns:
$w_{ij} = \frac{1}{n} \sum_{p=1}^{n} e_i^p e_j^p$
where $e_i^p$ is bit i of pattern p, with bits encoded as +1 or -1. If bits i and j of a pattern are
equal, the product $e_i^p e_j^p$ is positive, which has a positive effect on the weight $w_{ij}$. If
the bits are different, the product is negative.
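The local and incremental character of the rule can be seen in a sketch that learns one pattern at a time. Each new weight is computed from the old weight, the number of patterns seen so far and bits i and j of the new pattern only. The function name, the running-average form of the update and the zeroed diagonal (taken from the Hopfield weight assignment, where the weight is 0 for i = j) are assumptions made for illustration.

```python
def hebb_add_pattern(w, count, pattern):
    """Learn one more bipolar pattern; returns (new_w, new_count).

    new w[i][j] = (count * w[i][j] + e_i * e_j) / (count + 1), which keeps
    w[i][j] equal to the average of e_i * e_j over all patterns seen so far.
    """
    n = len(pattern)
    new_count = count + 1
    new_w = [[0.0 if i == j else
              (count * w[i][j] + pattern[i] * pattern[j]) / new_count
              for j in range(n)] for i in range(n)]
    return new_w, new_count


w, count = [[0.0] * 3 for _ in range(3)], 0
for pattern in ([1, 1, -1], [1, -1, -1]):      # hypothetical patterns
    w, count = hebb_add_pattern(w, count, pattern)
print(w)
```

Bits 0 and 2 differ in both patterns, so their weight is -1, while bits 0 and 1 agree in one pattern and differ in the other, so their weight averages to 0.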
Drawback of Hopfield Network
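The network may converge to a local minimum of the energy function rather than the global minimum. The Boltzmann machine is used to overcome this.
1.5 Boltzmann Machine learning algorithm
Boltzmann machine = Hopfield network + probabilistic update mechanism
The probability function is chosen so that the network tends towards large reductions in energy.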
Two Phases
Phase 1: Incremental Boltzmann algorithm
1. Clamp the input and output units to their correct values.
2. Let the net cycle through its states $S_i$.
Calculate the energy of a state $S_i$:
$\Delta E_k = \sum_i \omega_{ki} s_i - \theta_k$, $0 \le i \le N-1$
3. The kth neuron switches to the lower energy state with probability
$P_k = \frac{1}{1 + e^{-\Delta E_k / T}}$ (T = temperature)
4. Reduce T until the output is stable.
Phase 2: Decremental
1. Clamp the input only and leave the output free.
2. Let the net reach thermal equilibrium as in phase 1.
3. Decrement the weights between units if both are ON (value = 1).
4. Repeat until the weights are stable.
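The probabilistic update can be sketched as follows. The sketch assumes binary unit states (1 = ON, 0 = OFF) and reads the rule as "unit k is switched ON with probability P_k, otherwise OFF", which is the usual formulation. The function names and the example values are assumptions. The probability is computed in a form that avoids overflow when the ratio of the energy gap to the temperature is large and negative.

```python
import math
import random


def delta_energy(k, w, s, theta):
    """Energy gap of unit k: sum_i w[k][i] * s[i] - theta[k]."""
    return sum(w[k][i] * s[i] for i in range(len(s))) - theta[k]


def switch_probability(delta_e, temperature):
    """P_k = 1 / (1 + exp(-delta_e / T)), evaluated without overflow."""
    z = delta_e / temperature
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def update_unit(k, w, s, theta, temperature, rng):
    """Set unit k ON with probability P_k, otherwise OFF; returns P_k."""
    p = switch_probability(delta_energy(k, w, s, theta), temperature)
    s[k] = 1 if rng.random() < p else 0
    return p


# Hypothetical 3-unit network with symmetric weights and zero thresholds.
w = [[0, 2, -1], [2, 0, 1], [-1, 1, 0]]
s, theta = [1, 0, 1], [0, 0, 0]
rng = random.Random(0)
for temperature in (10.0, 1.0, 0.1):      # reduce T step by step
    print(temperature, update_unit(1, w, s, theta, temperature, rng))
```

At high temperature the probability stays close to 0.5, so the unit behaves almost randomly. As T is reduced, the update becomes nearly deterministic and approaches the Hopfield rule.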
1.6 SOM: Self Organizing Maps
A self-organizing map consists of components called nodes or neurons. Associated with each node are
a weight vector of the same dimension as the input data vectors, and a position in the map space.
- Uses unsupervised learning
- Produces a low-dimensional (typically two-dimensional), discretized representation of the input
space of the training samples, called a map
- Uses a neighborhood function to preserve the topological properties of the input space
- Two models are available: the Kohonen model and the Willshaw model
Kohonen Model:
Introduced by the Finnish professor Teuvo Kohonen in the 1980s
Two modes of operation:
1. Training
2. Mapping
Components of Self Organization
The self-organization process involves four major components:
Initialization: All the connection weights are initialized with small random values.
Competition: For each input pattern, the neurons compute their respective values of a discriminant
function which provides the basis for competition. The particular neuron with the smallest value of the
discriminant function is declared the winner.
Cooperation: The winning neuron determines the spatial location of a topological neighbourhood of
excited neurons, thereby providing the basis for cooperation among neighbouring neurons.
Adaptation: The excited neurons decrease their individual values of the discriminant function in relation
to the input pattern through suitable adjustment of the associated connection weights, such that the
response of the winning neuron to the subsequent application of a similar input pattern is enhanced.
Training Data:
Let Xi be the input vectors (P distinct training vectors are taken)
Inputs
X1: x11, x12,……x1n
X2: x21, x22,……x2n
Xp: xp1, xp2,……xpn
Output: Vector y of length m
Y: y1, y2,……ym
Network Architecture:
Two layers of units
Input: n units
Output: m units
Inputs are fully connected with weights to outputs
Algorithm:
1. Select the network topology of the output layer
Initialize the current neighbourhood distance D(0) to a positive value
2. Initialize the weights from inputs to outputs to small random values
3. Let t=1
4. Do while
4.1 Select an input sample $i_l$
4.2 Compute the square of the Euclidean distance of $i_l$ from the weight vector $w_j$ of each output node j:
$\sum_{k=1}^{n} (i_{l,k} - w_{j,k}(t))^2$
4.3 Select the output node j* whose weight vector gives the minimum value in step 4.2
4.4 Update the weights of all nodes within a topological distance D(t) from j* using the
weight update rule
$w_j(t + 1) = w_j(t) + \eta(t)(i_l - w_j(t))$
4.5 Increment t
End while
Generally η decreases with time: $0 < \eta(t) \le \eta(t - 1) \le 1$
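The algorithm can be sketched for a one-dimensional map, where the topological distance between two output nodes is the difference of their indices. Several choices below are assumptions and do not come from the notes: the fixed number of steps used as the stopping condition, the learning rate schedule, the point at which the neighbourhood shrinks to the winner alone, the function names and the sample data.

```python
import random


def squared_distance(sample, weights):
    """Step 4.2: squared Euclidean distance between a sample and a weight vector."""
    return sum((s - w) ** 2 for s, w in zip(sample, weights))


def winner(sample, weight_vectors):
    """Step 4.3: index j* of the node with the minimum squared distance."""
    return min(range(len(weight_vectors)),
               key=lambda j: squared_distance(sample, weight_vectors[j]))


def som_update(sample, weight_vectors, eta, distance):
    """Step 4.4: move every node within `distance` of j* towards the sample."""
    j_star = winner(sample, weight_vectors)
    for j, w in enumerate(weight_vectors):
        if abs(j - j_star) <= distance:
            weight_vectors[j] = [wk + eta * (sk - wk) for sk, wk in zip(sample, w)]
    return j_star


def train(samples, m, steps=200, eta0=0.5, d0=1, seed=0):
    rng = random.Random(seed)
    n = len(samples[0])
    # Step 2: small random initial weights
    weights = [[rng.uniform(0.0, 0.1) for _ in range(n)] for _ in range(m)]
    for t in range(1, steps + 1):
        eta = eta0 / (1.0 + 0.01 * t)          # assumed schedule, decreasing in t
        d = d0 if t <= steps // 2 else 0       # assumed neighbourhood shrinkage
        som_update(rng.choice(samples), weights, eta, d)
    return weights


samples = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]   # hypothetical data
trained = train(samples, m=3)
print(trained)
```

Only the winner and its topological neighbours are moved, which is what lets nearby output nodes come to represent nearby regions of the input space.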
Application:
1. Phonetic typewriter
2. Pattern recognition: Inputs whose winning neurons are at minimum distance from each other are grouped into a single cluster.