STS5422
ARTIFICIAL
NEURAL
NETWORK 2
TOPIC 7
Multilayer Neural Networks
• A Multilayer Neural Network (MLNN) is a type of artificial neural network
composed of multiple layers of neurons, where each layer is fully or partially
connected to the next.
• MLNNs are the foundation of deep learning and are used to model complex
relationships in data.
• The network consists of an input layer of source neurons, at least one middle
or hidden layer of computational neurons, and an output layer of
computational neurons.
• The input signals are propagated in a forward direction on a layer-by-layer basis.
[Figure: a feed-forward network with an input layer, a first and a second hidden layer, and an output layer; input signals flow forward from left to right and output signals emerge from the output layer.]
What is Input Layer?
• The first layer in the network, responsible for receiving raw input data.
• Each neuron in the input layer corresponds to a feature of the input data.
o For example, if the input is an image with 28x28 pixels, the input layer will
have 784 neurons (one for each pixel).
o In a network processing tabular data, the input layer might represent
numerical features like age, income, or weight.
o For an image recognition task, the input layer accepts pixel values of an
image.
What Does The Middle Layer Hide?
• A hidden layer “hides” its desired output. Neurons in the hidden layer cannot
be observed through the input/output behaviour of the network.
• There is no obvious way to know what the desired output of the hidden layer
should be.
• For example:
o In an image classification task, early hidden layers might detect edges,
corners, and textures.
o Deeper hidden layers might identify shapes or patterns (e.g., eyes, wheels,
etc.).
How Hidden Layer Works
• Activation Functions:
o The transformation in hidden layers is controlled by activation functions
(e.g., ReLU, Sigmoid) to add non-linearity.
• Weights and Biases:
o Each hidden layer learns weights and biases during training to capture the
underlying patterns in the data.
• Commercial ANNs incorporate three and sometimes four layers, including one
or two hidden layers. Each layer can contain from 10 to 1000 neurons.
Experimental neural networks may have five or even six layers, including
three or four hidden layers, and utilise millions of neurons.
• Training multilayer neural networks can involve a number of different
algorithms, but the most popular is the back-propagation algorithm, also
known as the generalised delta rule.
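• As a minimal illustration (not taken from the slides) of the hidden-layer computation described above, the sketch below applies learned weights and biases and then an activation function; the NumPy code and the names relu, sigmoid and hidden_layer are illustrative assumptions.

```python
import numpy as np

# Common activation functions used in hidden layers.
def relu(z):
    return np.maximum(0.0, z)          # rectified linear unit

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))    # logistic function

# One hidden layer: the weights W and biases b are learned during training;
# the activation function supplies the non-linearity described above.
def hidden_layer(x, W, b, activation=relu):
    return activation(W @ x + b)

# Example: 3 input features feeding a hidden layer of 4 neurons.
x = np.array([0.5, -1.2, 3.0])
W = np.random.randn(4, 3) * 0.1        # 4 x 3 weight matrix
b = np.zeros(4)                        # one bias per hidden neuron
print(hidden_layer(x, W, b))
```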
Multi-layer Perceptrons
• This multi-layer network has different names: multi-layer perceptron (MLP),
feed-forward neural network, artificial neural network (ANN), backprop
network.
• Recall the simple neuron-like unit, which computes a weighted sum of its inputs and passes it through an activation function.
• These units are much more powerful if we connect many of them into a neural
network.
• We can connect lots of units
together into a directed acyclic
graph.
• This gives a feed-forward neural
network. That’s in contrast to
recurrent neural networks, which
can have cycles.
• Typically, units are grouped
together into layers.
• Each layer connects N input units to M
output units.
• In the simplest case, all input units are
connected to all output units. We call this a
fully connected layer. We'll consider other
layer types later.
• Note: the inputs and outputs for a layer are
distinct from the inputs and outputs to the
network.
• Recall from multiway logistic regression: connecting N input units to M
output units means we need an M × N weight matrix.
• The output units are a function of the input
units: $\mathbf{y} = f(\mathbf{W}\mathbf{x} + \mathbf{b})$, where $f$ is the activation function applied element-wise.
• Some activation functions: the hard threshold, the logistic (sigmoid) function, and the rectified linear unit (ReLU).
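• A hedged sketch of how a fully connected layer and a small feed-forward network can be composed, assuming NumPy; the helper name fully_connected and the layer sizes are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A fully connected layer mapping N input units to M output units
# needs an M x N weight matrix W and an M-dimensional bias b.
def fully_connected(x, W, b, activation=sigmoid):
    return activation(W @ x + b)       # y = f(Wx + b)

# A feed-forward network is just layers applied in sequence
# (a directed acyclic graph with no cycles, unlike a recurrent network).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.5, size=(3, 2)), np.zeros(3)   # layer 1: 2 -> 3
W2, b2 = rng.normal(scale=0.5, size=(1, 3)), np.zeros(1)   # layer 2: 3 -> 1

x = np.array([1.0, 0.0])               # network input
h = fully_connected(x, W1, b1)         # hidden layer output
y = fully_connected(h, W2, b2)         # network output
print(h, y)
```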
Designing a network to compute XOR
• XOR is a Boolean function that is true for two variables if and only if one of the
variables is true and the other is false.
• In other words, XOR is a logical operation that outputs true (1) if the inputs are different and false (0)
if the inputs are the same.
• XOR is a fundamental concept in digital logic and serves as a classic example
in machine learning for demonstrating the need for non-linear models like
neural networks.
• Assume a hard threshold activation function: each unit outputs 1 if its net input exceeds the threshold, and 0 otherwise.
Example
• We want to classify a single data point into one of two classes.
o Input Layer: 2 neurons
o Hidden Layer: 2 neurons
o Output Layer: 1 neuron
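• The weights below are one possible hand-designed choice (an assumption, not necessarily the values on the slide) for a 2-2-1 network with hard threshold units that computes XOR: the first hidden unit behaves like AND, the second like OR, and the output fires only when OR is true and AND is false.

```python
def hard_threshold(z):
    # hard threshold / step activation: fire (1) only if the net input is positive
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    # Hidden layer (2 neurons): h1 behaves like AND, h2 like OR
    h1 = hard_threshold(1.0 * x1 + 1.0 * x2 - 1.5)
    h2 = hard_threshold(1.0 * x1 + 1.0 * x2 - 0.5)
    # Output layer (1 neuron): true when OR holds but AND does not, i.e. XOR
    return hard_threshold(-1.0 * h1 + 1.0 * h2 - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))   # 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0
```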
Back Propagation Algorithm
• Learning in a multilayer network proceeds the same way as for a perceptron.
• A training set of input patterns is presented to the network.
• The network computes its output pattern, and if there is an error - or in other
words a difference between actual and desired output patterns - the weights
are adjusted to reduce this error.
• In a back-propagation neural network, the learning algorithm has two phases.
o First, a training input pattern is presented to the network input layer. The
network propagates the input pattern from layer to layer until the output
pattern is generated by the output layer.
o If this pattern is different from the desired output, an error is calculated
and then propagated backwards through the network from the output layer
to the input layer. The weights are modified as the error is propagated.
Three-layer back-propagation neural network
[Figure: a three-layer back-propagation network. Input neurons x1 ... xi ... xn feed hidden neurons j through weights wij; hidden neurons feed output neurons y1 ... yk ... yl through weights wjk. Input signals propagate forward from the input layer through the hidden layer to the output layer, while error signals propagate backward from the output layer.]
The Back-propagation Training Algorithm
• Step 1: Initialisation
o Set all the weights and threshold levels of the network to random numbers
uniformly distributed inside a small range:
$\left( -\dfrac{2.4}{F_i},\; +\dfrac{2.4}{F_i} \right)$
o where $F_i$ is the total number of inputs of neuron $i$ in the network. The weight
initialisation is done on a neuron-by-neuron basis.
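• A minimal sketch of Step 1 in Python/NumPy, assuming each row of a weight matrix holds the incoming weights of one neuron with $F_i$ inputs; the function name init_weights and the matrix layout are assumptions.

```python
import numpy as np

def init_weights(n_inputs, n_neurons, rng=np.random.default_rng()):
    # Each neuron has F_i = n_inputs incoming connections, so its weights
    # (and its threshold) are drawn uniformly from (-2.4/F_i, +2.4/F_i).
    limit = 2.4 / n_inputs
    weights = rng.uniform(-limit, limit, size=(n_neurons, n_inputs))
    thresholds = rng.uniform(-limit, limit, size=n_neurons)
    return weights, thresholds

# Example: the 2-2-1 network used for the XOR problem.
W_hidden, theta_hidden = init_weights(n_inputs=2, n_neurons=2)
W_output, theta_output = init_weights(n_inputs=2, n_neurons=1)
```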
• Step 2: Activation
o Activate the back-propagation neural network by applying inputs $x_1(p), x_2(p), \ldots, x_n(p)$ and
desired outputs $y_{d,1}(p), y_{d,2}(p), \ldots, y_{d,n}(p)$.
o (a) Calculate the actual outputs of the neurons in the hidden layer:
$y_j(p) = \mathrm{sigmoid}\!\left[ \sum_{i=1}^{n} x_i(p)\, w_{ij}(p) - \theta_j \right]$
o where $n$ is the number of inputs of neuron $j$ in the hidden layer, and sigmoid
is the sigmoid activation function.
o (b) Calculate the actual outputs of the neurons in the output layer:
$y_k(p) = \mathrm{sigmoid}\!\left[ \sum_{j=1}^{m} x_{jk}(p)\, w_{jk}(p) - \theta_k \right]$
o where $m$ is the number of inputs of neuron $k$ in the output layer.
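• Step 2 expressed as a short NumPy sketch, continuing the illustrative layout assumed above (one row of weights per neuron, thresholds subtracted as in the formulas):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W_hidden, theta_hidden, W_output, theta_output):
    # Hidden layer: y_j = sigmoid( sum_i x_i * w_ij - theta_j )
    y_hidden = sigmoid(W_hidden @ x - theta_hidden)
    # Output layer: y_k = sigmoid( sum_j y_j * w_jk - theta_k )
    y_output = sigmoid(W_output @ y_hidden - theta_output)
    return y_hidden, y_output
```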
• Step 3: Weight training
o Update the weights in the back-propagation network, propagating
backward the errors associated with the output neurons.
o (a) Calculate the error gradient for the neurons in the output layer:
$\delta_k(p) = y_k(p)\,[1 - y_k(p)]\; e_k(p)$
o where:
$e_k(p) = y_{d,k}(p) - y_k(p)$
o Calculate the weight corrections:
$\Delta w_{jk}(p) = \alpha\, y_j(p)\, \delta_k(p)$
o Update the weights at the output neurons:
$w_{jk}(p+1) = w_{jk}(p) + \Delta w_{jk}(p)$
o (b) Calculate the error gradient for the neurons in the hidden layer:
$\delta_j(p) = y_j(p)\,[1 - y_j(p)] \sum_{k=1}^{l} \delta_k(p)\, w_{jk}(p)$
o Calculate the weight corrections:
$\Delta w_{ij}(p) = \alpha\, x_i(p)\, \delta_j(p)$
o Update the weights at the hidden neurons:
$w_{ij}(p+1) = w_{ij}(p) + \Delta w_{ij}(p)$
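• Step 3 as a continuation of the same sketch, with learning rate $\alpha$; the variable names carry on the assumptions made in the previous snippets.

```python
import numpy as np

def backward_update(x, y_d, y_hidden, y_output,
                    W_hidden, theta_hidden, W_output, theta_output, alpha=0.1):
    # (a) Output-layer error gradient: delta_k = y_k (1 - y_k) e_k, with e_k = y_dk - y_k
    e = y_d - y_output
    delta_out = y_output * (1.0 - y_output) * e
    # (b) Hidden-layer error gradient uses the current output weights w_jk(p):
    #     delta_j = y_j (1 - y_j) * sum_k delta_k w_jk
    delta_hid = y_hidden * (1.0 - y_hidden) * (W_output.T @ delta_out)
    # Weight and threshold corrections (thresholds act as weights on a fixed -1 input)
    W_output += alpha * np.outer(delta_out, y_hidden)
    theta_output += alpha * (-1.0) * delta_out
    W_hidden += alpha * np.outer(delta_hid, x)
    theta_hidden += alpha * (-1.0) * delta_hid
    return W_hidden, theta_hidden, W_output, theta_output
```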
• Step 4: Iteration
o Increase iteration p by one, go back to Step 2 and repeat the process until
the selected error criterion is satisfied.
• As an example, we may consider the three-layer back-propagation network.
Suppose that the network is required to perform logical operation Exclusive-
OR. Recall that a single-layer perceptron could not do this operation. Now
we will apply the three-layer net.
[Figure: the three-layer network applied to the Exclusive-OR operation. Input neurons 1 and 2 receive x1 and x2 and feed hidden neurons 3 and 4 through weights w13, w23, w14 and w24; hidden neurons 3 and 4 feed output neuron 5 (output y5) through weights w35 and w45. Neurons 3, 4 and 5 each receive an additional input fixed at -1, weighted by thresholds θ3, θ4 and θ5.]
• The effect of the threshold applied to a neuron in the hidden or output layer is
represented by its weight, $\theta$, connected to a fixed input equal to -1.
• The initial weights and threshold levels are set randomly as follows:
$w_{13} = 0.5$, $w_{14} = 0.9$, $w_{23} = 0.4$, $w_{24} = 1.0$, $w_{35} = -1.2$, $w_{45} = 1.1$, $\theta_3 = 0.8$, $\theta_4 = -0.1$ and $\theta_5 = 0.3$.
• We consider a training set where inputs $x_1$ and $x_2$ are equal to 1 and the desired
output $y_{d,5}$ is 0. The actual outputs of neurons 3 and 4 in the hidden layer are
calculated as
$y_3 = \mathrm{sigmoid}(x_1 w_{13} + x_2 w_{23} - \theta_3) = 1/\left[1 + e^{-(1 \cdot 0.5 + 1 \cdot 0.4 - 1 \cdot 0.8)}\right] = 0.5250$
$y_4 = \mathrm{sigmoid}(x_1 w_{14} + x_2 w_{24} - \theta_4) = 1/\left[1 + e^{-(1 \cdot 0.9 + 1 \cdot 1.0 + 1 \cdot 0.1)}\right] = 0.8808$
• Now the actual output of neuron 5 in the output layer is determined as:
$y_5 = \mathrm{sigmoid}(y_3 w_{35} + y_4 w_{45} - \theta_5) = 1/\left[1 + e^{-(-0.5250 \cdot 1.2 + 0.8808 \cdot 1.1 - 1 \cdot 0.3)}\right] = 0.5097$
• Thus, the following error is obtained:
$e = y_{d,5} - y_5 = 0 - 0.5097 = -0.5097$
• The next step is weight training. To update the weights and threshold levels in
our network, we propagate the error, $e$, from the output layer backward to the
input layer.
• First, we calculate the error gradient for neuron 5 in the output layer:
$\delta_5 = y_5 (1 - y_5)\, e = 0.5097 \cdot (1 - 0.5097) \cdot (-0.5097) = -0.1274$
• Then we determine the weight corrections assuming that the learning rate
parameter, $\alpha$, is equal to 0.1:
$\Delta w_{35} = \alpha\, y_3\, \delta_5 = 0.1 \cdot 0.5250 \cdot (-0.1274) = -0.0067$
$\Delta w_{45} = \alpha\, y_4\, \delta_5 = 0.1 \cdot 0.8808 \cdot (-0.1274) = -0.0112$
$\Delta \theta_5 = \alpha \cdot (-1) \cdot \delta_5 = 0.1 \cdot (-1) \cdot (-0.1274) = 0.0127$
• Next we calculate the error gradients for neurons 3 and 4 in the hidden layer:
$\delta_3 = y_3 (1 - y_3)\, \delta_5\, w_{35} = 0.5250 \cdot (1 - 0.5250) \cdot (-0.1274) \cdot (-1.2) = 0.0381$
$\delta_4 = y_4 (1 - y_4)\, \delta_5\, w_{45} = 0.8808 \cdot (1 - 0.8808) \cdot (-0.1274) \cdot 1.1 = -0.0147$
• We then determine the weight corrections:
$\Delta w_{13} = \alpha\, x_1\, \delta_3 = 0.1 \cdot 1 \cdot 0.0381 = 0.0038$
$\Delta w_{23} = \alpha\, x_2\, \delta_3 = 0.1 \cdot 1 \cdot 0.0381 = 0.0038$
$\Delta \theta_3 = \alpha \cdot (-1) \cdot \delta_3 = 0.1 \cdot (-1) \cdot 0.0381 = -0.0038$
$\Delta w_{14} = \alpha\, x_1\, \delta_4 = 0.1 \cdot 1 \cdot (-0.0147) = -0.0015$
$\Delta w_{24} = \alpha\, x_2\, \delta_4 = 0.1 \cdot 1 \cdot (-0.0147) = -0.0015$
$\Delta \theta_4 = \alpha \cdot (-1) \cdot \delta_4 = 0.1 \cdot (-1) \cdot (-0.0147) = 0.0015$
• At last, we update all weights and thresholds:
$w_{13} = w_{13} + \Delta w_{13} = 0.5 + 0.0038 = 0.5038$
$w_{14} = w_{14} + \Delta w_{14} = 0.9 - 0.0015 = 0.8985$
$w_{23} = w_{23} + \Delta w_{23} = 0.4 + 0.0038 = 0.4038$
$w_{24} = w_{24} + \Delta w_{24} = 1.0 - 0.0015 = 0.9985$
$w_{35} = w_{35} + \Delta w_{35} = -1.2 - 0.0067 = -1.2067$
$w_{45} = w_{45} + \Delta w_{45} = 1.1 - 0.0112 = 1.0888$
$\theta_3 = \theta_3 + \Delta \theta_3 = 0.8 - 0.0038 = 0.7962$
$\theta_4 = \theta_4 + \Delta \theta_4 = -0.1 + 0.0015 = -0.0985$
$\theta_5 = \theta_5 + \Delta \theta_5 = 0.3 + 0.0127 = 0.3127$
• The training process is repeated until the sum of squared errors is less than 0.001.
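• The short script below (an illustrative sketch, using the initial weights and the learning rate $\alpha = 0.1$ from the example) reproduces one back-propagation iteration for the input $x_1 = x_2 = 1$ with desired output 0; the printed values should match the hand calculation above up to rounding.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Initial weights and thresholds from the example
w13, w14, w23, w24 = 0.5, 0.9, 0.4, 1.0
w35, w45 = -1.2, 1.1
t3, t4, t5 = 0.8, -0.1, 0.3
alpha = 0.1

x1, x2, yd5 = 1.0, 1.0, 0.0

# Forward pass
y3 = sigmoid(x1 * w13 + x2 * w23 - t3)          # 0.5250
y4 = sigmoid(x1 * w14 + x2 * w24 - t4)          # 0.8808
y5 = sigmoid(y3 * w35 + y4 * w45 - t5)          # 0.5097
e = yd5 - y5                                     # -0.5097

# Error gradients
d5 = y5 * (1 - y5) * e                           # -0.1274
d3 = y3 * (1 - y3) * d5 * w35                    #  0.0381
d4 = y4 * (1 - y4) * d5 * w45                    # -0.0147

# Weight and threshold updates
w35 += alpha * y3 * d5;  w45 += alpha * y4 * d5;  t5 += alpha * -1 * d5
w13 += alpha * x1 * d3;  w23 += alpha * x2 * d3;  t3 += alpha * -1 * d3
w14 += alpha * x1 * d4;  w24 += alpha * x2 * d4;  t4 += alpha * -1 * d4

print(round(w13, 4), round(w35, 4), round(t5, 4))  # 0.5038 -1.2067 0.3127
```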
Learning curve for operation Exclusive-OR
[Figure: sum-squared network error plotted against epoch on a logarithmic scale (from about 10^1 down to 10^-4); the error decreases over 224 epochs until the 10^-3 error criterion is reached.]
• Final results of three-layer network learning:

Inputs (x1, x2) | Desired output (yd) | Actual output (y5) | Error (e) | Sum of squared errors
1, 1            | 0                   | 0.0155             | -0.0155   | 0.0010
0, 1            | 1                   | 0.9849             |  0.0151   |
1, 0            | 1                   | 0.9849             |  0.0151   |
0, 0            | 0                   | 0.0175             | -0.0175   |