Let’s Clarify
With two feature inputs (i1 and i2), weights, and bias values.
Forward Pass – Hidden Layer
• neth1 = w1 * i1 + w2 * i2 + b1 * 1
• neth1 = 0.15 * 0.05 + 0.2 * 0.1 + 0.35 * 1 = 0.3775
• Passing neth1 through the sigmoid function gives the output of h1, outh1:
• outh1 = 1 / (1 + e^(-neth1)) = 1 / (1 + e^(-0.3775)) = 0.593269992
• Out(h2) = ?
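As a quick check, here is the same hidden-layer computation as a minimal Python sketch, using only the values stated above:

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

i1, i2 = 0.05, 0.10             # feature inputs
w1, w2, b1 = 0.15, 0.20, 0.35   # weights and bias into h1

net_h1 = w1 * i1 + w2 * i2 + b1 * 1   # 0.3775
out_h1 = sigmoid(net_h1)              # ~0.593269992
print(net_h1, out_h1)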
Forward Pass – Output Layer
• neto1 = w5 * outh1 + w6 * outh2 + b2 * 1
• Passing neto1 through the sigmoid function gives outo1 (and likewise neto2, outo2 for the second output neuron)
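Continuing the sketch above for the output layer; w5, w6 and b2 are left as parameters here because their numeric values appear only in the slide figure:

def output_neuron(out_h1, out_h2, w5, w6, b2):
    # net_o1 = w5 * outh1 + w6 * outh2 + b2 * 1, then squash with the sigmoid
    net_o1 = w5 * out_h1 + w6 * out_h2 + b2 * 1
    return sigmoid(net_o1)   # out_o1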
Calculating the Total Error
• Calculate the error for each output neuron using the squared error
function and sum them to get the total error:
• Note: the ½ is included so that the exponent cancels when we differentiate
later on. The result is eventually multiplied by a learning rate anyway, so it
doesn’t matter that we introduce a constant here.
• E(o2)=?
Total Error
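Written out, using the notation above, the squared-error terms are:
• E(o1) = ½ (target_o1 − outo1)²
• E(o2) = ½ (target_o2 − outo2)²
• Etotal = E(o1) + E(o2)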
Backward Pass – Actual Learning
• Our goal with backpropagation is to update each of the weights in the
network so that they cause the actual output to be closer to the target output,
thereby minimizing the error for each output neuron and the network as a
whole.
• Let’s observe the output layer first. Consider w5: we want to know how much a
change in w5 affects the total error, i.e. ∂Etotal/∂w5.
• By applying the chain rule:
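• ∂Etotal/∂w5 = ∂Etotal/∂outo1 × ∂outo1/∂neto1 × ∂neto1/∂w5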
Derivative of sigmoid function
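• For the sigmoid, out = 1 / (1 + e^(−net)) and d(out)/d(net) = out × (1 − out), so
∂outo1/∂neto1 = outo1 × (1 − outo1).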
Also called the Delta Rule
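Collecting the three chain-rule terms (with ∂Etotal/∂outo1 = −(target_o1 − outo1) from the squared-error function above):
• ∂Etotal/∂w5 = −(target_o1 − outo1) × outo1 × (1 − outo1) × outh1 = δo1 × outh1,
where δo1 = ∂Etotal/∂neto1 is the output neuron’s delta.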
Learning and Learning Rate
To decrease the error, we then subtract this value from the current weight (optionally multiplied by
some learning rate, eta, which we’ll set to 0.5):
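In other words: w5(new) = w5 − η × ∂Etotal/∂w5, with η = 0.5.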
Hidden Layer
• We need to update w1, w2, w3, w4. Hence we need to figure out ∂Etotal/∂w1 (and
likewise for w2, w3 and w4).
• Visually, each hidden neuron feeds into both output neurons, so the error signal
reaching it is the sum of the contributions coming back from o1 and o2.
• Putting it all together and applying the chain rule once more gives the
hidden-layer update, sketched below.
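A sketch of that chain of derivatives for w1 (assuming w5 and w7 name the weights from h1 to o1 and o2, respectively):
• ∂Etotal/∂w1 = (∂E(o1)/∂outh1 + ∂E(o2)/∂outh1) × ∂outh1/∂neth1 × ∂neth1/∂w1
• = (δo1 × w5 + δo2 × w7) × outh1 × (1 − outh1) × i1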
We’ve updated all of our weights! When we fed forward the 0.05
and 0.1 inputs originally, the error on the network was
0.298371109.
After this first round of backpropagation, the total error is now
down to 0.291027924.
It might not seem like much, but after repeating this process
10,000 times, for example, the error plummets to 0.000035085.
At this point, when we feed forward 0.05 and 0.1, the two output
neurons generate 0.015912196 (vs 0.01 target) and 0.984065734
(vs 0.99 target).
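A compact end-to-end sketch of this training loop in Python. The inputs, targets, learning rate, w1, w2 and b1 come from the slides; the remaining initial weights (w3, w4, w5–w8) and b2 appear only in the slide figures, so the values below are placeholders chosen just to make the sketch run:

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Values stated in the slides
i1, i2 = 0.05, 0.10
t1, t2 = 0.01, 0.99
eta = 0.5
w1, w2, b1 = 0.15, 0.20, 0.35
# Placeholder initial values (NOT stated in the slide text)
w3, w4 = 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b2 = 0.60

for step in range(10000):
    # Forward pass
    out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)
    out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)
    out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
    out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)

    # Output-layer deltas (delta rule): dEtotal/dnet_o
    d_o1 = -(t1 - out_o1) * out_o1 * (1 - out_o1)
    d_o2 = -(t2 - out_o2) * out_o2 * (1 - out_o2)

    # Hidden-layer deltas: sum of error signals from both output neurons
    d_h1 = (d_o1 * w5 + d_o2 * w7) * out_h1 * (1 - out_h1)
    d_h2 = (d_o1 * w6 + d_o2 * w8) * out_h2 * (1 - out_h2)

    # Gradient-descent updates, w_new = w - eta * dEtotal/dw (biases left fixed)
    w5 -= eta * d_o1 * out_h1
    w6 -= eta * d_o1 * out_h2
    w7 -= eta * d_o2 * out_h1
    w8 -= eta * d_o2 * out_h2
    w1 -= eta * d_h1 * i1
    w2 -= eta * d_h1 * i2
    w3 -= eta * d_h2 * i1
    w4 -= eta * d_h2 * i2

# Final forward pass after training
out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)
out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)
E_total = 0.5 * (t1 - out_o1) ** 2 + 0.5 * (t2 - out_o2) ** 2
print(out_o1, out_o2, E_total)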
Neurons in hidden layer
• There are many rule-of-thumb methods for determining the
correct number of neurons to use in the hidden layers, such as
the following:
• The number of hidden neurons should be between the size of
the input layer and the size of the output layer.
• The number of hidden neurons should be 2/3 the size of the
input layer, plus the size of the output layer.
• The number of hidden neurons should be less than twice the size
of the input layer.
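For example, applied to the 2-input, 2-output network above, these rules would suggest roughly 2 hidden neurons, 2/3 × 2 + 2 ≈ 3 hidden neurons, or fewer than 2 × 2 = 4 hidden neurons, respectively.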