Neural Network from scratch in Python
by Omar Aflak, Towards Data Science
In this post we will go through the mathematics of machine learning and code from
scratch, in Python, a small library to build neural networks with a variety of layers
(Fully Connected, Convolutional, etc.). Eventually, we will be able to create networks
in a modular fashion:
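For instance, something along these lines, using the Network, FCLayer and ActivationLayer classes that we will write below (the fully runnable version of this snippet is the XOR example near the end of the post):

net = Network()
net.add(FCLayer(2, 3))
net.add(ActivationLayer(tanh, tanh_prime))
net.add(FCLayer(3, 1))
net.add(ActivationLayer(tanh, tanh_prime))
net.use(mse, mse_prime)
net.fit(x_train, y_train, epochs=1000, learning_rate=0.1)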
I’m assuming you already have some knowledge about neural networks. The purpose
here is not to explain why we make these models, but to show how to make a proper
implementation.
Layer by Layer
We need to keep in mind the big picture here:
1. We feed input data into the network.
2. The data flows from layer to layer until we have the output.
3. Once we have the output, we can calculate the error, which is a scalar.
4. Finally, we adjust a given parameter (weight or bias) by subtracting the derivative of the error with respect to that parameter.
The most important step is the 4th. We want to be able to have as many layers as we
want, and of any type. But if we modify/add/remove one layer from the network, the
output of the network is going to change, which is going to change the error, which
is going to change the derivative of the error with respect to the parameters. We
need to be able to compute the derivatives regardless of the network architecture,
regardless of the activation functions, regardless of the loss we use.
Forward propagation
We can already emphasize one important point which is: the output of one layer is
the input of the next one.
This is called forward propagation. Essentially, we give the input data to the first
layer, then the output of every layer becomes the input of the next layer until we
reach the end of the network. By comparing the result of the network (Y) with the
desired output (let’s say Y*), we can calculate an error E. The goal is to minimize that
error by changing the parameters in the network. That is backward propagation
(backpropagation).
Gradient Descent
This is a quick reminder, if you need to learn more about gradient descent there are
tons of resources on the internet.
Basically, we want to change some parameter in the network (call it w) so that the
total error E decreases. There is a clever way to do it (not randomly) which is the
following:
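Concretely, every trainable parameter is nudged in the direction opposite to the gradient:

w ← w − α · ∂E/∂w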
Where α is a parameter in the range [0,1] that we set and that is called the learning
rate. Anyway, the important thing here is ∂E/∂w (the derivative of E with respect to
w). We need to be able to find the value of that expression for any parameter of the
network regardless of its architecture.
Backward propagation
Suppose that we give a layer the derivative of the error with respect to its output
(∂E/∂Y), then it must be able to provide the derivative of the error with respect to its
input (∂E/∂X).
Let’s forget about ∂E/∂X for now. The trick here is that if we have access to ∂E/∂Y, we can very easily calculate ∂E/∂W (if the layer has any trainable parameters) without knowing anything about the network architecture! We simply use the chain rule:
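For a given parameter w of the layer, summing over all of the layer's output values y_j:

∂E/∂w = Σ_j ( ∂E/∂y_j · ∂y_j/∂w )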
The unknown is ∂y_j/∂w, which depends entirely on how the layer computes its output. So if every layer has access to ∂E/∂Y, where Y is its own output, then we can update our parameters! And since the output of one layer is the input of the next, the ∂E/∂X a layer computes is exactly the ∂E/∂Y needed by the layer before it; this is how the error derivative travels backwards through the whole network.
This is very important, it’s the key to understanding backpropagation! After that, we’ll be able to code a Deep Convolutional Neural Network from scratch in no time!
This may seem abstract here, but it will become very clear when we apply it to a specific type of layer. Speaking of abstract, now is a good time to write our first Python class.
# Base class
class Layer:
    def __init__(self):
        self.input = None
        self.output = None

    # computes the output Y of a layer for a given input X
    def forward_propagation(self, input):
        raise NotImplementedError

    # computes dE/dX for a given dE/dY (and updates parameters if any)
    def backward_propagation(self, output_error, learning_rate):
        raise NotImplementedError
Fully Connected Layer
We now implement the first concrete type of layer: the fully connected (FC) layer, which connects every input neuron to every output neuron.
Forward Propagation
The value of each output neuron can be calculated as follows:
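Writing x_1, …, x_i for the input values, w for the weights and b for the biases (with i input neurons and j output neurons):

y_j = b_j + x_1·w_1j + x_2·w_2j + … + x_i·w_ij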
With matrices, we can compute this formula for every output neuron in one shot using a dot product:
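Y = X·W + B

where X is the 1×i input row vector, W the i×j weight matrix, B the 1×j bias row vector and Y the 1×j output.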
We’re done with the forward pass. Now let’s do the backward pass of the FC layer.
Note that I’m not using any activation function yet, that’s because we will implement it in
a separate layer!
Backward Propagation
As we said, suppose we have a matrix containing the derivative of the error with respect to that layer’s output (∂E/∂Y). We need:
1. The derivative of the error with respect to the parameters (∂E/∂W, ∂E/∂B)
2. The derivative of the error with respect to the input (∂E/∂X)
Let’s calculate ∂E/∂W. This matrix should be the same size as W itself: i×j, where i is the number of input neurons and j the number of output neurons. We need one gradient for every weight:
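Since w_ij only appears in y_j, and ∂y_j/∂w_ij = x_i, the chain rule gives:

∂E/∂w_ij = ∂E/∂y_j · ∂y_j/∂w_ij = x_i · ∂E/∂y_j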
Therefore,
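in matrix form, with X^t the transpose of the input row vector:

∂E/∂W = X^t · ∂E/∂Y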
That’s it, we have the first formula to update the weights! Now let’s calculate ∂E/∂B. Again, ∂E/∂B needs to be of the same size as B itself, one gradient per bias. We can use the chain rule again:
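Since b_j appears only in y_j and ∂y_j/∂b_j = 1:

∂E/∂b_j = ∂E/∂y_j · ∂y_j/∂b_j = ∂E/∂y_j

so ∂E/∂B = ∂E/∂Y.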
Now that we have ∂E/∂W and ∂E/∂B, we are left with ∂E/∂X which is very important
as it will “act” as ∂E/∂Y for the layer before that one.
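Every input x_i contributes to every output y_j through the weight w_ij, so:

∂E/∂x_i = Σ_j ∂E/∂y_j · w_ij

or, in matrix form:

∂E/∂X = ∂E/∂Y · W^t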
That’s it! We have the three formulas we needed for the FC layer!
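Putting these three formulas into code, a fully connected layer might look like the following. This is a minimal sketch built on the Layer base class above; it assumes that class lives in a file called layer.py, and the exact implementation in the accompanying repository may differ slightly (e.g. in how the weights are initialized).

import numpy as np
from layer import Layer

# inherit from base class Layer
class FCLayer(Layer):
    # input_size = number of input neurons, output_size = number of output neurons
    def __init__(self, input_size, output_size):
        # small random initialization of weights and biases
        self.weights = np.random.rand(input_size, output_size) - 0.5
        self.bias = np.random.rand(1, output_size) - 0.5

    # returns output for a given input: Y = XW + B
    def forward_propagation(self, input_data):
        self.input = input_data
        self.output = np.dot(self.input, self.weights) + self.bias
        return self.output

    # computes dE/dW and dE/dB for a given output_error=dE/dY,
    # updates the parameters and returns input_error=dE/dX
    def backward_propagation(self, output_error, learning_rate):
        input_error = np.dot(output_error, self.weights.T)   # dE/dX = dE/dY . W^t
        weights_error = np.dot(self.input.T, output_error)   # dE/dW = X^t . dE/dY
        # dE/dB = dE/dY

        # gradient descent update
        self.weights -= learning_rate * weights_error
        self.bias -= learning_rate * output_error
        return input_error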
Activation Layer
All the calculations we did until now were completely linear, and it’s hopeless to learn anything with that kind of model. We need to add non-linearity to the model by applying non-linear functions to the output of some layers.
Now we need to redo the whole process for this new type of layer!
No worries, it’s going to be way faster as there are no learnable parameters. We just
need to calculate ∂E/∂X.
We will call f and f' the activation function and its derivative respectively.
Forward Propagation
As you will see, it is quite straightforward. For a given input X, the output is simply the activation function applied to every element of X, which means input and output have the same dimensions.
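In other words, for every element of X:

y_i = f(x_i), i.e. Y = f(X) applied elementwise.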
Backward Propagation
Given ∂E/∂Y, we want to calculate ∂E/∂X.
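Since y_i = f(x_i), each output element depends only on the matching input element:

∂E/∂x_i = ∂E/∂y_i · f'(x_i)

which is simply an elementwise product: ∂E/∂X = ∂E/∂Y ⊙ f'(X).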
You can also write some activation functions and their derivatives in a separate file.
These will be used later to create an ActivationLayer.
import numpy as np

# activation function and its derivative
def tanh(x):
    return np.tanh(x)

def tanh_prime(x):
    return 1 - np.tanh(x)**2
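The activation layer itself is then tiny. Here is a minimal sketch, again assuming the base class sits in layer.py; the version in the accompanying repository may differ slightly:

from layer import Layer

# inherit from base class Layer
class ActivationLayer(Layer):
    def __init__(self, activation, activation_prime):
        self.activation = activation
        self.activation_prime = activation_prime

    # returns the activated input: Y = f(X)
    def forward_propagation(self, input_data):
        self.input = input_data
        self.output = self.activation(self.input)
        return self.output

    # returns input_error=dE/dX for a given output_error=dE/dY
    # learning_rate is not used because there are no "learnable" parameters
    def backward_propagation(self, output_error, learning_rate):
        return self.activation_prime(self.input) * output_error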
Loss Function
Until now, for a given layer, we supposed that ∂E/∂Y was given (by the next layer).
But what happens to the last layer? How does it get ∂E/∂Y? We simply give it
manually, and it depends on how we define the error.
The error of the network, which measures how well or how badly the network did for a given input, is defined by you. There are many ways to define the error, and one of the best known is MSE, the Mean Squared Error.
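For an output made of n values:

E = (1/n) · Σ_i (y*_i − y_i)²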
where y* and y denote the desired output and the actual output respectively. You can think of the loss as a last layer which takes all the output neurons and squashes them into one single neuron. What we need now, as for every other layer, is to define ∂E/∂Y. Except now, we have finally reached E!
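Differentiating E with respect to each output value gives:

∂E/∂y_i = (2/n) · (y_i − y*_i)

which is exactly what mse_prime below computes.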
These are simply two python functions that you can put in a separate file. They will
be used when creating the network.
import numpy as np

# loss function and its derivative
def mse(y_true, y_pred):
    return np.mean(np.power(y_true - y_pred, 2))

def mse_prime(y_true, y_pred):
    return 2 * (y_pred - y_true) / y_true.size
Network Class
Almost done! We are going to make a Network class to create neural networks very easily, just like in the first picture!
class Network:
    def __init__(self):
        self.layers = []
        self.loss = None
        self.loss_prime = None

    # add layer to network
    def add(self, layer):
        self.layers.append(layer)

    # set loss to use
    def use(self, loss, loss_prime):
        self.loss = loss
        self.loss_prime = loss_prime

    # predict output for given input
    def predict(self, input_data):
        # sample dimension first
        samples = len(input_data)
        result = []

        # run network over all samples
        for i in range(samples):
            # forward propagation
            output = input_data[i]
            for layer in self.layers:
                output = layer.forward_propagation(output)
            result.append(output)

        return result

    # train the network
    def fit(self, x_train, y_train, epochs, learning_rate):
        # sample dimension first
        samples = len(x_train)

        # training loop
        for i in range(epochs):
            err = 0
            for j in range(samples):
                # forward propagation
                output = x_train[j]
                for layer in self.layers:
                    output = layer.forward_propagation(output)

                # compute loss (for display purposes only)
                err += self.loss(y_train[j], output)

                # backward propagation
                error = self.loss_prime(y_train[j], output)
                for layer in reversed(self.layers):
                    error = layer.backward_propagation(error, learning_rate)

            # calculate average error on all samples
            err /= samples
            print('epoch %d/%d error=%f' % (i+1, epochs, err))
Solve XOR
Starting with XOR is always important as it’s a simple way to tell if the network is
learning anything at all.
import numpy as np

from network import Network
from fc_layer import FCLayer
from activation_layer import ActivationLayer
from activations import tanh, tanh_prime
from losses import mse, mse_prime

# training data
x_train = np.array([[[0,0]], [[0,1]], [[1,0]], [[1,1]]])
y_train = np.array([[[0]], [[1]], [[1]], [[0]]])

# network
net = Network()
net.add(FCLayer(2, 3))
net.add(ActivationLayer(tanh, tanh_prime))
net.add(FCLayer(3, 1))
net.add(ActivationLayer(tanh, tanh_prime))

# train
net.use(mse, mse_prime)
net.fit(x_train, y_train, epochs=1000, learning_rate=0.1)

# test
out = net.predict(x_train)
print(out)
I don’t think I need to emphasize many things. Just be careful with the training data,
you should always have the sample dimension first. For example here, the input
shape is (4,1,2).
Result
$ python xor.py
epoch 1/1000 error=0.322980
epoch 2/1000 error=0.311174
epoch 3/1000 error=0.307195
...
epoch 998/1000 error=0.000243
epoch 999/1000 error=0.000242
epoch 1000/1000 error=0.000242
[
array([[ 0.00077435]]),
array([[ 0.97760742]]),
array([[ 0.97847793]]),
array([[-0.00131305]])
]
Clearly this is working, great! We can now solve something more interesting, let’s solve MNIST!
Solve MNIST
We didn’t implement the Convolutional Layer, but this is not a problem. All we need to do is reshape our data so that it fits into a Fully Connected Layer.
The MNIST dataset consists of images of digits from 0 to 9, each of shape 28x28x1. The goal is to predict which digit is drawn on each image.
import numpy as np

from network import Network
from fc_layer import FCLayer
from activation_layer import ActivationLayer
from activations import tanh, tanh_prime
from losses import mse, mse_prime

from keras.datasets import mnist
from keras.utils import np_utils

# load MNIST from server
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# training data : 60000 samples
# reshape and normalize input data
x_train = x_train.reshape(x_train.shape[0], 1, 28*28)
x_train = x_train.astype('float32')
x_train /= 255
# encode output which is a number in range [0,9] into a vector of size 10
# e.g. number 3 will become [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
y_train = np_utils.to_categorical(y_train)

# same for test data : 10000 samples
x_test = x_test.reshape(x_test.shape[0], 1, 28*28)
x_test = x_test.astype('float32')
x_test /= 255
y_test = np_utils.to_categorical(y_test)

# Network
net = Network()
net.add(FCLayer(28*28, 100))                # input_shape=(1, 28*28) ; output_shape=(1, 100)
net.add(ActivationLayer(tanh, tanh_prime))
net.add(FCLayer(100, 50))                   # input_shape=(1, 100)   ; output_shape=(1, 50)
net.add(ActivationLayer(tanh, tanh_prime))
net.add(FCLayer(50, 10))                    # input_shape=(1, 50)    ; output_shape=(1, 10)
net.add(ActivationLayer(tanh, tanh_prime))

# train on 1000 samples
# as we didn't implement mini-batch GD, training will be pretty slow if we update at each iteration
net.use(mse, mse_prime)
net.fit(x_train[0:1000], y_train[0:1000], epochs=35, learning_rate=0.1)

# test on 3 samples
out = net.predict(x_test[0:3])
print("\n")
print("predicted values : ")
print(out, end="\n")
print("true values : ")
print(y_test[0:3])
$ python example_mnist_fc.py
epoch 1/30 error=0.238658
epoch 2/30 error=0.093187
epoch 3/30 error=0.073039
...
epoch 28/30 error=0.011636
epoch 29/30 error=0.011306
epoch 30/30 error=0.010901
predicted values :
[
array([[ 0.119, 0.084 , -0.081, 0.084, -0.068, 0.011, 0.057,
0.976, -0.042, -0.0462]]),
array([[ 0.071, 0.211, 0.501 , 0.058, -0.020, 0.175, 0.057 ,
0.037, 0.020, 0.107]]),
array([[ 1.197e-01, 8.794e-01, -4.410e-04, 4.407e-02, -4.213e-
02, 5.300e-02, 5.581e-02, 8.255e-02, -1.182e-01, 9.888e-02]])
]
true values :
[[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
[0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]]
The full code for this article is available on GitHub:
https://github.com/OmarAflak/Medium-Python-Neural-Network
I’ve recently put the content of that article into a beautifully animated video. You
can check it out on YouTube.
Neural Network from Scratch | Mathematics & Python Code — The Independent Code
Convolutional Neural Network from Scratch | Mathematics & Python Code — The Independent Code
If you liked this post — I’d really appreciate if you hit the clap button 👏 it
would help me a lot. Peace! 😎