How Does Backpropagation Work in A CNN - Medium

Uploaded by dsp.testbed
Copyright © All Rights Reserved

05/11/2024, 16:07 How does Backpropagation work in a CNN? | Medium

Convolutions and Backpropagations


Pavithra Solai
8 min read · Mar 19, 2018


Ever since AlexNet won the ImageNet competition in 2012, Convolutional Neural Networks (CNNs) have become ubiquitous. From the humble LeNet to ResNets and DenseNets, CNNs are everywhere.

But have you ever wondered what happens in the Backward pass of a CNN, and in particular how Backpropagation works in a CNN? If you have read about Backpropagation, you have probably seen how it is implemented in a simple Neural Network with Fully Connected layers. (Andrew Ng’s course on Coursera does a great job of explaining it.) But, for the life of me, I couldn’t wrap my head around how Backpropagation works with Convolutional layers.

https://pavisj.medium.com/convolutions-and-backpropagations-46026a8f5d2c 1/29

The more I dug through articles on CNNs and Backpropagation, the more confused I got. Explanations were mired in complex derivations and notation, and following them took extra mathematical muscle. I was getting nowhere.

I know, you don’t need to know the mathematical intricacies of Backpropagation to implement CNNs, because you don’t have to implement it by hand. Hence, most Deep Learning books don’t cover it either.

So when I finally figured it out, I decided to write this article to simplify and demystify it. Of course, it will help if you understand the basics of Backpropagation before following along.

The most important goal of this article is to show you this:

We all know the forward pass of a Convolutional layer uses Convolutions. But the backward pass during Backpropagation also uses Convolutions!

So, let us dig in, starting with the intuition behind Backpropagation. (For this, we are going to rely on Andrej Karpathy’s amazing CS231n lecture, https://www.youtube.com/watch?v=i94OvYb6noo)

But if you are already aware of the chain rule in Backpropagation, then you can skip
to the next section.

Understanding the Chain Rule in Backpropagation:


Consider this equation:

f(x,y,z) = (x + y)z

To make it simpler, let us split it into two equations:

q = x + y
f = q * z


Now, let us draw a computational graph for it, with x = -2, y = 5 and z = -4.

Computational Graph of f = q*z where q = x + y

Solving the equations as we move from left to right (‘the forward pass’), we get q = 3 and an output of f = -12.
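A few lines of Python reproduce this forward pass. It is only a sketch with our own variable names; note that z = -4 is the value consistent with f = -12.

```python
# Forward pass of f(x, y, z) = (x + y) * z, split into two gates.
x, y, z = -2, 5, -4

q = x + y   # add gate
f = q * z   # multiply gate

print(q, f)  # 3 -12
```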

Now let us do the backward pass. Just like in Backpropagation, we derive the gradients moving from right to left, one stage at a time. At the end, we need the gradients of f with respect to our inputs x, y and z: ∂f/∂x, ∂f/∂y and ∂f/∂z.

Working from right to left, at the multiply gate we differentiate f to get the gradients at q and z: ∂f/∂q = z and ∂f/∂z = q. At the add gate, we differentiate q to get the gradients at x and y: ∂q/∂x = 1 and ∂q/∂y = 1.
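The local gradients at each gate can be written out directly. In this sketch (variable names are our own), the multiply gate hands each input the value of the other input, and the add gate hands both inputs a gradient of 1.

```python
x, y, z = -2, 5, -4
q = x + y  # 3

# Multiply gate f = q * z: the local gradient for each input is the other input.
df_dq = z  # -4
df_dz = q  # 3

# Add gate q = x + y: the local gradient for each input is 1.
dq_dx = 1
dq_dy = 1

print(df_dq, df_dz, dq_dx, dq_dy)  # -4 3 1 1
```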


Calculating gradients and their values in the computational graph

We have to find ∂f/∂x and ∂f/∂y, but so far we only have ∂q/∂x and ∂q/∂y. So, how do we go about it?

How do we find ∂f/∂x and ∂f/∂y

This can be done using the chain rule of differentiation. By the chain rule, we can find ∂f/∂x as:

∂f/∂x = ∂f/∂q · ∂q/∂x

Chain rule of Differentiation

And we can calculate ∂f/∂x and ∂f/∂y as:

∂f/∂x = ∂f/∂q · ∂q/∂x = z · 1 = z
∂f/∂y = ∂f/∂q · ∂q/∂y = z · 1 = z


Backward pass of the Computational graph with all the gradients
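Putting both stages together, here is a sketch of the complete backward pass, plus a finite-difference check that the chain-rule result matches a numerical estimate of ∂f/∂x. The helper `f_of` and the step size `h` are illustrative choices of ours, not part of the original article.

```python
def f_of(x, y, z):
    return (x + y) * z

x, y, z = -2.0, 5.0, -4.0

# Forward pass
q = x + y          # 3.0
f = q * z          # -12.0

# Backward pass, right to left
df_dq = z          # multiply gate: -4.0
df_dz = q          # multiply gate: 3.0
df_dx = df_dq * 1  # chain rule through the add gate (dq/dx = 1)
df_dy = df_dq * 1  # chain rule through the add gate (dq/dy = 1)

# Numerical check with a central difference
h = 1e-6
num_dx = (f_of(x + h, y, z) - f_of(x - h, y, z)) / (2 * h)

print(df_dx, df_dy, df_dz)         # -4.0 -4.0 3.0
print(abs(num_dx - df_dx) < 1e-6)  # True
```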

Chain Rule in a Convolutional Layer


Now that we have worked through a simple computational graph, we can imagine a
CNN as a massive computational graph. Let us say we have a gate f in that
computational graph with inputs x and y which outputs z.


A simple function f which takes x and y as inputs and outputs z

We can easily compute the local gradients by differentiating z with respect to x and y: ∂z/∂x and ∂z/∂y.

For the forward pass, we move through the layers of the CNN and, at the end, obtain the loss using the loss function. When we start to work the loss backwards, layer by layer, we receive the gradient of the loss from the next layer (the one handled just before this gate in the backward pass) as ∂L/∂z. For the loss to be propagated to the other gates, we need to find ∂L/∂x and ∂L/∂y.


Local gradients can be computed using the function f. Now, we need to find ∂L/∂x and ∂L/∂y, as they need to be propagated to other layers.

The chain rule comes to our rescue. Using it, we can calculate ∂L/∂x = ∂L/∂z · ∂z/∂x and ∂L/∂y = ∂L/∂z · ∂z/∂y, which then feed the other gates in the extended computational graph.


Finding the loss gradients for x and y
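To make the gate picture concrete, here is a hypothetical multiply gate z = x · y written as a tiny Python class (the class and method names are our own, not from any library). `backward` receives the upstream gradient ∂L/∂z and applies the chain rule to return ∂L/∂x and ∂L/∂y.

```python
class MultiplyGate:
    """Hypothetical gate z = x * y that caches its inputs for the backward pass."""

    def forward(self, x, y):
        self.x, self.y = x, y
        return x * y

    def backward(self, dL_dz):
        # Local gradients: dz/dx = y and dz/dy = x.
        # Chain rule: scale each local gradient by the upstream gradient dL/dz.
        dL_dx = dL_dz * self.y
        dL_dy = dL_dz * self.x
        return dL_dx, dL_dy


gate = MultiplyGate()
z = gate.forward(3.0, -4.0)
dL_dx, dL_dy = gate.backward(2.0)  # assume dL/dz = 2.0 arrives from the next layer
print(z, dL_dx, dL_dy)  # -12.0 -8.0 6.0
```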

So, what has this got to do with Backpropagation in the Convolutional layer of a
CNN?

Now, let’s assume the function f is a convolution between Input X and a Filter F. Input X is a 3x3 matrix and Filter F is a 2x2 matrix, as shown below:


A simple Convolutional Layer example with Input X and Filter F

Convolution between Input X and Filter F gives us an output O. This can be represented as:

Convolution Function between X and F, gives Output O


Convolution operation giving us values of the Output O
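A minimal plain-Python sketch of this forward pass follows. It assumes, as is standard for CNN layers, that ‘convolution’ means sliding F over X without flipping it (what mathematicians would call cross-correlation), so that, for example, O₁₁ = X₁₁F₁₁ + X₁₂F₁₂ + X₂₁F₂₁ + X₂₂F₂₂. The function name `conv2d_valid` and the sample values are our own.

```python
def conv2d_valid(X, F):
    """'Valid' 2-D convolution as used in CNN layers: slide F over X
    (no flipping) and sum the element-wise products at each position."""
    fr, fc = len(F), len(F[0])
    out_rows = len(X) - fr + 1      # 3 - 2 + 1 = 2
    out_cols = len(X[0]) - fc + 1   # 3 - 2 + 1 = 2
    return [[sum(X[i + a][j + b] * F[a][b] for a in range(fr) for b in range(fc))
             for j in range(out_cols)]
            for i in range(out_rows)]


X = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
F = [[1, 2],
     [3, 4]]

O = conv2d_valid(X, F)
print(O)  # [[37, 47], [67, 77]]
# e.g. O11 = X11*F11 + X12*F12 + X21*F21 + X22*F22 = 1*1 + 2*2 + 4*3 + 5*4 = 37
```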

This gives us the forward pass! Now let’s get to the Backward pass. As mentioned earlier, during the Backward pass we receive the loss gradient with respect to the Output O from the next layer, ∂L/∂O. Combining this with what we know about the chain rule and Backpropagation, we get:


Function f during Backward pass

As seen above, we can find the local gradients of the Output O, ∂O/∂X and ∂O/∂F. Then, using the loss gradient ∂L/∂O from the next layer and the chain rule, we can calculate ∂L/∂X and ∂L/∂F.

Well, but why do we need to find ∂L/∂X and ∂L/∂F?

Why do we need to find ∂L/∂X and ∂L/∂F

∂L/∂F is what we use to update the filter weights F (for example, with gradient descent), and ∂L/∂X is the loss gradient we pass back to the previous layer, where it plays the role of ∂L/∂O, so that Backpropagation can continue.

So let’s find the gradients for X and F — ∂L/∂X and ∂L/∂F


Finding ∂L/∂F:

This has two steps, as we did earlier:

1. Find the local gradient ∂O/∂F
2. Find ∂L/∂F using the chain rule

Step 1: Finding the local gradient ∂O/∂F:

This means we have to differentiate the Output matrix O with respect to the Filter F. From our convolution operation, we know the values of O. So let us start by differentiating the first element of O, O₁₁, with respect to the elements of F: F₁₁, F₁₂, F₂₁ and F₂₂.
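For reference, here is one rendering of these local gradients (what the article calls Equation A), assuming the no-flip convolution used in CNN layers, so that O₁₁ = X₁₁F₁₁ + X₁₂F₁₂ + X₂₁F₂₁ + X₂₂F₂₂. The last line gives the general pattern for every output element, with 1-based indices:

```latex
\begin{aligned}
&\frac{\partial O_{11}}{\partial F_{11}} = X_{11}, \quad
 \frac{\partial O_{11}}{\partial F_{12}} = X_{12}, \quad
 \frac{\partial O_{11}}{\partial F_{21}} = X_{21}, \quad
 \frac{\partial O_{11}}{\partial F_{22}} = X_{22} \\
&\text{in general:}\quad
 \frac{\partial O_{ij}}{\partial F_{ab}} = X_{i+a-1,\; j+b-1}
\end{aligned}
```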


Step 2: Using the Chain rule:

As described in our previous examples, we need to find ∂L/∂F as:

∂L/∂F = ∂L/∂O · ∂O/∂F

O and F are matrices, so ∂O/∂F is the partial derivative of a matrix O with respect to a matrix F! On top of that, we have to use the chain rule. This does look complicated, but thankfully we can use the formula below to expand it.


Formula to derive a partial derivative of a matrix with respect to a matrix, using the chain rule
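In index form, for the 2x2 output O used here, that formula reads as follows (our own rendering): each filter element collects a contribution from every output element it influenced.

```latex
\frac{\partial L}{\partial F_{ab}}
  = \sum_{i=1}^{2}\sum_{j=1}^{2}
    \frac{\partial L}{\partial O_{ij}}\,
    \frac{\partial O_{ij}}{\partial F_{ab}}
```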

Expanding, we get:

Derivatives of ∂L/∂F

Substituting the values of the local gradient — ∂O/∂F from Equation A, we get


Using local gradient values from Equation A
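Written out in full for all four filter elements (our reconstruction, under the same no-flip assumption), the substitution gives:

```latex
\begin{aligned}
\frac{\partial L}{\partial F_{11}} &= \frac{\partial L}{\partial O_{11}} X_{11} + \frac{\partial L}{\partial O_{12}} X_{12} + \frac{\partial L}{\partial O_{21}} X_{21} + \frac{\partial L}{\partial O_{22}} X_{22} \\
\frac{\partial L}{\partial F_{12}} &= \frac{\partial L}{\partial O_{11}} X_{12} + \frac{\partial L}{\partial O_{12}} X_{13} + \frac{\partial L}{\partial O_{21}} X_{22} + \frac{\partial L}{\partial O_{22}} X_{23} \\
\frac{\partial L}{\partial F_{21}} &= \frac{\partial L}{\partial O_{11}} X_{21} + \frac{\partial L}{\partial O_{12}} X_{22} + \frac{\partial L}{\partial O_{21}} X_{31} + \frac{\partial L}{\partial O_{22}} X_{32} \\
\frac{\partial L}{\partial F_{22}} &= \frac{\partial L}{\partial O_{11}} X_{22} + \frac{\partial L}{\partial O_{12}} X_{23} + \frac{\partial L}{\partial O_{21}} X_{32} + \frac{\partial L}{\partial O_{22}} X_{33}
\end{aligned}
```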

If you look at it closely, this represents an operation we are quite familiar with. We can represent it as a convolution operation between the input X and the loss gradient ∂L/∂O, as shown below:

∂L/∂F = Convolution of input matrix X and loss gradient ∂L/∂O
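Here is a sketch that checks this claim numerically. It reuses a no-flip `conv2d_valid` helper (our own name) and, purely as a testing assumption, a hypothetical loss L = Σ G⊙O whose gradient with respect to O is exactly G, so G can stand in for ∂L/∂O. A central finite difference on each filter weight should then agree with the convolution of X and ∂L/∂O.

```python
def conv2d_valid(X, K):
    """'Valid' convolution as used in CNN layers (the kernel is not flipped)."""
    kr, kc = len(K), len(K[0])
    return [[sum(X[i + a][j + b] * K[a][b] for a in range(kr) for b in range(kc))
             for j in range(len(X[0]) - kc + 1)]
            for i in range(len(X) - kr + 1)]


def loss(X, F, G):
    """Hypothetical loss L = sum(G * O), so that dL/dO = G exactly."""
    O = conv2d_valid(X, F)
    return sum(G[i][j] * O[i][j] for i in range(len(O)) for j in range(len(O[0])))


X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
F = [[1.0, 2.0],
     [3.0, 4.0]]
dL_dO = [[1.0, -1.0],
         [2.0, 0.5]]  # stand-in for the gradient arriving from the next layer

# dL/dF = convolution of the input X with the loss gradient dL/dO
dL_dF = conv2d_valid(X, dL_dO)
print(dL_dF)  # [[9.5, 12.0], [17.0, 19.5]]

# Finite-difference check on every filter weight
h = 1e-5
for a in range(2):
    for b in range(2):
        w = F[a][b]
        F[a][b] = w + h
        up = loss(X, F, dL_dO)
        F[a][b] = w - h
        down = loss(X, F, dL_dO)
        F[a][b] = w
        assert abs((up - down) / (2 * h) - dL_dF[a][b]) < 1e-6
```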


∂L/∂F is nothing but the convolution between Input X and Loss Gradient from the next layer ∂L/∂O

Finding ∂L/∂X:

Step 1: Finding the local gradient ∂O/∂X:

Similar to how we found the local gradients earlier, we can find ∂O/∂X as:

Local gradients ∂O/∂X
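One rendering of these local gradients (the article's Equation B), again assuming the no-flip convolution and 1-based indices. Unlike ∂O/∂F, many entries are zero, because each output element only touches a 2x2 window of X:

```latex
\begin{aligned}
&\frac{\partial O_{11}}{\partial X_{11}} = F_{11}, \quad
 \frac{\partial O_{11}}{\partial X_{12}} = F_{12}, \quad
 \frac{\partial O_{11}}{\partial X_{21}} = F_{21}, \quad
 \frac{\partial O_{11}}{\partial X_{22}} = F_{22} \\
&\text{in general:}\quad
 \frac{\partial O_{ij}}{\partial X_{mn}} =
 \begin{cases}
   F_{m-i+1,\; n-j+1} & \text{if } 1 \le m-i+1 \le 2 \text{ and } 1 \le n-j+1 \le 2 \\
   0 & \text{otherwise}
 \end{cases}
\end{aligned}
```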

Step 2: Using the Chain rule:

As before, ∂L/∂X = ∂L/∂O · ∂O/∂X. Expanding this and substituting the local gradients from Equation B, we get:


Derivatives of ∂L/∂X using local gradients from Equation B
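For reference, the nine resulting expressions (our reconstruction under the same assumptions) are below. Notice that the center element X₂₂ receives contributions from all four outputs, while each corner element receives only one:

```latex
\begin{aligned}
\frac{\partial L}{\partial X_{11}} &= \frac{\partial L}{\partial O_{11}} F_{11} \\
\frac{\partial L}{\partial X_{12}} &= \frac{\partial L}{\partial O_{11}} F_{12} + \frac{\partial L}{\partial O_{12}} F_{11} \\
\frac{\partial L}{\partial X_{13}} &= \frac{\partial L}{\partial O_{12}} F_{12} \\
\frac{\partial L}{\partial X_{21}} &= \frac{\partial L}{\partial O_{11}} F_{21} + \frac{\partial L}{\partial O_{21}} F_{11} \\
\frac{\partial L}{\partial X_{22}} &= \frac{\partial L}{\partial O_{11}} F_{22} + \frac{\partial L}{\partial O_{12}} F_{21} + \frac{\partial L}{\partial O_{21}} F_{12} + \frac{\partial L}{\partial O_{22}} F_{11} \\
\frac{\partial L}{\partial X_{23}} &= \frac{\partial L}{\partial O_{12}} F_{22} + \frac{\partial L}{\partial O_{22}} F_{12} \\
\frac{\partial L}{\partial X_{31}} &= \frac{\partial L}{\partial O_{21}} F_{21} \\
\frac{\partial L}{\partial X_{32}} &= \frac{\partial L}{\partial O_{21}} F_{22} + \frac{\partial L}{\partial O_{22}} F_{21} \\
\frac{\partial L}{\partial X_{33}} &= \frac{\partial L}{\partial O_{22}} F_{22}
\end{aligned}
```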

Ok. Now we have the values of ∂L/∂X.

Believe it or not, even this can be represented as a convolution operation.

∂L/∂X can be represented as ‘full’ convolution between a 180-degree rotated Filter F and loss gradient ∂L/∂O

First, let us rotate the Filter F by 180 degrees. This is done by flipping it first
vertically and then horizontally.

Flipping Filter F by 180 degrees — flipping it vertically and horizontally
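In code, the 180-degree rotation is just two reversals. A minimal Python sketch (the function name `rot180` is our own):

```python
def rot180(F):
    """Rotate a 2-D list by 180 degrees: flip vertically, then horizontally."""
    flipped_vertically = F[::-1]                      # reverse the order of the rows
    return [row[::-1] for row in flipped_vertically]  # reverse each row


F = [[1, 2],
     [3, 4]]
print(rot180(F))  # [[4, 3], [2, 1]]
```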

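Now, let us do a ‘full’ convolution between this flipped Filter F and ∂L/∂O. A ‘full’ convolution covers every position where the two matrices overlap by at least one element, which is the same as zero-padding ∂L/∂O by (filter size - 1) on every side and then doing an ordinary convolution. It can be visualized as below (it is like sliding one matrix over the other from right to left, bottom to top):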

Full Convolution operation visualized between 180-degree flipped Filter F and loss gradient ∂L/∂O


The full convolution above generates the values of ∂L/∂X, and hence we can represent ∂L/∂X as:

∂L/∂X can be represented as ‘full’ convolution between a 180-degree rotated Filter F and loss gradient ∂L/∂O
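The sketch below checks this numerically. It assumes a square filter and uses illustrative helpers of our own (`conv2d_valid`, `rot180`, `zero_pad`, `full_conv`) plus, as a testing assumption, a hypothetical loss L = Σ G⊙O so that ∂L/∂O = G; none of these names come from the article or a library. `full_conv` zero-pads ∂L/∂O and slides the rotated filter over it, and a finite-difference estimate of each element of ∂L/∂X should agree with the result.

```python
def conv2d_valid(X, K):
    """'Valid' convolution as used in CNN layers (the kernel is not flipped)."""
    kr, kc = len(K), len(K[0])
    return [[sum(X[i + a][j + b] * K[a][b] for a in range(kr) for b in range(kc))
             for j in range(len(X[0]) - kc + 1)]
            for i in range(len(X) - kr + 1)]


def rot180(K):
    """Rotate by 180 degrees: reverse the rows, then reverse each row."""
    return [row[::-1] for row in K[::-1]]


def zero_pad(M, p):
    """Surround M with p rows/columns of zeros on every side."""
    width = len(M[0]) + 2 * p
    return ([[0.0] * width for _ in range(p)]
            + [[0.0] * p + list(row) + [0.0] * p for row in M]
            + [[0.0] * width for _ in range(p)])


def full_conv(G, K):
    """'Full' convolution: zero-pad G by (kernel size - 1), then slide rot180(K) over it.
    Assumes a square kernel K."""
    return conv2d_valid(zero_pad(G, len(K) - 1), rot180(K))


def loss(X, F, G):
    """Hypothetical loss L = sum(G * O), so that dL/dO = G exactly."""
    O = conv2d_valid(X, F)
    return sum(G[i][j] * O[i][j] for i in range(len(O)) for j in range(len(O[0])))


X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
F = [[1.0, 2.0],
     [3.0, 4.0]]
dL_dO = [[1.0, -1.0],
         [2.0, 0.5]]  # stand-in for the gradient from the next layer

dL_dX = full_conv(dL_dO, F)
print(dL_dX)  # [[1.0, 1.0, -2.0], [5.0, 5.5, -3.0], [6.0, 9.5, 2.0]]

# Finite-difference check on every input element
h = 1e-5
for m in range(3):
    for n in range(3):
        v = X[m][n]
        X[m][n] = v + h
        up = loss(X, F, dL_dO)
        X[m][n] = v - h
        down = loss(X, F, dL_dO)
        X[m][n] = v
        assert abs((up - down) / (2 * h) - dL_dX[m][n]) < 1e-6
```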

Now that we have found ∂L/∂X and ∂L/∂F, we can come to this conclusion:

Both the Forward pass and the Backpropagation of a Convolutional layer are Convolutions

Summing it up:


How to calculate ∂L/∂X and ∂L/∂F
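To tie the summary together, here is a minimal sketch of a backward function for a single-channel, stride-1, unpadded convolutional layer, the setting of this article. `conv_backward` and the other names are hypothetical, and the check deliberately uses a 4x4 input with a 3x3 filter to suggest the two rules are not specific to the 3x3/2x2 example. Channels, batches, strides and padding are left out of this sketch.

```python
def conv2d_valid(X, K):
    """'Valid' convolution as used in CNN layers (the kernel is not flipped)."""
    kr, kc = len(K), len(K[0])
    return [[sum(X[i + a][j + b] * K[a][b] for a in range(kr) for b in range(kc))
             for j in range(len(X[0]) - kc + 1)]
            for i in range(len(X) - kr + 1)]


def rot180(K):
    """Rotate by 180 degrees: reverse the rows, then reverse each row."""
    return [row[::-1] for row in K[::-1]]


def zero_pad(M, p):
    """Surround M with p rows/columns of zeros on every side."""
    width = len(M[0]) + 2 * p
    return ([[0.0] * width for _ in range(p)]
            + [[0.0] * p + list(row) + [0.0] * p for row in M]
            + [[0.0] * width for _ in range(p)])


def conv_backward(X, F, dL_dO):
    """Hypothetical helper returning (dL/dX, dL/dF) for O = conv2d_valid(X, F).
    Assumes a single channel, stride 1 and a square filter."""
    dL_dF = conv2d_valid(X, dL_dO)                                 # convolution of X with dL/dO
    dL_dX = conv2d_valid(zero_pad(dL_dO, len(F) - 1), rot180(F))  # 'full' convolution with rot180(F)
    return dL_dX, dL_dF


# Check on a 4x4 input and a 3x3 filter (so O and dL/dO are 2x2).
X = [[float((3 * i + 5 * j) % 7 - 3) for j in range(4)] for i in range(4)]
F = [[float((2 * i + j) % 5 - 2) for j in range(3)] for i in range(3)]
G = [[1.0, -2.0],
     [0.5, 3.0]]  # stand-in for dL/dO from the next layer

dL_dX, dL_dF = conv_backward(X, F, G)


def loss():
    """Hypothetical loss L = sum(G * O), so that dL/dO = G exactly."""
    O = conv2d_valid(X, F)
    return sum(G[i][j] * O[i][j] for i in range(2) for j in range(2))


def numeric_grad(M, h=1e-5):
    """Central-difference estimate of dL/dM, nudging M's entries in place."""
    grad = [[0.0] * len(M[0]) for _ in M]
    for i in range(len(M)):
        for j in range(len(M[0])):
            v = M[i][j]
            M[i][j] = v + h
            up = loss()
            M[i][j] = v - h
            down = loss()
            M[i][j] = v
            grad[i][j] = (up - down) / (2 * h)
    return grad


pairs = [(numeric_grad(X), dL_dX), (numeric_grad(F), dL_dF)]
max_err = max(abs(n - a)
              for num, ana in pairs
              for num_row, ana_row in zip(num, ana)
              for n, a in zip(num_row, ana_row))
print(len(dL_dX), len(dL_dX[0]), len(dL_dF), len(dL_dF[0]))  # 4 4 3 3
print(max_err < 1e-6)  # True
```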

I hope this helped explain how Backpropagation works in a Convolutional layer of a CNN.

If you want to read more about it, do look at these links below. And do show some
love by clapping for this article. Adios! :)

Backpropagation In Convolutional Neural Networks (www.jefkine.com)

Back Propagation in Convolutional Neural Networks — Intuition and Code (becominghuman.ai)


Written by Pavithra Solai

Head of Data Science, Amphora | Entrepreneur | AI and Product Development | Co-founder of Kint.io, acqui-hired by Swiggy
