Uploaded by nttan23

Gradient Descent

Gradient descent is a widely used algorithm in machine learning, playing a key role in
tasks ranging from optimizing linear regression models to training complex neural
networks. Its primary objective is to minimize a given function, regardless of the number
of parameters involved.

Gradient Descent in 2D: https://upload.wikimedia.org/wikipedia/commons/transcoded/4/4c/Gradient_Descent_in_2D.webm/Gradient_Descent_in_2D.webm.720p.vp9.webm

Defining Model
Understanding Gradient Descent
Gradient descent is a systematic method for finding the values of the parameters (w and b) that minimize the cost function J(w, b).

It applies not only to linear regression but also to more complex models, including deep learning.

Source: Khan Academy

Process of Gradient Descent

Start with initial guesses for the parameters, often set to 0 (e.g., w = 0, b = 0).

Iteratively adjust parameters in the direction of steepest descent to reduce the cost function until reaching
a minimum.

Let's outline at a high level how this algorithm works:

Start with some parameters w, b.

Compute the gradient of the cost function at the current parameters (in batch gradient descent, using all training examples).

Keep changing the values of w, b to reduce the cost function J(w, b).

Continue until we settle at or near a minimum. Note that some functions may have more than one minimum.
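The outline above can be sketched in a few lines of Python. This is a minimal illustration, not code from the source: the toy cost $J(w, b) = (w - 3)^2 + (b + 1)^2$, its hand-written gradient, and the names `cost`, `gradient`, `gradient_descent`, `alpha`, and `num_iters` are all hypothetical.

```python
def cost(w, b):
    """Hypothetical convex cost with its minimum at w = 3, b = -1."""
    return (w - 3) ** 2 + (b + 1) ** 2


def gradient(w, b):
    """Partial derivatives of the toy cost: (dJ/dw, dJ/db)."""
    return 2 * (w - 3), 2 * (b + 1)


def gradient_descent(w, b, alpha=0.1, num_iters=1000):
    """Start from (w, b) and repeatedly step against the gradient."""
    for _ in range(num_iters):
        dj_dw, dj_db = gradient(w, b)  # gradient at the current point
        w = w - alpha * dj_dw          # step w downhill
        b = b - alpha * dj_db          # step b downhill (same current-point gradient)
    return w, b


w, b = gradient_descent(0.0, 0.0)  # start from w = 0, b = 0
print(f"w = {w:.4f}, b = {b:.4f}, J = {cost(w, b):.6f}")
```

A fixed iteration count stands in for "until we settle near a minimum" here; a convergence test could replace it.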

💡 Example: an analogy that explains gradient descent through a hilly landscape

Imaginary hill: You stand on a hilly landscape, where high points are hills and low points are valleys.

Finding the path: You look around for the direction of steepest descent and take a small step in that direction.

Iterative process: Repeat this process until you reach a valley (a local minimum).

This example illustrates how gradient descent works: it keeps adjusting your position in the direction of steepest descent until it reaches a minimum of the cost.

Local Minima in Gradient Descent

Different starting points can lead to different local minima, meaning the algorithm may converge to different solutions depending on the initial values.

This property highlights the importance of the starting point in the optimization process.

Local Minima
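To see how the starting point matters, here is a minimal sketch on a hypothetical one-parameter function $f(w) = w^4 - 3w^2 + w$ (not from the source), which has two minima. The names `f`, `df`, and `descend` and the values of `alpha` and `num_iters` are illustrative assumptions.

```python
def f(w):
    """Hypothetical non-convex function with two minima."""
    return w ** 4 - 3 * w ** 2 + w


def df(w):
    """Derivative of f."""
    return 4 * w ** 3 - 6 * w + 1


def descend(w, alpha=0.01, num_iters=1000):
    """Plain gradient descent on f from the starting point w."""
    for _ in range(num_iters):
        w = w - alpha * df(w)
    return w


w_left = descend(-2.0)
w_right = descend(2.0)
print(f"start -2.0 -> w = {w_left:.3f}, f(w) = {f(w_left):.3f}")
print(f"start  2.0 -> w = {w_right:.3f}, f(w) = {f(w_right):.3f}")
```

Starting at -2.0 should settle near w ≈ -1.30 (the lower, global minimum), while starting at 2.0 should settle near w ≈ 1.13, a local minimum with a higher value of f.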

Defining Algorithm
The gradient descent algorithm repeats the following updates for each parameter until convergence:

$$w = w - \alpha \frac{\partial}{\partial w} J(w, b)$$

$$b = b - \alpha \frac{\partial}{\partial b} J(w, b)$$

α = Learning rate, which controls the size of the steps taken during optimization. It is usually a small positive number (for example, between 0 and 1): smaller values lead to more gradual updates, while larger values result in bigger steps.

$\frac{\partial}{\partial w} J(w, b)$ = Derivative of the cost function, which determines the direction of each step. It can be interpreted as the slope of the graph at a particular point.

💡 NOTE: Simultaneous Updates

Both w and b should be updated simultaneously to ensure correct calculations.

The correct implementation calculates the updates for both parameters from the current values of w and b before applying them, avoiding the errors that arise from non-simultaneous updates.
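To see why the order matters, here is a minimal sketch on a hypothetical cost $J(w, b) = (w + 2b - 4)^2$, chosen (as an assumption, not from the source) because each partial derivative depends on both parameters. The function names and `alpha = 0.1` are illustrative.

```python
def grad(w, b):
    """Gradient of the hypothetical cost J(w, b) = (w + 2b - 4)^2."""
    r = w + 2 * b - 4
    return 2 * r, 4 * r  # (dJ/dw, dJ/db)


def step_simultaneous(w, b, alpha):
    """Correct: both partial derivatives are evaluated at the same (w, b)."""
    dj_dw, dj_db = grad(w, b)
    tmp_w = w - alpha * dj_dw
    tmp_b = b - alpha * dj_db
    return tmp_w, tmp_b


def step_sequential(w, b, alpha):
    """Incorrect: b's derivative is evaluated using the already-updated w."""
    dj_dw, _ = grad(w, b)
    w = w - alpha * dj_dw
    _, dj_db = grad(w, b)  # uses the new w, not the original one
    b = b - alpha * dj_db
    return w, b


w_sim, b_sim = step_simultaneous(0.0, 0.0, alpha=0.1)
w_seq, b_seq = step_sequential(0.0, 0.0, alpha=0.1)
print("simultaneous:", w_sim, b_sim)
print("sequential:  ", w_seq, b_seq)
```

From w = 0, b = 0, the simultaneous step gives approximately (0.8, 1.6), while the sequential version gives b ≈ 1.28, because its derivative for b was evaluated at the already-moved w.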

Gradient Descent Intuition

Understanding the Derivative
The derivative indicates the slope of the cost
function at a given point, guiding the direction of
the parameter update.

A positive slope results in decreasing the parameter w, while a negative slope increases w; both aim to minimize the cost function.

positive slope decreases w

negative slope increases w
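The sign rule can be checked numerically. This minimal sketch assumes a hypothetical one-parameter cost $J(w) = (w - 3)^2$ with its minimum at w = 3; the names `dJ_dw` and `update` and `alpha = 0.1` are illustrative.

```python
def dJ_dw(w):
    """Slope of the hypothetical cost J(w) = (w - 3)^2, whose minimum is at w = 3."""
    return 2 * (w - 3)


def update(w, alpha=0.1):
    """One gradient descent step: w := w - alpha * dJ/dw."""
    return w - alpha * dJ_dw(w)


for w in (5.0, 1.0):  # one point right of the minimum, one point left of it
    print(f"w = {w}: slope = {dJ_dw(w):+.1f}, new w = {update(w):.1f}")
```

At w = 5 the slope is +4 and w moves down to 4.6; at w = 1 the slope is -4 and w moves up to 1.4. Both steps head toward w = 3.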

💡 Example:

The left plot fixes b = 100 and shows the slope of the cost curve, $\frac{\partial J(w,b)}{\partial w}$, at three points. The slope is positive to the right of the minimum and negative to the left, so gradient descent always moves toward the bottom, where the gradient is zero.

Gradient descent uses both $\frac{\partial J(w,b)}{\partial w}$ and $\frac{\partial J(w,b)}{\partial b}$. The right quiver plot visualizes their combined gradients: arrow size shows magnitude, and arrow direction shows the ratio of the two partial derivatives. The gradients point away from the minimum, but since gradient descent subtracts the gradient, it moves the parameters toward lower cost.
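The claim that gradients point away from the minimum can be checked with a dot product: if the gradient at (w, b) has a negative dot product with the vector from (w, b) to the minimum, it points away, so the negative gradient points toward it. This minimal sketch assumes a tiny hypothetical dataset, (1, 300) and (2, 500), which w = 200, b = 100 fits exactly; the function names are illustrative.

```python
x_train = [1.0, 2.0]      # assumed toy data: w = 200, b = 100 fits it exactly
y_train = [300.0, 500.0]
w_min, b_min = 200.0, 100.0


def compute_gradient(x, y, w, b):
    """Partial derivatives (dJ/dw, dJ/db) of the squared error cost."""
    m = len(x)
    dj_dw = sum((w * xi + b - yi) * xi for xi, yi in zip(x, y)) / m
    dj_db = sum((w * xi + b - yi) for xi, yi in zip(x, y)) / m
    return dj_dw, dj_db


def alignment_with_minimum(w, b):
    """Dot product of the gradient with the vector from (w, b) toward the minimum.

    A negative value means the gradient points away from the minimum.
    """
    dj_dw, dj_db = compute_gradient(x_train, y_train, w, b)
    return (w_min - w) * dj_dw + (b_min - b) * dj_db


for w, b in [(0.0, 100.0), (400.0, 100.0), (100.0, -200.0), (300.0, 300.0)]:
    dot = alignment_with_minimum(w, b)
    where = "away from" if dot < 0 else "toward"
    print(f"(w, b) = ({w}, {b}): gradient points {where} the minimum")
```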

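❓ Why does the gradient point away from the minimum? The gradient points in the direction of steepest increase of J, so it points uphill, away from the minimum; gradient descent therefore steps in the opposite direction.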

Example
source: TowardsDataScience

Suppose you are walking along the graph below and are currently at the green dot. You aim to reach the minimum, i.e., the red dot, but you cannot see it from your position.
Possible actions would be:

You might go upward or downward.

Once you decide which way to go, you might take a bigger or a smaller step to reach your destination.

Essentially, there are two things you need to know to reach the minimum: which way to go and how big a step to take.

⇒ Gradient descent helps us make these decisions efficiently using derivatives, which represent the slope of a graph at a point. This slope, shown by a tangent line, guides us toward the minimum.

The Minimum Value and Steep Slope

The tangent at the green point shows that going upward moves us away from the minimum. Its steep slope means larger steps, while the gentler slope at the blue point means smaller steps toward the minimum (for a fixed α, each step is proportional to the slope).

Choosing Learning Rate

The learning rate (α) significantly affects the efficiency of gradient descent.

Too small α:
Leads to very slow convergence.

Requires many steps, increasing computation time.

Too large α:
Can overshoot the minimum.

May cause the algorithm to diverge away from the minimum (fail to converge).

The challenge increases with complex graphs:

Multiple local minima and maxima make optimization harder.

Choosing the right α is crucial for stable and effective convergence.
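The three regimes can be reproduced on a hypothetical cost $J(w) = w^2$ (dJ/dw = 2w), where each update multiplies w by (1 - 2α). This is a minimal sketch; the function name `run`, the starting point, and the α values are assumptions chosen for illustration.

```python
def run(alpha, w=1.0, num_iters=20):
    """Gradient descent on the hypothetical cost J(w) = w^2, where dJ/dw = 2w."""
    history = [w]
    for _ in range(num_iters):
        w = w - alpha * 2 * w
        history.append(w)
    return history


for alpha in (0.01, 0.1, 1.1):
    history = run(alpha)
    print(f"alpha = {alpha}: w after 20 steps = {history[-1]:.4f}")
```

With α = 0.01, w is still about 0.67 after 20 steps (slow); with α = 0.1 it is close to the minimum at 0; with α = 1.1, w flips sign every step and its magnitude grows to roughly 38 (divergence).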

Gradient descent can reach a local minimum with a fixed learning rate α. Near a local minimum:

The derivative automatically gets smaller.

So the update steps become smaller, even though α stays the same.

Optimizations
source: GitHub
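The shrinking steps can be observed directly. This minimal sketch assumes a hypothetical cost $J(w) = (w - 3)^2$, a fixed `alpha = 0.1`, and a starting point of w = 10; it records the size of each update.

```python
def dJ_dw(w):
    """Slope of the hypothetical cost J(w) = (w - 3)^2."""
    return 2 * (w - 3)


alpha = 0.1  # fixed learning rate: never changed below
w = 10.0
step_sizes = []
for _ in range(10):
    step = alpha * dJ_dw(w)  # step = alpha * current slope
    w = w - step
    step_sizes.append(abs(step))

print([round(s, 3) for s in step_sizes])
```

Each step is 0.8 times the previous one here, because the slope shrinks as w approaches 3 while α stays constant.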

How can we check that gradient descent is working correctly?

source: Vinija_Notes

There are two ways to check this. First, we can plot the cost J, calculated on the training set, at each iteration of gradient descent (i.e., after each simultaneous update of the parameters w, b).

Second, we can use an automatic convergence test. We choose ϵ to be a very small number. If the cost J decreases by less than ϵ on one iteration, then you're likely on the flattened part of the curve, and you can declare convergence.

Learning curve

Gradient descent aims to minimize the cost function J(w, b). Plotting J over iterations gives the learning curve, which ideally should decrease on every iteration and then level off as gradient descent converges (to the minimum value of J, which is not necessarily 0). If J increases at any point, it may indicate a poorly chosen learning rate α or a bug. The required number of iterations varies by application, making it hard to predict convergence time in advance.

Automatic convergence test

An automatic convergence test can stop training when the decrease in the cost J(w, b) between iterations is ≤ ϵ. Selecting the right ϵ can be challenging, so it's best used together with a learning curve.
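Both checks can be combined in one loop: record J at every iteration (the learning curve) and stop once the decrease falls to ϵ or below. This is a minimal sketch assuming a hypothetical cost $J(w) = (w - 3)^2 + 1$, whose minimum value is 1 rather than 0; the names, `epsilon = 1e-6`, and `alpha = 0.1` are illustrative.

```python
def cost(w):
    """Hypothetical cost J(w) = (w - 3)^2 + 1; its minimum value is 1, not 0."""
    return (w - 3) ** 2 + 1


def dcost(w):
    """Derivative of the hypothetical cost."""
    return 2 * (w - 3)


def gradient_descent_with_test(w=0.0, alpha=0.1, epsilon=1e-6, max_iters=10_000):
    j_history = [cost(w)]  # learning curve: J at each iteration
    for _ in range(max_iters):
        w = w - alpha * dcost(w)
        j_history.append(cost(w))
        # Automatic convergence test: stop once J drops by no more than epsilon.
        if j_history[-2] - j_history[-1] <= epsilon:
            break
    return w, j_history


w, j_history = gradient_descent_with_test()
print(f"stopped after {len(j_history) - 1} iterations, w = {w:.4f}, J = {j_history[-1]:.6f}")
```

Plotting `j_history` would give the learning curve; it flattens out near J = 1, the minimum of this cost.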

Debugging
Although a cost function that fails to decrease or fluctuates is often due to a high learning rate, it may also indicate a coding bug. Using a very small learning rate can help reveal bugs: with a sufficiently small α, J should decrease on every iteration, so if it still increases, the code likely contains a bug.
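The small-α check can be automated. In this minimal sketch, the dataset, the deliberately buggy gradient (with the error sign flipped), and all function names are hypothetical; with `alpha = 1e-4`, the correct gradient should lower J on every step, while the buggy one raises it.

```python
x_train = [1.0, 2.0]      # assumed toy data
y_train = [300.0, 500.0]


def compute_cost(w, b):
    m = len(x_train)
    return sum((w * x + b - y) ** 2 for x, y in zip(x_train, y_train)) / (2 * m)


def gradient_correct(w, b):
    m = len(x_train)
    errors = [w * x + b - y for x, y in zip(x_train, y_train)]
    return sum(e * x for e, x in zip(errors, x_train)) / m, sum(errors) / m


def gradient_buggy(w, b):
    """Deliberately buggy (hypothetical): the error sign is flipped."""
    m = len(x_train)
    errors = [y - (w * x + b) for x, y in zip(x_train, y_train)]  # bug: y - f(x)
    return sum(e * x for e, x in zip(errors, x_train)) / m, sum(errors) / m


def cost_always_decreases(gradient_fn, alpha=1e-4, num_iters=50):
    """With a very small alpha, a correct gradient should lower J on every step."""
    w, b = 0.0, 0.0
    prev = compute_cost(w, b)
    for _ in range(num_iters):
        dj_dw, dj_db = gradient_fn(w, b)
        w, b = w - alpha * dj_dw, b - alpha * dj_db
        cost = compute_cost(w, b)
        if cost > prev:
            return False
        prev = cost
    return True


print("correct gradient:", cost_always_decreases(gradient_correct))
print("buggy gradient:  ", cost_always_decreases(gradient_buggy))
```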

Gradient Descent for Linear Regression

Linear regression model:

$$f_{w,b}(x) = wx + b$$

Cost function (the squared error cost function):

$$J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right)^2$$
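The cost formula translates directly into a loop over the m examples. This is a minimal pure-Python sketch; the name `compute_cost` and the two-point dataset are assumptions for illustration.

```python
def compute_cost(x, y, w, b):
    """Squared error cost J(w, b) = (1 / 2m) * sum((w * x_i + b - y_i)^2)."""
    m = len(x)
    total = 0.0
    for xi, yi in zip(x, y):
        f_wb = w * xi + b           # model prediction f_{w,b}(x^(i))
        total += (f_wb - yi) ** 2   # squared error for example i
    return total / (2 * m)


x_train = [1.0, 2.0]      # assumed toy data
y_train = [300.0, 500.0]
print(compute_cost(x_train, y_train, w=200.0, b=100.0))  # 0.0 (perfect fit)
print(compute_cost(x_train, y_train, w=0.0, b=0.0))      # 85000.0
```

With w = 0 and b = 0, the errors are -300 and -500, so J = (90000 + 250000) / 4 = 85000.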

Pre-derived gradient descent algorithm

repeat until convergence:

$$w = w - \alpha \frac{\partial}{\partial w} J(w, b)$$

$$b = b - \alpha \frac{\partial}{\partial b} J(w, b)$$

We have:
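$$\frac{\partial}{\partial w} J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right) x^{(i)}$$

$$\frac{\partial}{\partial b} J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right)$$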
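The two derivative formulas can be implemented and sanity-checked numerically. The central-difference comparison below is an added technique (not from the source), used only to confirm the analytic formulas; the dataset and function names are hypothetical.

```python
def compute_cost(x, y, w, b):
    m = len(x)
    return sum((w * xi + b - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)


def compute_gradient(x, y, w, b):
    """Analytic partial derivatives of the squared error cost."""
    m = len(x)
    dj_dw = 0.0
    dj_db = 0.0
    for xi, yi in zip(x, y):
        err = (w * xi + b) - yi  # f_{w,b}(x^(i)) - y^(i)
        dj_dw += err * xi
        dj_db += err
    return dj_dw / m, dj_db / m


def numerical_gradient(x, y, w, b, h=1e-4):
    """Central-difference approximation, used only to sanity-check the formulas."""
    dj_dw = (compute_cost(x, y, w + h, b) - compute_cost(x, y, w - h, b)) / (2 * h)
    dj_db = (compute_cost(x, y, w, b + h) - compute_cost(x, y, w, b - h)) / (2 * h)
    return dj_dw, dj_db


x_train = [1.0, 2.0, 3.0]  # assumed toy data
y_train = [2.0, 2.5, 4.5]
print(compute_gradient(x_train, y_train, w=0.5, b=0.1))
print(numerical_gradient(x_train, y_train, w=0.5, b=0.1))
```

Both lines should show approximately (-4.3, -1.9); agreement between them suggests the derivative formulas are implemented correctly.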

Final gradient descent algorithm
$$w = w - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right) x^{(i)}$$

$$b = b - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right)$$
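The final algorithm can be sketched as a batch gradient descent loop. This is a minimal illustration under stated assumptions: a two-point hypothetical dataset that w = 200, b = 100 fits exactly, `alpha = 0.1`, and a fixed `num_iters` in place of a convergence test.

```python
def gradient_descent(x, y, w=0.0, b=0.0, alpha=0.1, num_iters=10_000):
    """Batch gradient descent for f_{w,b}(x) = w * x + b with the squared error cost."""
    m = len(x)
    for _ in range(num_iters):
        # Errors use the current (w, b) for every one of the m examples.
        errors = [(w * xi + b) - yi for xi, yi in zip(x, y)]
        dj_dw = sum(e * xi for e, xi in zip(errors, x)) / m
        dj_db = sum(errors) / m
        # Simultaneous update: both gradients are computed before either assignment.
        w, b = w - alpha * dj_dw, b - alpha * dj_db
    return w, b


x_train = [1.0, 2.0]      # assumed toy data; w = 200, b = 100 fits exactly
y_train = [300.0, 500.0]
w, b = gradient_descent(x_train, y_train)
print(f"w = {w:.2f}, b = {b:.2f}")
```

This should print values very close to w = 200 and b = 100.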

💡 The cost function of linear regression is convex

The squared error cost function is convex (bowl-shaped), so it has no local minima other than the global minimum. With a suitably chosen learning rate, gradient descent therefore converges to the global minimum, avoiding local minima issues.
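Because the cost is convex, different starting points should end at the same (w, b). This minimal sketch assumes a small hypothetical dataset, three arbitrary starting points, and `alpha = 0.1`; for this data the least-squares fit is w = 1.94, b = 0.15.

```python
def gradient_descent(x, y, w, b, alpha=0.1, num_iters=10_000):
    """Batch gradient descent for linear regression from the starting point (w, b)."""
    m = len(x)
    for _ in range(num_iters):
        errors = [(w * xi + b) - yi for xi, yi in zip(x, y)]
        dj_dw = sum(e * xi for e, xi in zip(errors, x)) / m
        dj_db = sum(errors) / m
        w, b = w - alpha * dj_dw, b - alpha * dj_db  # simultaneous update
    return w, b


x_train = [1.0, 2.0, 3.0, 4.0]  # assumed toy data
y_train = [2.1, 3.9, 6.2, 7.8]
starts = [(0.0, 0.0), (-50.0, 30.0), (100.0, -100.0)]
results = [gradient_descent(x_train, y_train, w0, b0) for w0, b0 in starts]
for (w0, b0), (w, b) in zip(starts, results):
    print(f"start ({w0}, {b0}) -> w = {w:.4f}, b = {b:.4f}")
```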

Batch Gradient Descent

Batch gradient descent uses the entire training dataset (all m examples) to compute the update at each step.

This contrasts with other versions of gradient descent that use smaller subsets of the data at each step.

source: https://statusneo.com/efficientdl-mini-batch-gradient-descent-explained/ (23/05/2025)

When gradient descent searches for the minimum, its path on the contour plot looks like the following:

In batch gradient descent, if the number of training examples is 47, then m = 47, and every update sums over all 47 examples:

$$J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right)^2$$

Example:
In the following, the left graph shows the progression of $w$ over the first few steps of gradient descent: $w$ oscillates between positive and negative values, and the cost grows rapidly. Because gradient descent operates on both $w$ and $b$ simultaneously, the 3-D plot on the right is needed for the complete picture.

source: w1lab4
