Stochastic Gradient Descent (SGD)

Jaskaran Singh
12591001
C.S.E., 1st Sem.
Introduction to Gradient Descent
• Gradient Descent is an optimization algorithm used to minimize the cost function in machine learning.
• It works by iteratively adjusting parameters in the opposite direction of the gradient of the cost function.
• Used in training linear regression, logistic regression, neural networks, and more (a minimal sketch follows below).
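As a quick illustration, here is a minimal sketch of plain (batch) gradient descent on a toy quadratic cost; the cost function, starting point, and learning rate are assumptions chosen only for demonstration.

    # A minimal sketch of batch gradient descent, assuming the toy cost
    # J(w) = (w - 3)^2, whose gradient is dJ/dw = 2 * (w - 3).
    def gradient(w):
        return 2.0 * (w - 3.0)

    w = 0.0                      # initial parameter (assumed)
    lr = 0.1                     # learning rate (assumed)
    for step in range(100):
        w -= lr * gradient(w)    # move opposite to the gradient
    print(w)                     # converges toward the minimizer w = 3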
What is Stochastic Gradient Descent (SGD)?
• Instead of using the entire dataset, SGD updates parameters using only one sample at a time.
• This makes updates faster and introduces randomness, helping to escape local minima.
• Often used in training large-scale machine learning models and deep learning networks (the two update rules are contrasted below).
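The difference shows up directly in the update rule. The sketch below contrasts a full-batch update with a single-sample SGD update on a made-up 1-D regression problem; the data and learning rate are illustrative assumptions.

    import numpy as np

    # Hypothetical 1-D regression data (y ≈ w * x), used only to contrast
    # the two update rules.
    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.0, 4.0, 6.0, 8.0])
    w, lr = 0.0, 0.01

    # Batch gradient descent: one update uses the mean gradient over ALL samples.
    grad_full = np.mean(2 * (w * x - y) * x)
    w_batch = w - lr * grad_full

    # Stochastic gradient descent: one update uses a SINGLE random sample.
    i = np.random.randint(len(x))
    grad_one = 2 * (w * x[i] - y[i]) * x[i]
    w_sgd = w - lr * grad_one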
Why Escape Local Minima?
• Local minimum = a small valley in the loss surface (not the best solution).
• Global minimum = the deepest valley (lowest error, best solution).
• If the model gets stuck in a local minimum, its accuracy is not the best it could achieve.
• SGD’s randomness acts like a “shake,” helping the model escape small valleys and move closer to the best valley (see the toy example below).
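A toy sketch of this “shake” effect, using an assumed 1-D loss with a shallow local minimum near w = +1 and a deeper global minimum near w = -1. Plain gradient descent started in the shallow valley stays there, while adding gradient noise (mimicking SGD’s single-sample randomness) can push the iterate over the barrier; the exact outcome depends on the noise level and seed.

    import numpy as np

    # Assumed non-convex loss: local minimum near w = +1, deeper global
    # minimum near w = -1. All constants are illustrative choices.
    def loss(w):
        return (w**2 - 1)**2 + 0.3 * w

    def grad(w):
        return 4 * w * (w**2 - 1) + 0.3

    rng = np.random.default_rng(0)
    lr = 0.05

    # Plain gradient descent started in the shallow valley stays near w = +1.
    w_gd = 1.0
    for _ in range(5000):
        w_gd -= lr * grad(w_gd)

    # Noisy ("SGD-like") descent: Gaussian noise stands in for the
    # randomness of single-sample gradients and can shake the iterate
    # over the barrier toward the deeper valley.
    w_sgd = 1.0
    for _ in range(5000):
        w_sgd -= lr * (grad(w_sgd) + rng.normal(scale=3.0))

    print(f"plain GD ends near w = {w_gd:.2f}, loss = {loss(w_gd):.3f}")
    print(f"noisy GD ends near w = {w_sgd:.2f}, loss = {loss(w_sgd):.3f}")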
How SGD Works
1. Initialize model parameters.
2. Select a random sample from the training data.
3. Compute the gradient of the loss function for that sample.
4. Update the parameters using that gradient.
5. Repeat until convergence (a sketch of this loop follows below).
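A minimal sketch of these five steps for 1-D linear regression with squared-error loss; the data, learning rate, and step count are assumptions made for illustration.

    import numpy as np

    # Hypothetical training data generated from the line y = 2x + 1.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = 2.0 * x + 1.0

    w, b = 0.0, 0.0                         # 1. initialize parameters
    lr, steps = 0.01, 2000                  # assumed hyperparameters
    rng = np.random.default_rng(42)

    for _ in range(steps):
        i = rng.integers(len(x))            # 2. pick one random sample
        error = (w * x[i] + b) - y[i]
        grad_w = 2 * error * x[i]           # 3. gradient of the per-sample loss
        grad_b = 2 * error
        w -= lr * grad_w                    # 4. update parameters
        b -= lr * grad_b                    # 5. loop repeats until convergence

    print(w, b)                             # should approach w ≈ 2, b ≈ 1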
Visual Representation
SGD vs Gradient Descent: Comparison
Advantages & Disadvantages of SGD
• Advantages:
• • Faster updates with large datasets.
• • Helps escape local minima.
• • Suitable for real-time/online learning.

• Disadvantages:
• • Noisy convergence.
• • Requires careful tuning of learning rate.
• • May oscillate around the minimum.
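One common way to tame the oscillation is to decay the learning rate over time, so steps are large early on and shrink near the minimum; the 1/t schedule below is just one assumed example of such a schedule.

    # Assumed starting learning rate and decay constant, for illustration only.
    initial_lr = 0.1
    decay = 0.01

    def learning_rate(step):
        # Larger early (fast progress), smaller later (less oscillation
        # around the minimum).
        return initial_lr / (1.0 + decay * step)

    # e.g. learning_rate(0) == 0.1, learning_rate(1000) ≈ 0.009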
Applications of SGD
• Training deep learning models (CNNs, RNNs, Transformers).
• Online recommendation systems.
• Natural Language Processing (NLP).
• Large-scale optimization problems.
