
SNS COLLEGE OF ENGINEERING

Kurumbapalayam (Po), Coimbatore – 641 107


Accredited by NAAC-UGC with ‘A’ Grade
Approved by AICTE, Recognized by UGC & Affiliated to Anna University, Chennai

Department of AI & DS

Course Name – 19AD602 DEEP LEARNING

III Year / VI Semester

UNIT-4 OPTIMIZATION AND GENERALIZATION


Topic: Non-convex Optimization for Deep Networks and Stochastic Gradient Descent

NON-CONVEX OPTIMIZATION IN DEEP LEARNING

CASE STUDY:
A company trains a deep learning model for image recognition without applying any optimization techniques. The model takes 10 hours to train, reaches only 70% accuracy, and overfits, generalizing poorly to new data.

STOCHASTIC GRADIENT DESCENT IN DEEP LEARNING

Stochastic Gradient Descent (SGD) is a variant of the Gradient Descent algorithm used for optimizing machine learning models. It addresses the computational inefficiency of traditional Gradient Descent when dealing with large datasets.

In SGD, instead of using the entire dataset for each iteration, only a single random training example (or a small batch) is selected to calculate the gradient and update the model parameters. This random selection introduces noise into the optimization process, hence the term "stochastic" in Stochastic Gradient Descent.

The advantage of SGD is its computational efficiency, especially on large datasets. By using a single example or a small batch, the cost per iteration is significantly lower than that of traditional Gradient Descent, which must process the entire dataset every iteration.
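As a compact way to state this (a sketch; the symbols θ for the parameters, α for the learning rate, J for the cost, and (x_i, y_i) for the randomly chosen example are not from the slides), the SGD update is

    \theta_{t+1} = \theta_t - \alpha \, \nabla_\theta J(\theta_t; x_i, y_i)

whereas full-batch Gradient Descent averages the gradient over all N training examples:

    \theta_{t+1} = \theta_t - \alpha \, \frac{1}{N} \sum_{i=1}^{N} \nabla_\theta J(\theta_t; x_i, y_i)

The cost of one SGD update does not grow with N, which is where the per-iteration saving comes from.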


Stochastic Gradient Descent Algorithm


● Initialization: Randomly initialize the parameters of the model.
● Set Parameters: Choose the number of iterations and the learning rate (alpha) used to update the parameters.
● Stochastic Gradient Descent Loop: Repeat the following steps until the model converges or the maximum number of iterations is reached:
○ Shuffle the training dataset to introduce randomness.
○ Iterate over each training example (or small batch) in the shuffled order.
○ Compute the gradient of the cost function with respect to the model parameters using the current training example (or batch).
○ Update the model parameters by taking a step in the direction of the negative gradient, scaled by the learning rate.
○ Check the convergence criteria, such as the change in the cost function between successive iterations.
● Return Optimized Parameters: Once the convergence criteria are met or the maximum number of iterations is reached, return the optimized model parameters.
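A minimal Python sketch of the loop above, assuming a linear model with mean-squared-error loss; the function name sgd, its default hyperparameters, and the synthetic data in the usage example are illustrative choices, not part of the slides.

import numpy as np

def sgd(X, y, learning_rate=0.01, n_epochs=50, tol=1e-6, seed=0):
    # Single-example SGD for linear regression with 0.5 * squared-error loss.
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    theta = rng.normal(size=n_features)        # Initialization: random parameters
    prev_cost = np.inf
    for epoch in range(n_epochs):              # repeat up to the maximum number of iterations
        order = rng.permutation(n_samples)     # shuffle the training dataset
        for i in order:                        # iterate over single training examples
            xi, yi = X[i], y[i]
            error = xi @ theta - yi
            grad = error * xi                  # gradient of 0.5 * (xi.theta - yi)^2 w.r.t. theta
            theta -= learning_rate * grad      # step in the negative gradient direction
        cost = 0.5 * np.mean((X @ theta - y) ** 2)
        if abs(prev_cost - cost) < tol:        # convergence check: change in cost between epochs
            break
        prev_cost = cost
    return theta                               # return the optimized parameters

# Usage on synthetic data (illustrative only)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=200)
print(sgd(X, y, learning_rate=0.05, n_epochs=100))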

Because only one sample is chosen at random for each iteration, the path SGD takes toward the minimum is usually much noisier than that of the standard Gradient Descent algorithm. In practice this matters little: the exact path is unimportant as long as the algorithm reaches a minimum, and SGD typically does so with significantly less training time.


THANK YOU

