Module 2.3: Hyperparameter Optimization

Course Code: CSA3002

MACHINE LEARNING ALGORITHMS

Course Type: LPC – 2-2-3


Course Objectives
• The objective of the course is to familiarize learners with
the concepts of machine learning algorithms and to develop
practical skills through experiential learning techniques.
Course Outcomes
At the end of the course, students should be able to:
1. Understand training and testing of datasets using machine
learning techniques.
2. Apply optimization and parameter-tuning techniques to machine
learning algorithms.
3. Apply a machine learning model to solve various problems using
machine learning algorithms.
4. Apply machine learning algorithms to create models.
What is the Difference between a Parameter and a
Hyperparameter?
• Model Parameters:
• Model parameters are configuration variables that are internal to the model,
and the model learns them on its own. Examples include the weights or
coefficients of the independent variables in a linear regression model or an
SVM, the weights and biases of a neural network, and the cluster centroids
in clustering. Some key points for model parameters are as follows:
• They are used by the model for making predictions.
• They are learned by the model from the data itself.
• They are usually not set manually.
• They are part of the model and key to a machine learning algorithm.
• Model Hyperparameters:
• Hyperparameters are parameters that are explicitly defined by
the user to control the learning process. Some key points for model
hyperparameters are as follows:
• They are usually set manually by the machine learning engineer.
• One cannot know the exact best value of a hyperparameter for a
given problem in advance; the best value is determined by rules of
thumb or by trial and error.
• Some examples of hyperparameters are the learning rate for
training a neural network and K in the KNN algorithm. A short sketch
contrasting parameters and hyperparameters follows.
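
A minimal sketch of the distinction, assuming scikit-learn; the synthetic dataset and the specific values are illustrative only:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Hyperparameters: chosen by the practitioner BEFORE training.
knn = KNeighborsClassifier(n_neighbors=5)         # K in KNN is a hyperparameter
logreg = LogisticRegression(C=1.0, max_iter=200)  # C (inverse regularization strength) is a hyperparameter

# Model parameters: learned FROM the data by fit().
logreg.fit(X, y)
print("Learned coefficients (parameters):", logreg.coef_)
print("Learned intercept (parameter):", logreg.intercept_)

# KNN learns no coefficients; fit() simply stores the training data.
knn.fit(X, y)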
Hyperparameters
• Hyperparameters are external configuration settings or parameters that
are set prior to the training process of a machine learning model.
• They are not learned from the data but play a crucial role in
controlling various aspects of the model's behavior and performance.
• These hyperparameters govern the learning process, model
architecture, and optimization procedure, among other aspects, and are
typically determined by the machine learning engineer or researcher.
• In contrast to hyperparameters, the model's parameters are the internal
variables that the algorithm learns from the training data during the
training process.
• These parameters are adjusted iteratively to minimize the model's loss
function and make predictions that fit the data.
• Hyperparameters can include values such as the learning rate, batch
size, number of hidden layers in a neural network, dropout rate,
regularization strength, and many others.
• Properly selecting and tuning hyperparameters is a crucial step in the
machine learning pipeline, as they can significantly impact the model's
performance, convergence speed, and ability to generalize to new,
unseen data.
• Hyperparameter tuning is often done through techniques like grid
search, random search, or more advanced methods like Bayesian
optimization.
Common examples of hyperparameters in
machine learning:
• Learning Rate: In many gradient-based optimization algorithms, like
stochastic gradient descent (SGD), the learning rate is a hyperparameter that
determines the step size at which the model's parameters are updated during
training. It influences how quickly or slowly the model converges to an
optimal solution.
• Number of Hidden Layers and Units: In neural networks, you can define
hyperparameters like the number of hidden layers and the number of units
(neurons) in each layer. These choices affect the model's capacity and
complexity.
• Activation Functions: The choice of activation functions, such as ReLU
(Rectified Linear Unit), Sigmoid, or Tanh, is a hyperparameter that influences
the non-linearity of the model and its ability to capture complex patterns.
• Batch Size: This hyperparameter determines the number of training
examples used in each iteration of training. It can affect the speed and
stability of training.
• Epochs: The number of epochs represents the number of times the entire
training dataset is passed through the model during training. It's a
hyperparameter that controls how many times the model updates its
parameters.
• Regularization Strength: Regularization hyperparameters, like L1 and
L2 regularization coefficients, help control overfitting by adding
penalties to the model's loss function based on the magnitudes of the
model parameters.
• Dropout Rate: Dropout is a regularization technique that randomly drops a fraction
of neurons during each training iteration. The dropout rate is a hyperparameter
controlling the dropout probability.
• Optimization Algorithm: The choice of optimization algorithm, such as SGD
(Stochastic gradient descent), Adam, or RMSprop, is a hyperparameter that
determines how the model's weights are updated during training.
• Loss Function: Different tasks and models require different loss functions (e.g.,
mean squared error for regression or cross-entropy for classification). The choice of
loss function is itself a hyperparameter.
• Learning Rate Schedule: Some models benefit from learning rate schedules that
change the learning rate during training. These schedules are defined by
hyperparameters such as the initial learning rate and decay rates. A sketch
mapping several of these hyperparameters onto a concrete model follows.
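
A sketch, assuming scikit-learn's MLPClassifier, that maps several of the hyperparameters above onto concrete constructor arguments; the specific values are illustrative only:

from sklearn.neural_network import MLPClassifier

model = MLPClassifier(
    hidden_layer_sizes=(64, 32),  # number of hidden layers and units
    activation='relu',            # activation function
    solver='adam',                # optimization algorithm
    alpha=1e-4,                   # L2 regularization strength
    batch_size=32,                # batch size
    learning_rate_init=0.001,     # initial learning rate
    learning_rate='adaptive',     # learning rate schedule
    max_iter=200,                 # upper bound on training epochs
    random_state=42,
)
# Calling model.fit(X, y) would then learn the parameters
# (weights in coefs_ and biases in intercepts_).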
• Hyperparameter tuning is the process of finding the optimal set of
hyperparameters for a given machine learning task.
• This is often done through techniques like grid search, random search,
or more advanced methods like Bayesian optimization to improve a
model's performance on a specific task.
• Proper hyperparameter tuning can significantly impact a model's
effectiveness and generalization.
Hyperparameter Optimization Techniques
• Hyperparameter optimization is a critical step in the machine learning
workflow as it helps find the best set of hyperparameters for a given
model and task.
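
Before turning to Bayesian optimization, here is a minimal grid-search baseline for comparison, assuming scikit-learn's GridSearchCV; the dataset and grid values are illustrative only:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Grid search exhaustively evaluates every combination in the grid
# with 5-fold cross-validation.
param_grid = {
    'max_depth': [2, 4, 8, 16],
    'min_samples_leaf': [1, 5, 10],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)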
Bayesian Hyperparameter Optimization
Technique
• Bayesian Hyperparameter Optimization is a method for efficiently and
systematically tuning hyperparameters of machine learning models
using Bayesian optimization techniques.
• Bayesian optimization uses probabilistic modeling to build a model of the
objective function from previously evaluated hyperparameter
configurations. It leverages this model to iteratively and intelligently
suggest new configurations that maximize the expected improvement
in the performance metric.
A simple real-world example:
• Imagine you're trying to bake the perfect cake, but you're not sure
about the right amount of sugar to use. You know it could be
anywhere between 1/2 cup and 2 cups. This amount of sugar is like a
"hyperparameter" for your cake.
• Now, instead of randomly guessing how much sugar to use or trying
every possible amount, you decide to be smarter. You start with an
initial guess, say 1 cup of sugar. You bake a cake with that amount and
taste it to see how good it is. This "tasting" part is like checking how
well your machine learning model works with a particular set of
hyperparameters.
Here's where Bayesian optimization
comes in:
• Learning from Experience: After tasting the first cake (trying the first
set of hyperparameters), you make a note of how good or bad it was.
You now have some information about how the amount of sugar
affects the cake's taste.
• Adjusting Your Guess: Instead of randomly guessing the next amount
of sugar, Bayesian optimization uses math to make a smarter guess. It
looks at your previous cake tastings (model performances) and tries
to predict the best amount of sugar to use next. This is like learning
from your past mistakes and successes.
• Repeat and Improve: You bake another cake with the suggested
amount of sugar, taste it, and update your notes. Then, the
optimization process repeats. It keeps suggesting new amounts of
sugar to use based on what it has learned so far.
• Finding the Best Recipe: Over time, Bayesian optimization gets better
at guessing the right amount of sugar. It keeps refining its guess until
it finds the perfect recipe that makes the tastiest cake.
• In the world of machine learning, this process of smartly guessing
hyperparameters and learning from past experiments helps us find
the best settings for our models without trying every possible
combination. It's like finding the perfect recipe for training your
computer programs. This makes it more efficient and saves a lot of
time when you're trying to make your models perform their best.
Theoretical Concept
• 1. Objective Function:
• In Bayesian Hyperparameter Optimization, the objective function is the
performance metric (e.g., accuracy, loss, F1-score) of your machine learning model
on a validation set.
• This function takes hyperparameters as inputs and produces a scalar output that
quantifies the model's performance.
• 2. Probabilistic Surrogate Model:
• The Bayesian optimization process starts by constructing a probabilistic surrogate
model (usually a Gaussian process or Bayesian neural network) that approximates
the objective function.
• Initially, the surrogate model assumes uncertainty about the objective function's
behavior across the hyperparameter space.
• 3. Acquisition Function:
• An acquisition function guides the search for the optimal
hyperparameters by balancing exploration and exploitation.
• Common acquisition functions include Expected Improvement (EI),
Probability of Improvement (PI), and Upper Confidence Bound (UCB);
the EI formula is written out after this list.
• 4. Initial Random Sampling:
• A few initial sets of hyperparameters are randomly sampled from the
hyperparameter space to initiate the optimization process.
• These initial evaluations are used to train the surrogate model.
• 5. Iterative Optimization:
• Bayesian Hyperparameter Optimization is an iterative process.
• At each iteration, the acquisition function is optimized to suggest the next set
of hyperparameters to evaluate.
• The selected hyperparameters are used to query the true objective function,
and the corresponding performance metric is obtained.
• 6. Surrogate Model Update:
• After each evaluation, the surrogate model is updated based on the new data
point (hyperparameters and performance metric).
• The surrogate model becomes more accurate in regions of the hyperparameter
space where data points have been observed.
• 7. Acquisition Function Optimization:
• The acquisition function is optimized to determine the next set of
hyperparameters to evaluate in the following iteration.
• It usually seeks to maximize the expected improvement in the
objective function over the current best observed value.
• 8. Stopping Criterion:
• Bayesian Hyperparameter Optimization continues iterating until a
stopping criterion is met, such as reaching a maximum number of
iterations or finding hyperparameters with satisfactory performance.
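
For reference, the Expected Improvement acquisition mentioned in step 3 has a standard closed form. For a maximization problem, with f(x⁺) the best value observed so far and μ(x), σ(x) the surrogate's posterior mean and standard deviation:

EI(x) = E[max(0, f(x) − f(x⁺))] = (μ(x) − f(x⁺)) Φ(Z) + σ(x) φ(Z),  where Z = (μ(x) − f(x⁺)) / σ(x),

with Φ and φ the standard normal CDF and PDF (and EI(x) = 0 when σ(x) = 0).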
• The key idea behind Bayesian Hyperparameter Optimization is that it
uses the surrogate model's predictions to intelligently select
hyperparameters for evaluation, focusing on promising regions of the
hyperparameter space.
• This strategy efficiently explores the hyperparameter space while
minimizing the number of costly objective function evaluations.
• Overall, Bayesian Hyperparameter Optimization is a powerful technique
for automating the process of finding the best hyperparameters for
machine learning models, making it especially valuable when the search
space is large or when hyperparameter tuning can be computationally
expensive.
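
The eight steps above can be condensed into a short, self-contained sketch. This is a toy illustration, assuming scikit-learn's GaussianProcessRegressor as the surrogate and a made-up 1-D objective; it is not the Optuna implementation shown next:

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(x):
    # Toy 1-D objective to maximize; stands in for an expensive model evaluation.
    return -(x - 0.7) ** 2

rng = np.random.default_rng(0)

# Step 4: initial random sampling of the "hyperparameter" space [0, 1].
X = rng.uniform(0, 1, size=(3, 1))
y = np.array([objective(v[0]) for v in X])

gp = GaussianProcessRegressor()  # Step 2: probabilistic surrogate model
for _ in range(10):              # Step 5: iterative optimization
    gp.fit(X, y)                 # Step 6: update the surrogate with all data so far
    cand = rng.uniform(0, 1, size=(200, 1))
    mu, sigma = gp.predict(cand, return_std=True)
    best = y.max()
    # Steps 3 and 7: Expected Improvement over the best observed value.
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = cand[np.argmax(ei)]
    X = np.vstack([X, [x_next]])
    y = np.append(y, objective(x_next[0]))

# Step 8: stop after a fixed budget and report the best point found.
print("Best x found:", X[np.argmax(y)][0])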
Implementation in Python
• Implementing Bayesian hyperparameter optimization from scratch
can be quite complex. Fortunately, there are libraries available that
make it much easier to perform Bayesian optimization for
hyperparameter tuning. One such popular library is Optuna. Here's
how you can use Optuna to perform Bayesian hyperparameter
optimization in Python:
• First, you'll need to install the Optuna library if you haven't already:

• pip install optuna pandas scikit-learn


import optuna
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the diabetes dataset (you may need to adjust the path to your dataset).
diabetes_data = pd.read_csv('diabetes.csv')

# Split the data into features (X) and target (y).
X = diabetes_data.drop(columns=['Outcome'])
y = diabetes_data['Outcome']

# Hold out a fixed test set once, so every trial is scored on the same split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Define an objective function for hyperparameter optimization.
def objective(trial):
    # Define the hyperparameters to optimize.
    max_depth = trial.suggest_int('max_depth', 2, 32)
    min_samples_split = trial.suggest_float('min_samples_split', 0.1, 1.0)
    min_samples_leaf = trial.suggest_float('min_samples_leaf', 0.1, 0.5)

    # Create a decision tree classifier with the suggested hyperparameters.
    model = DecisionTreeClassifier(
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        min_samples_leaf=min_samples_leaf,
        random_state=42,  # for reproducibility
    )

    # Train the model.
    model.fit(X_train, y_train)

    # Make predictions on the test set.
    y_pred = model.predict(X_test)

    # Calculate accuracy as the objective to be maximized.
    accuracy = accuracy_score(y_test, y_pred)
    return accuracy

if __name__ == "__main__":
    # Create a study object to manage the optimization process.
    study = optuna.create_study(direction='maximize')  # We want to maximize accuracy.

    # Optimize the objective function.
    study.optimize(objective, n_trials=100)  # You can adjust the number of trials.

    # Print the best hyperparameters and their corresponding accuracy.
    best_params = study.best_params
    best_accuracy = study.best_value
    print("Best Hyperparameters: ", best_params)
    print("Best Accuracy: ", best_accuracy)
• In this code:
• We load the diabetes.csv dataset, assuming it contains features (X) and the target
(Outcome).
• The objective function defines hyperparameters (max_depth, min_samples_split, and
min_samples_leaf) for the Decision Tree classifier that we want to optimize using
Optuna.
• Inside the objective function, we create and train a Decision Tree classifier with the
suggested hyperparameters and calculate its accuracy on the held-out test set.
• The study object is created to manage the optimization process, and we call
study.optimize to perform the optimization.
• Finally, we print the best hyperparameters and their corresponding accuracy score.
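• A note on the design: Optuna's default sampler is the Tree-structured Parzen Estimator (TPE), a sequential model-based (Bayesian-style) optimizer, which is why no explicit surrogate model appears in the code; the study object handles it internally.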
