Module 2.3: Hyperparameter Optimization

Course Code: CSA3002

MACHINE LEARNING ALGORITHMS

Course Type: LPC – 2-2-3


Course Objectives
• The objective of the course is to familiarize learners with
the concepts of machine learning algorithms and to develop
practical skills through experiential learning techniques.
Course Outcomes
At the end of the course, students should be able to:
1. Understand training and testing of datasets using machine
learning techniques.
2. Apply optimization and parameter-tuning techniques to machine
learning algorithms.
3. Apply a machine learning model to solve various problems using
machine learning algorithms.
4. Apply machine learning algorithms to create models.
What is the Difference between a Parameter and a
Hyperparameter?
• Model Parameters:
• Model parameters are configuration variables that are internal to the model,
and the model learns them on its own. Examples include the weights or
coefficients of the independent variables in a linear regression model or an
SVM, the weights and biases of a neural network, and the cluster centroids
in clustering. Some key points for model parameters are as follows:
• They are used by the model for making predictions.
• They are learned by the model from the data itself.
• They are usually not set manually.
• They are part of the model and key to a machine learning algorithm.
• Model Hyperparameters:
• Hyperparameters are parameters that are explicitly defined by
the user to control the learning process. Some key points for model
hyperparameters are as follows:
• They are usually set manually by the machine learning engineer.
• One cannot know the exact best value of a hyperparameter for a
given problem in advance; the best value is determined by rules of
thumb or by trial and error.
• Some examples of hyperparameters are the learning rate for
training a neural network and K in the KNN algorithm. A short sketch
contrasting parameters and hyperparameters follows.
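
A minimal sketch of the distinction, assuming scikit-learn; the synthetic dataset and the specific values are illustrative only:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Hyperparameters: chosen by the practitioner BEFORE training.
knn = KNeighborsClassifier(n_neighbors=5)         # K in KNN is a hyperparameter
logreg = LogisticRegression(C=1.0, max_iter=200)  # C (inverse regularization strength) is a hyperparameter

# Model parameters: learned FROM the data by fit().
logreg.fit(X, y)
print("Learned coefficients (parameters):", logreg.coef_)
print("Learned intercept (parameter):", logreg.intercept_)

# KNN learns no coefficients; fit() simply stores the training data.
knn.fit(X, y)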
Hyperparameters
• Hyperparameters are external configuration settings or parameters that
are set prior to the training process of a machine learning model.
• They are not learned from the data but play a crucial role in
controlling various aspects of the model's behavior and performance.
• These hyperparameters govern the learning process, model
architecture, and optimization procedure, among other aspects, and are
typically determined by the machine learning engineer or researcher.
• In contrast to hyperparameters, the model's parameters are the internal
variables that the algorithm learns from the training data during the
training process.
• These parameters are adjusted iteratively to minimize the model's loss
function and make predictions that fit the data.
• Hyperparameters can include values such as the learning rate, batch
size, number of hidden layers in a neural network, dropout rate,
regularization strength, and many others.
• Properly selecting and tuning hyperparameters is a crucial step in the
machine learning pipeline, as they can significantly impact the model's
performance, convergence speed, and ability to generalize to new,
unseen data.
• Hyperparameter tuning is often done through techniques like grid
search, random search, or more advanced methods like Bayesian
optimization.
Common examples of hyperparameters in
machine learning:
• Learning Rate: In many gradient-based optimization algorithms, like
stochastic gradient descent (SGD), the learning rate is a hyperparameter that
determines the step size at which the model's parameters are updated during
training. It influences how quickly or slowly the model converges to an
optimal solution.
• Number of Hidden Layers and Units: In neural networks, you can define
hyperparameters like the number of hidden layers and the number of units
(neurons) in each layer. These choices affect the model's capacity and
complexity.
• Activation Functions: The choice of activation functions, such as ReLU
(Rectified Linear Unit), Sigmoid, or Tanh, is a hyperparameter that influences
the non-linearity of the model and its ability to capture complex patterns.
• Batch Size: This hyperparameter determines the number of training
examples used in each iteration of training. It can affect the speed and
stability of training.
• Epochs: The number of epochs represents the number of times the entire
training dataset is passed through the model during training. It's a
hyperparameter that controls how many times the model updates its
parameters.
• Regularization Strength: Regularization hyperparameters, like L1 and
L2 regularization coefficients, help control overfitting by adding
penalties to the model's loss function based on the magnitudes of the
model parameters.
• Dropout Rate: Dropout is a regularization technique that randomly drops a fraction
of neurons during each training iteration. The dropout rate is a hyperparameter
controlling the dropout probability.
• Optimization Algorithm: The choice of optimization algorithm, such as SGD
(Stochastic gradient descent), Adam, or RMSprop, is a hyperparameter that
determines how the model's weights are updated during training.
• Loss Function: Different tasks and models require different loss functions (e.g.,
mean squared error for regression or cross-entropy for classification). The choice of
loss function is itself a hyperparameter.
• Learning Rate Schedule: Some models benefit from learning rate schedules that
change the learning rate during training. These schedules are defined by
hyperparameters such as the initial learning rate and decay rates. A sketch
mapping several of these hyperparameters onto a concrete model follows.
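
A sketch, assuming scikit-learn's MLPClassifier, that maps several of the hyperparameters above onto concrete constructor arguments; the specific values are illustrative only:

from sklearn.neural_network import MLPClassifier

model = MLPClassifier(
    hidden_layer_sizes=(64, 32),  # number of hidden layers and units
    activation='relu',            # activation function
    solver='adam',                # optimization algorithm
    alpha=1e-4,                   # L2 regularization strength
    batch_size=32,                # batch size
    learning_rate_init=0.001,     # initial learning rate
    learning_rate='adaptive',     # learning rate schedule
    max_iter=200,                 # upper bound on training epochs
    random_state=42,
)
# Calling model.fit(X, y) would then learn the parameters
# (weights in coefs_ and biases in intercepts_).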
• Hyperparameter tuning is the process of finding the optimal set of
hyperparameters for a given machine learning task.
• This is often done through techniques like grid search, random search,
or more advanced methods like Bayesian optimization to improve a
model's performance on a specific task.
• Proper hyperparameter tuning can significantly impact a model's
effectiveness and generalization.
Hyperparameter Optimization Techniques
• Hyperparameter optimization is a critical step in the machine learning
workflow as it helps find the best set of hyperparameters for a given
model and task.
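
Before turning to Bayesian optimization, here is a minimal grid-search baseline for comparison, assuming scikit-learn's GridSearchCV; the dataset and grid values are illustrative only:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Grid search exhaustively evaluates every combination in the grid
# with 5-fold cross-validation.
param_grid = {
    'max_depth': [2, 4, 8, 16],
    'min_samples_leaf': [1, 5, 10],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)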
Bayesian Hyperparameter Optimization
Technique
• Bayesian Hyperparameter Optimization is a method for efficiently and
systematically tuning hyperparameters of machine learning models
using Bayesian optimization techniques.
• Bayesian optimization uses probabilistic modeling to build a model of the
objective function from previously evaluated hyperparameter
configurations. It leverages this model to iteratively and intelligently
suggest new configurations that maximize the expected improvement
in the performance metric.
A simple real-world example:
• Imagine you're trying to bake the perfect cake, but you're not sure
about the right amount of sugar to use. You know it could be
anywhere between 1/2 cup and 2 cups. This amount of sugar is like a
"hyperparameter" for your cake.
• Now, instead of randomly guessing how much sugar to use or trying
every possible amount, you decide to be smarter. You start with an
initial guess, say 1 cup of sugar. You bake a cake with that amount and
taste it to see how good it is. This "tasting" part is like checking how
well your machine learning model works with a particular set of
hyperparameters.
Here's where Bayesian optimization
comes in:
• Learning from Experience: After tasting the first cake (trying the first
set of hyperparameters), you make a note of how good or bad it was.
You now have some information about how the amount of sugar
affects the cake's taste.
• Adjusting Your Guess: Instead of randomly guessing the next amount
of sugar, Bayesian optimization uses math to make a smarter guess. It
looks at your previous cake tastings (model performances) and tries
to predict the best amount of sugar to use next. This is like learning
from your past mistakes and successes.
• Repeat and Improve: You bake another cake with the suggested
amount of sugar, taste it, and update your notes. Then, the
optimization process repeats. It keeps suggesting new amounts of
sugar to use based on what it has learned so far.
• Finding the Best Recipe: Over time, Bayesian optimization gets better
at guessing the right amount of sugar. It keeps refining its guess until
it finds the perfect recipe that makes the tastiest cake.
• In the world of machine learning, this process of smartly guessing
hyperparameters and learning from past experiments helps us find
the best settings for our models without trying every possible
combination. It's like finding the perfect recipe for training your
computer programs. This makes it more efficient and saves a lot of
time when you're trying to make your models perform their best.
Theoretical Concept
• 1. Objective Function:
• In Bayesian Hyperparameter Optimization, the objective function is the
performance metric (e.g., accuracy, loss, F1-score) of your machine learning model
on a validation set.
• This function takes hyperparameters as inputs and produces a scalar output that
quantifies the model's performance.
• 2. Probabilistic Surrogate Model:
• The Bayesian optimization process starts by constructing a probabilistic surrogate
model (usually a Gaussian process or Bayesian neural network) that approximates
the objective function.
• Initially, the surrogate model assumes uncertainty about the objective function's
behavior across the hyperparameter space.
• 3. Acquisition Function:
• An acquisition function guides the search for the optimal
hyperparameters by balancing exploration and exploitation.
• Common acquisition functions include Expected Improvement (EI),
Probability of Improvement (PI), and Upper Confidence Bound (UCB);
the EI formula is written out after this list.
• 4. Initial Random Sampling:
• A few initial sets of hyperparameters are randomly sampled from the
hyperparameter space to initiate the optimization process.
• These initial evaluations are used to train the surrogate model.
• 5. Iterative Optimization:
• Bayesian Hyperparameter Optimization is an iterative process.
• At each iteration, the acquisition function is optimized to suggest the next set
of hyperparameters to evaluate.
• The selected hyperparameters are used to query the true objective function,
and the corresponding performance metric is obtained.
• 6. Surrogate Model Update:
• After each evaluation, the surrogate model is updated based on the new data
point (hyperparameters and performance metric).
• The surrogate model becomes more accurate in regions of the hyperparameter
space where data points have been observed.
• 7. Acquisition Function Optimization:
• The acquisition function is optimized to determine the next set of
hyperparameters to evaluate in the following iteration.
• It usually seeks to maximize the expected improvement in the
objective function over the current best observed value.
• 8. Stopping Criterion:
• Bayesian Hyperparameter Optimization continues iterating until a
stopping criterion is met, such as reaching a maximum number of
iterations or finding hyperparameters with satisfactory performance.
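
For reference, the Expected Improvement acquisition mentioned in step 3 has a standard closed form. For a maximization problem, with f(x⁺) the best value observed so far and μ(x), σ(x) the surrogate's posterior mean and standard deviation:

EI(x) = E[max(0, f(x) − f(x⁺))] = (μ(x) − f(x⁺)) Φ(Z) + σ(x) φ(Z),  where Z = (μ(x) − f(x⁺)) / σ(x),

with Φ and φ the standard normal CDF and PDF (and EI(x) = 0 when σ(x) = 0).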
• The key idea behind Bayesian Hyperparameter Optimization is that it
uses the surrogate model's predictions to intelligently select
hyperparameters for evaluation, focusing on promising regions of the
hyperparameter space.
• This strategy efficiently explores the hyperparameter space while
minimizing the number of costly objective function evaluations.
• Overall, Bayesian Hyperparameter Optimization is a powerful technique
for automating the process of finding the best hyperparameters for
machine learning models, making it especially valuable when the search
space is large or when hyperparameter tuning can be computationally
expensive.
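
The eight steps above can be condensed into a short, self-contained sketch. This is a toy illustration, assuming scikit-learn's GaussianProcessRegressor as the surrogate and a made-up 1-D objective; it is not the Optuna implementation shown next:

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(x):
    # Toy 1-D objective to maximize; stands in for an expensive model evaluation.
    return -(x - 0.7) ** 2

rng = np.random.default_rng(0)

# Step 4: initial random sampling of the "hyperparameter" space [0, 1].
X = rng.uniform(0, 1, size=(3, 1))
y = np.array([objective(v[0]) for v in X])

gp = GaussianProcessRegressor()  # Step 2: probabilistic surrogate model
for _ in range(10):              # Step 5: iterative optimization
    gp.fit(X, y)                 # Step 6: update the surrogate with all data so far
    cand = rng.uniform(0, 1, size=(200, 1))
    mu, sigma = gp.predict(cand, return_std=True)
    best = y.max()
    # Steps 3 and 7: Expected Improvement over the best observed value.
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = cand[np.argmax(ei)]
    X = np.vstack([X, [x_next]])
    y = np.append(y, objective(x_next[0]))

# Step 8: stop after a fixed budget and report the best point found.
print("Best x found:", X[np.argmax(y)][0])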
Implementation in Python
• Implementing Bayesian hyperparameter optimization from scratch
can be quite complex. Fortunately, there are libraries available that
make it much easier to perform Bayesian optimization for
hyperparameter tuning. One such popular library is Optuna. Here's
how you can use Optuna to perform Bayesian hyperparameter
optimization in Python:
• First, you'll need to install the Optuna library if you haven't already:

• pip install optuna pandas scikit-learn


import optuna
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the diabetes dataset (you may need to adjust the path to your dataset).
diabetes_data = pd.read_csv('diabetes.csv')

# Split the data into features (X) and target (y).
X = diabetes_data.drop(columns=['Outcome'])
y = diabetes_data['Outcome']

# Hold out a fixed test set once, so every trial is scored on the same split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Define an objective function for hyperparameter optimization.
def objective(trial):
    # Define the hyperparameters to optimize.
    max_depth = trial.suggest_int('max_depth', 2, 32)
    min_samples_split = trial.suggest_float('min_samples_split', 0.1, 1.0)
    min_samples_leaf = trial.suggest_float('min_samples_leaf', 0.1, 0.5)

    # Create a decision tree classifier with the suggested hyperparameters.
    model = DecisionTreeClassifier(
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        min_samples_leaf=min_samples_leaf,
        random_state=42,  # for reproducibility
    )

    # Train the model.
    model.fit(X_train, y_train)

    # Make predictions on the test set.
    y_pred = model.predict(X_test)

    # Calculate accuracy as the objective to be maximized.
    accuracy = accuracy_score(y_test, y_pred)
    return accuracy

if __name__ == "__main__":
    # Create a study object to manage the optimization process.
    study = optuna.create_study(direction='maximize')  # We want to maximize accuracy.

    # Optimize the objective function.
    study.optimize(objective, n_trials=100)  # You can adjust the number of trials.

    # Print the best hyperparameters and their corresponding accuracy.
    best_params = study.best_params
    best_accuracy = study.best_value
    print("Best Hyperparameters: ", best_params)
    print("Best Accuracy: ", best_accuracy)
• In this code:
• We load the diabetes.csv dataset, assuming it contains features (X) and the target
(Outcome).
• The objective function defines hyperparameters (max_depth, min_samples_split, and
min_samples_leaf) for the Decision Tree classifier that we want to optimize using
Optuna.
• Inside the objective function, we create and train a Decision Tree classifier with the
suggested hyperparameters and calculate its accuracy on the held-out test set.
• The study object is created to manage the optimization process, and we call
study.optimize to perform the optimization.
• Finally, we print the best hyperparameters and their corresponding accuracy score.
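• A note on the design: Optuna's default sampler is the Tree-structured Parzen Estimator (TPE), a sequential model-based (Bayesian-style) optimizer, which is why no explicit surrogate model appears in the code; the study object handles it internally.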
