
Machine Learning

What is Machine Learning?

A subset of artificial intelligence known as machine learning focuses primarily on the creation of algorithms
that enable a computer to independently learn from data and previous experiences. Arthur Samuel first used
the term "machine learning" in 1959.
Machine Learning is the field of study that gives computers the capability to learn without being explicitly
programmed.
Machine learning is a branch of artificial intelligence that enables algorithms to uncover hidden patterns
within datasets, allowing them to make predictions on new, similar data without explicit programming for
each task.

How does Machine Learning Work?


A machine learning system builds prediction models, learns from previous data, and predicts the output of
new data whenever it receives it. The more data it is given, the better the model it can build, and hence the
more accurate the predicted output.
Let's say we have a complex problem in which we need to make predictions. Instead of writing code, we just
need to feed the data to generic algorithms, which build the logic based on the data and predict the output.
Features of Machine Learning:
• Machine learning uses data to detect various patterns in a given dataset.
• It can learn from past data and improve automatically.
• It is a data-driven technology.
• Machine learning is similar to data mining, as both deal with huge amounts of data.

Applications of Machine learning:


1. Image Recognition: Image recognition is one of the most common applications of machine learning. It
is used to identify objects, persons, places, digital images, etc. The popular use case of image
recognition and face detection is Automatic friend tagging suggestion.
2. Speech Recognition: Speech recognition is a process of converting voice instructions into text, and it is
also known as "Speech to text", or "Computer speech recognition." At present, machine learning
algorithms are widely used by various applications of speech recognition. Google
assistant, Siri, Cortana, and Alexa are using speech recognition technology to follow the voice
instructions.
3. Self-driving cars: Machine learning plays a significant role in self-driving cars. Tesla, a well-known car
manufacturer, is working on self-driving cars and uses machine learning methods to train its models to
detect people and objects while driving.
4. Email Spam and Malware Filtering: Whenever we receive a new email, it is filtered automatically as
important, normal, or spam. Important mail arrives in the inbox marked with the important symbol, while
spam emails land in the spam box, and the technology behind this is machine learning.
5. Virtual Personal Assistant: We have various virtual personal assistants such as Google
assistant, Alexa, Cortana, Siri. These assistants can help us in various ways just by our voice
instructions such as Play music, call someone, Open an email, Scheduling an appointment, etc.
6. Stock Market trading: Machine learning is widely used in stock market trading. Share prices constantly
fluctuate, so machine learning models such as long short-term memory (LSTM) neural networks are used
for the prediction of stock market trends.

Machine Learning vs Traditional Programming

1. Machine Learning is a subset of artificial intelligence (AI) that focuses on learning from data to develop an
algorithm that can be used to make a prediction. In traditional programming, rule-based code is written by
the developers depending on the problem statement.
2. Machine Learning uses a data-driven approach: it is typically trained on historical data and then used to
make predictions on new data. Traditional programming is typically rule-based and deterministic; it has no
self-learning features like Machine Learning and AI.
3. ML can find patterns and insights in large datasets that might be difficult for humans to discover.
Traditional programming is totally dependent on the intelligence of developers, so it has very limited
capability.
4. Machine Learning is a subset of AI and is now used in various AI-based tasks like chatbot question
answering, self-driving cars, etc. Traditional programming is often used to build applications and software
systems that have specific functionality.
5. In ML, performance improves with more data and better models. In traditional programming, performance
depends on the efficiency of the written code.
6. ML learns to handle errors through training. Traditional programming handles errors explicitly through
code and exceptions.
Machine Learning vs Deep Learning

1. Machine Learning is a superset of Deep Learning; Deep Learning is a subset of Machine Learning.
2. A machine learning algorithm takes less time to train the model than deep learning, but a longer time to
test it. Deep Learning takes a long execution time to train the model, but less time to test it.
3. Although machine learning benefits from large amounts of data, it can work with smaller amounts of data.
Deep Learning algorithms depend heavily on a large amount of data, so a large amount of data must be fed
for good performance.
4. Machine learning models mostly require data in a structured form. Deep Learning models can work with
both structured and unstructured data, as they rely on the layers of an artificial neural network.
5. Machine learning models are suitable for solving simple or moderately complex problems. Deep learning
models are suitable for solving complex problems.
6. Machine learning training can be performed using standard CPUs. Deep learning often requires dedicated
hardware (GPUs, TPUs) for training.

Q. Describe the steps involved in training a machine learning model


A machine learning model is a mathematical representation or algorithm that learns patterns from the given
training data and generalizes these predictions accurately on new, unseen data. Training a machine learning
model involves several systematic steps to ensure that the model learns effectively from the data and
generalizes well to unseen data. Here’s a detailed description of the steps involved:
1. Data collection: The initial step in training a machine learning model is to identify the problem
statement and gather the necessary requirements. Then, according to these requirements, relevant
data is collected. It is necessary that the data matches the requirements as accurately as possible as
the ML model's only input is the data we feed into it, and on whose basis, predictions and
classifications are conducted.
2. Data preprocessing: The second stage in training an ML model is data preprocessing. This is a
crucial stage that involves transforming raw data into a format suitable for the model, enhancing
the performance of the ML algorithms. The steps involved in data preprocessing are -
• Data completion: It refers to the process of handling missing values in a dataset which can
occur due to malfunction of sensors, or incomplete data collection.
• Data transformation: It involves changing the scale, distribution, or format of the data to meet
specific requirements of the model.
• Data noise reduction: It is the process of removing unwanted random variations or errors,
known as noise, from a dataset that may arise from measurement errors or data collection
inconsistencies.
3. Feature extraction: After completing the data processing steps, we conduct feature extraction, which
is a technique used to reduce a large input data set into relevant features. This is done with
dimensionality reduction to transform large input data into smaller, meaningful groups for processing.
Some common techniques include principal component analysis (PCA) and independent
component analysis (ICA).
4. Training the model: Training a machine learning model includes choosing the right model suited to the
requirements according to the tasks that we want to perform, such as regression, classification or
clustering, etc.
Once the model is selected, it is trained on the training dataset. The goal is to adjust the model's
parameters (weights and biases) so that it can learn the underlying patterns and relationships in the
data. This is typically done through an optimization process that minimizes a predefined loss function.
5. Evaluating the model: After training the model, the next step is to evaluate the trained model on new
unseen data to determine its ability to make accurate predictions. The test data is usually used to
predict the accuracy of the model. The model's predicted value and the actual value of the test set are
used in the loss function to measure the accuracy of the model. Common evaluation metrics include
accuracy, precision, recall, F1-score, mean squared error, and others, depending on the problem and
the type of model.
6. Hyperparameter tuning: Once the model is evaluated, we see if its accuracy can be improved in any
way. Parameters are the internal variables or weights that a model learns during training, whereas
hyperparameters (such as the learning rate, the number of trees, or k in KNN) are set before training
begins. Hyperparameter tuning refers to finding the hyperparameter values at which the accuracy of
our model is maximised.
7. Making predictions: In the end, our model is ready to make accurate predictions on new, unseen
data.
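
For illustration (not part of the original note), here is a minimal scikit-learn sketch of these steps, assuming the built-in Iris dataset and a logistic regression classifier as stand-ins for a real problem:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                      # 1. data collection
X_train, X_test, y_train, y_test = train_test_split(   # hold out unseen data
    X, y, test_size=0.2, random_state=42)

pipe = Pipeline([
    ("scale", StandardScaler()),                       # 2. data preprocessing
    ("clf", LogisticRegression(max_iter=1000)),        # 4. choose a model
])

# 6. hyperparameter tuning via grid search with cross-validation
grid = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=5)
grid.fit(X_train, y_train)                             # 4. training (fits parameters)

# 5. evaluation on unseen test data, then 7. making predictions
print(accuracy_score(y_test, grid.predict(X_test)))
```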

Q. What are ensemble methods in machine learning? Provide examples of popular ensemble techniques.

Ensemble methods in machine learning are techniques that combine multiple individual models (often
referred to as "weak learners" or "base models") to create a single, stronger predictive model. The primary
goal of ensemble methods is to improve the overall performance, stability, accuracy, and robustness of the
prediction compared to what any single model could achieve on its own. Ensemble methods work on the
principle that a group of weak learners can come together to form a strong learner.
Popular Ensemble Techniques:
1. Bagging (Bootstrap Aggregating):
Bagging, also known as bootstrap aggregation is an ensemble learning technique that combines the
benefits of bootstrapping and aggregation to yield a stable model and improve the prediction
performance of a machine-learning model.
Bagging involves training multiple models on different subsets of the training data and then
aggregating their predictions. These subsets are created by random sampling with replacement
(bootstrap sampling).
Example:
Random Forest: An ensemble of decision trees, where each tree is trained on a different bootstrap
sample of the data. The final prediction is made by averaging the predictions (for regression) or taking
the majority vote (for classification) of the individual trees.
So, there are three steps:
1. Sample equal-sized subsets with replacement
2. Train weak models on each of the subsets independently and in parallel
3. Combine the results from each of the weak models by averaging or voting to get a final result

The main idea behind bagging is to reduce the variance in a dataset, ensuring that the model is robust
and not influenced by specific samples in the dataset.
For this reason, bagging is mainly applied to tree-based machine learning models such as decision
trees and random forests.
Advantages:
a) Reduces overall variance
b) Increases models’ robustness to noise in the data
Disadvantages:
a) High number of weak models may reduce model interpretability
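
As a rough illustration (assuming scikit-learn and a toy dataset, neither prescribed by the note), BaggingClassifier trains many models on bootstrap samples and aggregates their votes; by default its base model is a decision tree, and RandomForestClassifier is the tree-specific variant described above:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)   # toy data

# 50 base models, each trained on a bootstrap sample (sampling with replacement)
bagging = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=0)
forest = RandomForestClassifier(n_estimators=50, random_state=0)

print(cross_val_score(bagging, X, y, cv=5).mean())   # predictions combined by voting
print(cross_val_score(forest, X, y, cv=5).mean())
```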

2. Boosting: Boosting sequentially trains models, each trying to correct the errors of the previous models.
Each subsequent model gives more weight to the data points that were misclassified or had higher
errors by the previous models.
The main idea behind sequential training is to have each model correct the errors of its predecessor.
This continues until the predefined number of trained models or some other criteria are met.

We first initialize data weights to the same value and then perform the following steps iteratively:
1. Train a model on all instances
2. Calculate the error on model output over all instances
3. Assign a weight to the model (high for good performance and vice-versa)
4. Update data weights: give higher weights to samples with high errors
5. Repeat the previous steps if the performance isn’t satisfactory or other stopping conditions are
met
Finally, we combine the models into the one we use for prediction.
Boosting generally improves the accuracy of a machine learning model by improving the performance
of weak learners. We typically use XGBoost, CatBoost, and AdaBoost.
Advantages:
a) Improves overall accuracy
b) Reduces overall bias by improving on the weakness of the previous model
Disadvantages:
a) Can be computationally expensive
b) Sensitive to noisy data
c) Model dependency may allow for replication of errors
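
A minimal boosting sketch, assuming scikit-learn's AdaBoostClassifier (one of the boosting algorithms named above) on made-up data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Each new weak learner gives more weight to the samples the previous ones
# misclassified; the learners are then combined with performance-based weights.
boost = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=1)
boost.fit(X_tr, y_tr)
print(accuracy_score(y_te, boost.predict(X_te)))
```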
3. Stacking (Stacked Generalization): Stacking involves training multiple base models and then training a
meta-model (also called a second-level model) to combine their predictions. The base models'
predictions are used as inputs to the meta-model. The base and meta-models don’t have to be of the
same type. For example, we can pair a decision tree with a support vector machine (SVM).
Here are the steps:
• Construct base models on different portions of the training data
• Train a meta-model on the predictions from the base models
Example: A common approach is to use different types of models (e.g., logistic regression, decision
trees, SVMs) as base models and a linear model or a more complex model as the meta-model.

Advantages:
a) Combines the benefits of different models into one
b) Increases overall accuracy
Disadvantages:
a) May take a longer time to train and aggregate the predictions of different types of models
b) Training several base models and a meta-model increases complexity
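
A minimal stacking sketch, assuming scikit-learn, with a decision tree and an SVM as base models and logistic regression as the meta-model (these model choices are illustrative, not prescribed by the note):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=2)

stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier(max_depth=5)),
                ("svm", SVC(probability=True))],
    final_estimator=LogisticRegression(),   # meta-model trained on base predictions
    cv=5)                                   # base predictions are generated out-of-fold

print(cross_val_score(stack, X, y, cv=3).mean())
```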

4. Voting: This is a simple ensemble technique where multiple models are trained, and their predictions
are combined by taking a majority vote (for classification problems) or averaging (for regression
problems). Different models can be weighted differently based on their individual performance.
Examples:
• Hard Voting: For classification, takes the majority class predicted by the ensemble.
• Soft Voting: For classification, averages the predicted probabilities and selects the class with
the highest average probability.
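
A minimal voting sketch, assuming scikit-learn and three illustrative base models:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=3)
models = [("lr", LogisticRegression(max_iter=1000)),
          ("dt", DecisionTreeClassifier(max_depth=5)),
          ("nb", GaussianNB())]

hard = VotingClassifier(estimators=models, voting="hard")  # majority class wins
soft = VotingClassifier(estimators=models, voting="soft")  # average the probabilities

print(cross_val_score(hard, X, y, cv=5).mean())
print(cross_val_score(soft, X, y, cv=5).mean())
```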
5. Blending: Blending is similar to stacking, but instead of using a meta-model to combine the base
models' predictions, it uses a simpler technique like averaging or a weighted average based on the
performance of each base model on a holdout set.

Q. Discuss the impact of feature selection on the performance of machine learning models.
How can different feature selection techniques affect model accuracy and efficiency?
Feature selection plays a crucial role in the performance of machine learning models, as it can significantly
impact model accuracy, efficiency, and interpretability. Choosing the right set of features is essential because
irrelevant or redundant features can introduce noise and complexity, leading to overfitting, increased
computational cost, and poor generalization to new data.

Impact on Model Performance:


• Improved Generalization: By removing irrelevant or redundant features, feature selection can reduce
overfitting and improve the model's ability to generalize to new, unseen data.
• Noise Reduction: Excluding noisy features that do not contribute to the predictive power of the model
can lead to cleaner data and better model performance.
• Reduced Training Time: With fewer features, the computational complexity of training the model
decreases, leading to faster training times.
• Simplicity: A simpler model with fewer features is often easier to interpret and understand, which can
lead to better decision-making and trust in the model's predictions.
• Improved Scalability: Models with fewer features require less memory and computational resources,
making them more scalable to larger datasets.
• Easier Maintenance: Simplified models are easier to maintain and update, especially when new data is
introduced or when the model needs to be retrained.

Different feature selection techniques can have varying effects on model performance:
1. Filter Methods: Filter methods rank features using statistical scores (e.g., correlation or chi-square)
computed independently of any model. They are computationally efficient and provide a good initial set
of features. However, they do not consider interactions between features, which may limit their
effectiveness in some cases.
2. Wrapper Methods: These methods (e.g., recursive feature elimination) search over feature subsets by
repeatedly training the model. They often yield better performance because they consider feature
interactions and model performance. However, they are computationally expensive, especially with
large datasets and feature sets.
3. Embedded Methods: Embedded methods (e.g., Lasso or tree-based feature importances) balance
efficiency and performance by integrating feature selection into the learning algorithm. They are
particularly useful for high-dimensional datasets.
4. Dimensionality Reduction Techniques: These techniques (e.g., PCA) can effectively reduce the feature
space while retaining most of the relevant information. However, they can make the resulting features
less interpretable.
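
For illustration (assuming scikit-learn and synthetic data), the families above roughly correspond to the following calls: a filter method scoring features independently, a wrapper method (recursive feature elimination), and an embedded method (an L1-penalised model that zeroes out weak coefficients):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

# Filter: score each feature independently (fast, ignores interactions)
X_filter = SelectKBest(score_func=f_classif, k=5).fit_transform(X, y)

# Wrapper: repeatedly fit the model and drop the weakest features (slower)
X_wrap = RFE(LogisticRegression(max_iter=1000),
             n_features_to_select=5).fit_transform(X, y)

# Embedded: L1 regularisation performs selection during training itself
embedded = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)

print(X_filter.shape, X_wrap.shape, (embedded.coef_ != 0).sum())
```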

Q. Discuss Major issues in Machine Learning Approach


Machine learning (ML) offers powerful tools for data analysis and predictive modeling, but it also comes with
a range of challenges and issues. These issues can affect the accuracy, efficiency, and fairness of ML models.
Here are some major issues in the machine learning approach:
1. Data Quality and Quantity:
Data Quality: Poor data quality, including noise, missing values, and errors, can lead to inaccurate
models.
Data Quantity: Insufficient data can result in overfitting, where the model performs well on training
data but poorly on unseen data.
2. Overfitting and Underfitting:
Overfitting: When a model learns the training data too well, including its noise and outliers, it
performs poorly on new data. This is often due to high model complexity.
Underfitting: When a model is too simple to capture the underlying structure of the data, it performs
poorly on both training and test data.
3. Feature Selection: Identifying the most relevant features is crucial for model accuracy. Irrelevant or
redundant features can reduce model performance.
4. Complex Models: Techniques like deep learning and ensemble methods often produce highly
accurate models that are difficult to interpret, making it hard to understand how decisions are made.
5. Computational Resources: Training complex models, especially deep learning models, requires
significant computational resources, including powerful GPUs and large amounts of memory.
6. Data Security: Ensuring the security of data used for training and the integrity of models is critical to
prevent unauthorized access and tampering.
7. Maintenance: Models need continuous monitoring and updating to maintain performance as new
data becomes available or as the underlying data distribution changes (data drift).
Q. If you are collecting raw data from an authentic source, what measures will you take to
preprocess it?
When collecting raw data from an authentic source, preprocessing the data is a crucial step to ensure its
quality, consistency, and usability for machine learning models or other analytical tasks. Here are some
common measures I would take to preprocess the raw data:
1. Data Cleaning:
• Handle missing values: Identify and address missing data points by either removing the
corresponding rows or columns, or using techniques like imputation (e.g., mean, median, or
mode imputation) to fill in the missing values.
• Remove duplicates: Identify and remove any duplicate rows or instances from the dataset to
avoid redundancy and inconsistencies.
2. Data Transformation:
• Handle categorical data: Convert categorical variables into a numerical format that can be
understood by machine learning algorithms, typically using techniques like one-hot encoding
or label encoding.
• Normalize numerical data: Apply normalization techniques like min-max scaling or z-score
normalization to numerical features to ensure they are on a similar scale, which can improve
the performance of certain algorithms.
3. Data Integration:
• Merge or join data sources: If the data comes from multiple sources, merge or join them into
a single dataset based on common attributes or keys, ensuring consistency and avoiding
duplication.
4. Data Sampling and Splitting:
• Dimensionality Reduction: Use techniques like Principal Component Analysis (PCA) to reduce
the number of features while retaining essential information.
• Sampling: If the dataset is too large, consider random sampling to reduce the size while
maintaining representativeness.
5. Data Validation:
• Validate the preprocessed data to ensure that the transformations and operations performed
have not introduced any inconsistencies or errors.
• Perform quality checks on the preprocessed data, such as checking for missing values,
outliers, or inconsistent data types, to ensure that the data meets the desired quality
standards.
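
A minimal sketch of these measures, assuming pandas and scikit-learn and a small made-up table (the column names and values are illustrative only):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    "age":    [25, None, 40, 40, 31],
    "city":   ["Delhi", "Mumbai", "Delhi", "Delhi", None],
    "income": [30000, 52000, 61000, 61000, 45000],
})

df = df.drop_duplicates()                             # remove duplicate rows
df["age"] = df["age"].fillna(df["age"].median())      # impute missing numeric values
df["city"] = df["city"].fillna(df["city"].mode()[0])  # impute missing categories

df = pd.get_dummies(df, columns=["city"])             # one-hot encode categorical data
df[["age", "income"]] = MinMaxScaler().fit_transform(df[["age", "income"]])  # scaling

print(df)
```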
Types of Machine Learning:
There are several types of machine learning, each with special characteristics and applications. Some of the
main types of machine learning algorithms are as follows:
1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning

❖ Supervised Machine Learning:


• Supervised learning is the type of machine learning in which machines are trained using well
"labelled" training data, and on the basis of that data, machines predict the output. Labelled
data means input data that is already tagged with the correct output.
• In supervised learning, the training data provided to the machine works as the supervisor that
teaches the machine to predict the output correctly. It applies the same concept as a student
learning under the supervision of a teacher.
• Supervised learning is a process of providing input data as well as correct output data to the
machine learning model.
• The aim of a supervised learning algorithm is to find a mapping function to map the input
variable(x) with the output variable(y). Y = f(X)

Working of Supervised Learning:


• In supervised learning, models are trained using a labelled dataset, where the model learns about each
type of data.
• Once the training process is completed, the model is tested on the basis of test data (a portion of the
dataset held out from training), and then it predicts the output.

Suppose we have a dataset of different types of shapes which includes square, rectangle, triangle, and
Polygon. Now the first step is that we need to train the model for each shape.
o If the given shape has four sides, and all the sides are equal, then it will be labelled as a Square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides then it will be labelled as hexagon.
Now, after training, we test our model using the test set, and the task of the model is to identify the shape.
Types of supervised Machine learning Algorithms:
Supervised learning can be further divided into two types of problems:
1. Regression: Regression algorithms are used if there is a relationship between the input variable and
the output variable. It is used for the prediction of continuous variables, such as Weather forecasting,
Market Trends, etc.
Some popular Regression algorithms which come under supervised learning:
• Linear Regression
• Regression Trees
• Non-Linear Regression
• Bayesian Linear Regression
• Polynomial Regression
2. Classification: Classification algorithms are used when the output variable is categorical, i.e., it belongs
to one of a discrete set of classes, such as Yes-No, Male-Female, or True-False (binary), or more than
two classes (multi-class).
Example: Spam email filtering, predicting whether it will rain tomorrow or not, etc.
Some popular Classification algorithms which come under supervised learning:
• Random Forest
• Decision Trees
• Logistic Regression
• Support vector Machines

Advantages of Supervised Machine Learning


• Supervised Learning models can have high accuracy as they are trained on labelled data.
• The process of decision-making in supervised learning models is often interpretable.
• It can often be used in pre-trained models which saves time and resources when developing new models
from scratch.
Disadvantages of Supervised Machine Learning
• It may struggle with unseen or unexpected patterns that are not present in the training data.
• It can be time-consuming and costly as it relies on labelled data.
• It may generalise poorly to new data that differs from the training data.

Applications of Supervised Learning


Supervised learning is used in a wide variety of applications, including:
• Image classification: Identify objects, faces, and other features in images.
• Natural language processing: Extract information from text, such as sentiment, entities, and
relationships.
• Speech recognition: Convert spoken language into text.
• Recommendation systems: Make personalized recommendations to users.
• Predictive analytics: Predict outcomes, such as sales, customer churn, and stock prices.
• Medical diagnosis: Detect diseases and other medical conditions.
• Fraud detection: Identify fraudulent transactions.
• Autonomous vehicles: Recognize and respond to objects in the environment.
• Email spam detection: Classify emails as spam or not spam.
• Weather forecasting: Make predictions for temperature, precipitation, and other meteorological
parameters.
Classification vs Regression

1. In Classification, the target variables are discrete. In Regression, the target variables are continuous.
2. In Classification, we try to find the decision boundary, which can divide the dataset into different classes.
In Regression, we try to find the best-fit line, which can predict the output more accurately.
3. Classification algorithms can be used to solve classification problems such as spam email classification,
speech recognition, identification of cancer cells, etc. Regression algorithms can be used to solve
regression problems such as weather prediction, house price prediction, etc.
4. Evaluation metrics like Precision, Recall, and F1-score are used to evaluate the performance of
classification algorithms. Evaluation metrics like Mean Squared Error, R2-score, and MAPE are used to
evaluate the performance of regression algorithms.
5. In Classification, we face problems like binary classification or multi-class classification. In Regression, we
face linear regression models as well as non-linear models.
6. The output of Classification is categorical labels. The output of Regression is continuous numerical values.

❖ Unsupervised Machine Learning:


• Unsupervised learning is a type of machine learning in which models are trained using unlabeled
dataset and are allowed to act on that data without any supervision.
• The goal of unsupervised learning is to find the underlying structure of dataset, group that data
according to similarities, and represent that dataset in a compressed format.
• The primary goal of Unsupervised learning is often to discover hidden patterns, similarities, or
clusters within the data, which can then be used for various purposes, such as data exploration,
visualization, dimensionality reduction, and more.

Example: Suppose the unsupervised learning algorithm is given an input dataset containing images of
different types of cats and dogs. The algorithm is never trained upon the given dataset, which means it does
not have any idea about the features of the dataset. The task of the unsupervised learning algorithm is to
identify the image features on their own. Unsupervised learning algorithm will perform this task by clustering
the image dataset into the groups according to similarities between images.
Why use Unsupervised Learning?
Some main reasons which describe the importance of Unsupervised Learning:
o Unsupervised learning is helpful for finding useful insights from data.
o Unsupervised learning is closer to how a human learns to think from their own experiences, which makes
it closer to real AI.
o Unsupervised learning works on unlabelled and uncategorised data, which makes it especially important.
o In the real world, we do not always have input data with the corresponding output, so to solve such cases,
we need unsupervised learning.

Types of Unsupervised Learning Algorithm:


1. Clustering: Clustering is a method of grouping objects into clusters such that objects with the most
similarities remain in one group and have little or no similarity with the objects of another group.
Cluster analysis finds the commonalities between the data objects and categorises them as per the
presence and absence of those commonalities.
2. Association: An association rule is an unsupervised learning method which is used for finding
relationships between variables in a large database. It determines the set of items that occur
together in the dataset. Association rules make marketing strategy more effective: for example, people
who buy item X (say, bread) also tend to purchase item Y (butter or jam). A typical example of
association rules is Market Basket Analysis.

Unsupervised Learning algorithms:


Below is the list of some popular unsupervised learning algorithms:
o K-means clustering
o Hierarchical clustering
o Anomaly detection
o Neural networks (e.g., autoencoders)
o Principal Component Analysis
o Independent Component Analysis
o Apriori algorithm
o Singular Value Decomposition

Advantages of Unsupervised Machine Learning


• It helps to discover hidden patterns and various relationships between the data.
• Used for tasks such as customer segmentation, anomaly detection, and data exploration.
• It does not require labeled data and reduces the effort of data labeling.
Disadvantages of Unsupervised Machine Learning
• Without using labels, it may be difficult to predict the quality of the model’s output.
• Unsupervised learning algorithms can be sensitive to the quality of the input data. Noisy or incomplete
data can lead to misleading or inaccurate results.
• Some unsupervised learning algorithms, particularly those dealing with high-dimensional data or large
datasets, can be computationally expensive.
• It can be difficult to understand the decision-making process of unsupervised learning models.
Applications of Unsupervised Learning
Here are some common applications of unsupervised learning:
• Clustering: Group similar data points into clusters.
• Anomaly detection: Identify outliers or anomalies in data.
• Dimensionality reduction: Reduce the dimensionality of data while preserving its essential information.
• Recommendation systems: Suggest products, movies, or content to users based on their historical
behavior or preferences.
• Topic modeling: Discover latent topics within a collection of documents.
• Density estimation: Estimate the probability density function of data.
• Image and video compression: Reduce the amount of storage required for multimedia content.
• Natural language processing (NLP): Unsupervised learning is used in a variety of NLP tasks, including
topic modeling, document clustering, and part-of-speech tagging.

Supervised Learning vs Unsupervised Learning

1. Supervised learning algorithms are trained using labelled data. Unsupervised learning algorithms are
trained using unlabelled data.
2. A supervised learning model predicts the output. An unsupervised learning model finds the hidden
patterns in data.
3. In supervised learning, input data is provided to the model along with the output. In unsupervised
learning, only input data is provided to the model.
4. A supervised learning model takes direct feedback to check if it is predicting the correct output or not. An
unsupervised learning model does not take any feedback.
5. Supervised learning needs supervision to train the model. Unsupervised learning does not need any
supervision to train the model.
6. In supervised learning it is not possible to learn larger and more complex models than in unsupervised
learning. In unsupervised learning it is possible to learn larger and more complex models than in
supervised learning.
7. A supervised learning model produces an accurate result. An unsupervised learning model may give a less
accurate result.

Examples: (Supervised)
• Email Spam Detection: Using historical email data labeled as 'spam' or 'not spam' to train a model to
classify new emails.
• Image Recognition: Training a model with labeled images to recognize objects (e.g., cats vs. dogs).
• House Price Prediction: Using historical data on house prices and features (e.g., size, location) to
predict the price of new houses.
Examples: (Unsupervised)
• Document Clustering: Organizing a collection of documents into clusters based on content similarity.
• Principal Component Analysis (PCA): Reducing the number of features in a dataset while retaining
most of the variance, often used for visualization and noise reduction.
• t-Distributed Stochastic Neighbor Embedding (t-SNE): Visualizing high-dimensional data by reducing
it to two or three dimensions.
Semi-supervised Learning vs Reinforcement Learning

1. Semi-supervised learning is a combination of supervised and unsupervised learning. Reinforcement
learning is a different learning approach based on an agent interacting with an environment.
2. Semi-supervised learning uses a small amount of labelled data along with a large amount of unlabelled
data. Reinforcement learning uses a system of rewards and penalties to learn.
3. Semi-supervised learning utilises labelled data to guide learning and unlabelled data to enhance accuracy.
Reinforcement learning is not dependent on labelled data; it relies on reward signals from the environment.
4. The goal of semi-supervised learning is to improve learning accuracy with limited labelled data. The goal of
reinforcement learning is to learn a policy that maximises a cumulative reward signal.
5. In semi-supervised learning, a model is first trained on the labelled data and then further refined using the
unlabelled data. In reinforcement learning, the agent takes actions in an environment, receives rewards or
penalties, and updates its policy based on the feedback to maximise the expected cumulative reward.
6. Semi-supervised learning is useful when labelling data is expensive or time-consuming, such as in image
classification, natural language processing, and speech recognition. Reinforcement learning is suitable for
sequential decision-making problems, such as game playing, robotics, and autonomous systems.
❖ Regression:
Regression analysis is a statistical method for modelling the relationship between a dependent (target)
variable and one or more independent (predictor) variables. More specifically, regression analysis helps us
understand how the value of the dependent variable changes with respect to one independent variable when
the other independent variables are held fixed. It predicts continuous/real values such as temperature, age,
salary, price, etc.
Regression is a supervised learning technique which helps in finding the correlation between variables and
enables us to predict the continuous output variable based on the one or more predictor variables. It is mainly
used for prediction, forecasting, time series modeling, and determining the causal-effect relationship
between variables.
Regression fits a line or curve through the datapoints on the target-predictor graph in such a way that the
vertical distance between the datapoints and the regression line is minimised. The distance between the
datapoints and the line indicates whether the model has captured a strong relationship or not.

Types of Regression:
1. Linear Regression:
• Linear regression is a statistical regression method which is used for predictive analysis.
• It is one of the very simple and easy algorithms which works on regression and shows the
relationship between the continuous variables.
• It is used for solving the regression problem in machine learning.
• Linear regression shows the linear relationship between the independent variable (X-axis) and
the dependent variable (Y-axis), hence called linear regression.
• If there is only one input variable (x), then such linear regression is called simple linear
regression. And if there is more than one input variable, then such linear regression is
called multiple linear regression.
• For example, a linear regression model might predict the salary of an employee on the basis of
years of experience.

The mathematical equation for Linear regression: Y = aX+b


Here, Y = dependent variables (target variables), X= Independent variables (predictor variables),
a and b are the linear coefficients
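
A minimal sketch of fitting Y = aX + b with scikit-learn, using made-up salary/experience numbers:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])   # years of experience
y = np.array([30, 35, 41, 46, 52])        # salary in thousands (illustrative values)

model = LinearRegression().fit(X, y)
print("a (slope):", model.coef_[0], " b (intercept):", model.intercept_)
print("predicted salary for 6 years:", model.predict([[6]])[0])
```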
2. Logistic Regression:
• Logistic regression is another supervised learning algorithm which is used to solve the
classification problems. In classification problems, we have dependent variables in a binary or
discrete format such as 0 or 1.
• Logistic regression algorithm works with the categorical variable such as 0 or 1, Yes or No, True
or False, Spam or not spam, etc.
• It is a predictive analysis algorithm which works on the concept of probability.
• Logistic regression is a type of regression, but it differs from the linear regression algorithm in
terms of how it is used.
• Logistic regression uses the sigmoid or logistic function to map predictions to probabilities. This
sigmoid function is used to model the data in logistic regression. The function can be
represented as:
f(x) = 1 / (1 + e^(-x))
Where, f(x)= Output between the 0 and 1 value, x= input to the function, e= base of natural
logarithm.

When we provide the input values (data) to the function, it produces an S-shaped curve.
Logistic regression uses the concept of a threshold level: values above the threshold are mapped to 1, and
values below the threshold are mapped to 0.
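
A minimal sketch of the sigmoid and a logistic regression classifier with a 0.5 threshold, assuming scikit-learn and made-up pass/fail data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))    # f(x) = 1 / (1 + e^(-x)), output in (0, 1)

X = np.array([[1], [2], [3], [4], [5], [6]])   # hours studied (illustrative)
y = np.array([0, 0, 0, 1, 1, 1])               # 0 = fail, 1 = pass

clf = LogisticRegression().fit(X, y)
probs = clf.predict_proba([[2.5], [4.5]])[:, 1]   # points on the S-curve
print(probs, (probs >= 0.5).astype(int))          # apply the 0.5 threshold
print(sigmoid(np.array([-2.0, 0.0, 2.0])))        # sigmoid values for sample inputs
```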

3. Polynomial Regression:
• Polynomial Regression is a type of regression which models the non-linear dataset using a
linear model.
• It is similar to multiple linear regression, but it fits a non-linear curve between the value of x
and corresponding conditional values of y.
• Suppose there is a dataset which consists of datapoints which are present in a non-linear
fashion, so for such case, linear regression will not best fit to those datapoints. To cover such
datapoints, we need Polynomial regression.
• In Polynomial regression, the original features are transformed into polynomial features of
given degree and then modeled using a linear model. Which means the datapoints are best
fitted using a polynomial line.
• The equation for polynomial regression is derived from the linear regression equation: the linear
regression equation Y = b0 + b1x is transformed into the polynomial regression equation
Y = b0 + b1x + b2x^2 + b3x^3 + ... + bnx^n.
• Here Y is the predicted/target output, b0, b1, ..., bn are the regression coefficients, and x is
our independent/input variable.
• The model is still linear because it is linear in the coefficients, even though the features are
polynomial (quadratic, cubic, etc.) terms of x.
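
A minimal polynomial regression sketch, assuming scikit-learn: the input x is transformed into polynomial features and an ordinary linear model is fitted on them (the quadratic toy data is made up):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

X = np.linspace(-3, 3, 20).reshape(-1, 1)
y = 1 + 2 * X.ravel() + 0.5 * X.ravel() ** 2   # underlying Y = b0 + b1*x + b2*x^2

# Transform x into [1, x, x^2], then fit a linear model on those features
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[4.0]]))                  # predict with the fitted curve
```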

Linear Regression vs Logistic Regression

1. Linear regression is used to predict the continuous dependent variable using a given set of independent
variables. Logistic regression is used to predict the categorical dependent variable using a given set of
independent variables.
2. Linear regression is used for solving regression problems. Logistic regression is used for solving
classification problems.
3. In linear regression, we find the best-fit line, by which we can easily predict the output. In logistic
regression, we find the S-curve by which we can classify the samples.
4. The least squares estimation method is used to estimate the coefficients in linear regression. The
maximum likelihood estimation method is used to estimate the coefficients in logistic regression.
5. In linear regression, we predict a continuous numeric value. In logistic regression, we predict the value as
1 or 0.
6. In linear regression, no threshold value is needed. In logistic regression, a threshold value is needed.
❖ KNN Algorithm:
• K-Nearest Neighbour is one of the simplest Machine Learning algorithms based on Supervised
Learning technique.
• The K-NN algorithm assumes similarity between the new case/data and the available cases and puts the
new case into the category that is most similar to the available categories.
• The K-NN algorithm stores all the available data and classifies a new data point based on similarity.
This means that when new data appears, it can easily be classified into a well-suited category using the
K-NN algorithm.
• K-NN algorithm can be used for Regression as well as for Classification but mostly it is used for the
Classification problems.
• K-NN is a non-parametric algorithm, which means it does not make any assumption on underlying
data.
• It is also called a lazy learner algorithm because it does not learn from the training set immediately
instead it stores the dataset and at the time of classification, it performs an action on the dataset.
• The KNN algorithm at the training phase just stores the dataset, and when it gets new data, it
classifies that data into the category that is most similar to the new data.
Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, and we want to
know whether it is a cat or a dog. For this identification, we can use the KNN algorithm, as it works on a
similarity measure. Our KNN model will find the features of the new image that are most similar to the cat
and dog images, and based on the most similar features it will put it in either the cat or the dog category.

Why do we need a K-NN Algorithm?


Suppose there are two categories, i.e., Category A and Category B, and we have a new data point x1; in
which of these categories will this data point lie? To solve this type of problem, we need a K-NN
algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point.

How does K-NN work?


The K-NN working can be explained on the basis of the below algorithm:
• Step-1: Select the number K of the neighbors
• Step-2: Calculate the Euclidean distance between the new data point and the training data points.
• Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
• Step-4: Among these k neighbors, count the number of the data points in each category.
• Step-5: Assign the new data points to that category for which the number of the neighbor is
maximum.
• Step-6: Our model is ready.
How to select the value of K in the K-NN Algorithm?
Below are some points to remember while selecting the value of K in the K-NN algorithm:
• There is no particular way to determine the best value for "K", so we need to try some values to find
the best out of them. The most preferred value for K is 5.
• A very low value for K such as K=1 or K=2, can be noisy and lead to the effects of outliers in the
model.
• Large values of K make the prediction smoother and more robust to noise, but they may blur the
boundaries between classes.

Applications:
• Data preprocessing: The KNN algorithm is used for the process of missing data imputation that
estimates the missing values
• Pattern recognition: The KNN algorithm is useful in identifying patterns in customer purchase
behavior.
• Stock price prediction: The KNN algorithm is useful in predicting the future value of stocks based on
historical data.
• Recommendation system: The KNN algorithm can be used in an online video streaming platform to
suggest content a user is more likely to watch by analyzing what similar users watch.
Advantages of KNN Algorithm:
• It is simple to implement.
• It is robust to the noisy training data
• It can be more effective if the training data is large.
Disadvantages of KNN Algorithm:
• The value of K always needs to be determined, which may sometimes be complex.
• The computation cost is high because the distance from the new point to all the training samples must
be calculated.
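
A minimal K-NN sketch, assuming scikit-learn and the Iris dataset: the model simply stores the training data and classifies each new point by the majority vote of its K = 5 nearest neighbours:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_tr, y_tr)                # "training" just stores the dataset (lazy learner)
print(knn.score(X_te, y_te))       # accuracy on unseen data
print(knn.predict(X_te[:3]))       # classify a few new points
```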

❖ Decision Tree Algorithm:


• Decision Tree is a Supervised learning technique that can be used for both classification and
Regression problems, but mostly it is preferred for solving Classification problems. It is a tree-
structured classifier, where internal nodes represent the features of a dataset, branches represent
the decision rules and each leaf node represents the outcome.
• In a Decision tree, there are two nodes, which are the Decision Node and Leaf Node. Decision nodes
are used to make any decision and have multiple branches, whereas Leaf nodes are the output of
those decisions and do not contain any further branches.
• The decisions or the test are performed on the basis of features of the given dataset.
• It is a graphical representation for getting all the possible solutions to a problem/decision based on
given conditions.
• It is called a decision tree because, similar to a tree, it starts with the root node, which expands on
further branches and constructs a tree-like structure.
• In order to build a tree, we use the CART algorithm, which stands for Classification and Regression
Tree algorithm.
• A decision tree simply asks a question, and based on the answer (Yes/No), it further splits the tree into
subtrees.
Why use Decision Trees?
There are various algorithms in Machine learning, so choosing the best algorithm for the given dataset and
problem is the main point to remember while creating a machine learning model. Below are the two reasons
for using the Decision tree:
• Decision Trees usually mimic human thinking ability while making a decision, so it is easy to
understand.
• The logic behind the decision tree can be easily understood because it shows a tree-like structure.
Example: Suppose there is a candidate who has a job offer and wants to decide whether he should accept the
offer or not. To solve this problem, the decision tree starts with the root node (the Salary attribute, chosen by
an attribute selection measure, ASM). The root node splits further into the next decision node (distance from
the office) and one leaf node based on the corresponding labels. The next decision node further splits into one
decision node (cab facility) and one leaf node. Finally, the decision node splits into two leaf nodes (Accepted
offer and Declined offer).

Advantages of the Decision Tree


• It is simple to understand as it follows the same process which a human follows while making any
decision in real life.
• It can be very useful for solving decision-related problems.
• It helps to think about all the possible outcomes for a problem.
• There is less requirement of data cleaning compared to other algorithms.
Disadvantages of the Decision Tree
• The decision tree contains lots of layers, which makes it complex.
• It may have an overfitting issue, which can be resolved using the Random Forest algorithm.
• For more class labels, the computational complexity of the decision tree may increase.
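
A minimal decision tree sketch, assuming scikit-learn (whose trees follow the CART approach mentioned above) and the Iris dataset; the printed rules show the root node, branches, and leaf nodes:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
feature_names = ["sepal length", "sepal width", "petal length", "petal width"]

tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, y)

print(export_text(tree, feature_names=feature_names))   # if/then decision rules
```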
❖ Naïve Bayes Classification Algorithm:
• Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes theorem and
used for solving classification problems.
• It is mainly used in text classification that includes a high-dimensional training dataset.
• Naïve Bayes Classifier is one of the simple and most effective Classification algorithms which
helps in building the fast machine learning models that can make quick predictions.
• It is a probabilistic classifier, which means it predicts on the basis of the probability of an
object.
• Some popular examples of Naïve Bayes Algorithm are spam filtration, Sentimental analysis, and
classifying articles.

Why is it called Naïve Bayes?


The Naïve Bayes algorithm comprises two words, Naïve and Bayes, which can be described as:
• Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is
independent of the occurrence of other features. For example, if a fruit is identified on the basis of
colour, shape, and taste, then a red, spherical, and sweet fruit is recognised as an apple. Hence each
feature individually contributes to identifying it as an apple, without depending on the others.
• Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.

Bayes' Theorem:
• Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to determine the probability of
a hypothesis with prior knowledge. It depends on the conditional probability.
• The formula for Bayes' theorem is given as:
P(A|B) = P(B|A) * P(A) / P(B)

Where,
P(A|B) is Posterior probability: the probability of event A occurring, given event B has occurred.
P(B|A) is Likelihood probability: the probability of event B occurring, given event A has occurred.
P(A) is Prior Probability: the probability of event A.
P(B) is Marginal Probability: the probability of event B.

Advantages of Naïve Bayes Classifier:


o Naïve Bayes is one of the fast and easy ML algorithms to predict a class of datasets.
o It can be used for Binary as well as Multi-class Classifications.
o It performs well in Multi-class predictions as compared to the other Algorithms.
o It is the most popular choice for text classification problems.
Disadvantages of Naïve Bayes Classifier:
o Naive Bayes assumes that all features are independent or unrelated, so it cannot learn the relationship
between features.
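
A minimal Naïve Bayes sketch, assuming scikit-learn: a Gaussian variant for numeric features and a multinomial variant for a toy spam-style text example (the documents and labels are made up):

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

# Gaussian Naive Bayes on numeric features
X, y = load_iris(return_X_y=True)
print(GaussianNB().fit(X, y).score(X, y))

# Multinomial Naive Bayes on word counts (toy spam filter)
docs = ["win money now", "meeting at noon", "win a free prize", "project meeting notes"]
labels = [1, 0, 1, 0]                   # 1 = spam, 0 = not spam (made-up labels)
vec = CountVectorizer()
nb = MultinomialNB().fit(vec.fit_transform(docs), labels)
print(nb.predict(vec.transform(["free money prize"])))   # most likely labelled spam
```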
❖ PCA:
• Principal Component Analysis is an unsupervised learning algorithm that is used for the
dimensionality reduction in machine learning.
• It is a statistical process that converts the observations of correlated features into a set of
linearly uncorrelated features with the help of orthogonal transformation.
• These new transformed features are called the Principal Components.
• It is a technique for extracting strong patterns from a dataset by keeping the directions of highest
variance and discarding the rest.
• PCA generally tries to find the lower-dimensional surface to project the high-dimensional data.
• PCA works by considering the variance of each attribute, because directions with high variance carry
most of the information in the data; dimensionality is reduced by dropping the low-variance directions.
Some real-world applications of PCA are image processing, movie recommendation systems, and
optimising the power allocation in various communication channels.

The PCA algorithm is based on some mathematical concepts such as:


o Variance and Covariance
o Eigenvalues and Eigenvectors

Principal Components in PCA:


As described above, the transformed new features, or the output of PCA, are the Principal
Components. The number of these PCs is either equal to or less than the number of original features present
in the dataset. Some properties of these principal components are given below:
o Each principal component must be a linear combination of the original features.
o These components are orthogonal, i.e., the correlation between a pair of components is zero.
o The importance of each component decreases when going from 1 to n: the 1st PC has
the most importance, and the nth PC has the least importance.
Benefits of PCA
1. Noise Reduction: By focusing on the principal components that capture the most variance,
PCA can help reduce noise in the data.
2. Computational Efficiency: Reduces the number of features, leading to faster training and
inference times for machine learning models.
3. Visualization: Helps in visualizing high-dimensional data by reducing it to 2 or 3 dimensions.
4. Mitigating Multicollinearity: Transforms correlated features into a set of uncorrelated
principal components.
Drawbacks of PCA
1. Interpretability: The principal components are linear combinations of the original features,
which can make them hard to interpret.
2. Loss of Information: While PCA aims to retain as much variance as possible, some
information is inevitably lost, especially if the selected number of components is small.
3. Linearity Assumption: PCA assumes linear relationships among features, which might not
capture complex non-linear relationships in the data.
Applications of Principal Component Analysis:
o PCA is mainly used as the dimensionality reduction technique in various AI applications
such as computer vision, image compression, etc.
o It can also be used for finding hidden patterns if data has high dimensions. Some fields where
PCA is used are Finance, data mining, Psychology, etc.
Q. How dimensions are reduced using PCA?
Principal Component Analysis (PCA) is a dimensionality reduction technique commonly used in machine
learning and data analysis to reduce the number of variables (or dimensions) in a dataset while preserving the
essential information. Here's how PCA achieves dimensionality reduction:
Steps to Reduce Dimensions Using PCA:
1. Standardization of Data: PCA requires that the data is standardized (mean-centered and scaled)
because it is sensitive to the variances of the different features.
2. Compute the Covariance Matrix: PCA computes the covariance matrix of the standardized data. The
covariance matrix gives an idea of how much two variables change together.
3. Eigenvalue Decomposition: PCA performs eigenvalue decomposition (or Singular Value Decomposition
(SVD)) on the covariance matrix to obtain the eigenvectors and eigenvalues. The eigenvectors
represent the principal components (new axes) of the data, and the eigenvalues represent the amount
of variance explained by each principal component.
4. Select Principal Components: PCA sorts the eigenvalues in descending order. The principal components
associated with the largest eigenvalues (highest variance) capture the most information about the
data variability. Typically, the number of principal components chosen is less than or equal to the
original number of variables (dimensions).
5. Projection of Data: PCA projects the original data onto the new subspace spanned by the selected
principal components. This transformation creates new variables (principal components) that are
linear combinations of the original variables.
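
A minimal PCA sketch following these steps, assuming scikit-learn and the Iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)            # 4 original features
X_std = StandardScaler().fit_transform(X)    # step 1: standardise the data

pca = PCA(n_components=2)                    # step 4: keep the top 2 components
X_2d = pca.fit_transform(X_std)              # steps 2, 3 and 5 happen inside fit/transform

print(pca.explained_variance_ratio_)         # variance captured by each component
print(X_2d.shape)                            # (150, 2): dimensionality reduced from 4 to 2
```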

❖ SVM:
• Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms,
which is used for Classification as well as Regression problems. However, primarily, it is used for
Classification problems in Machine Learning.
• The main objective of the SVM algorithm is to find the optimal hyperplane in an N-dimensional
space that can separate the data points in different classes in the feature space.
• The hyperplane is chosen so that the margin between the closest points of the different classes is as
large as possible.
• The dimension of the hyperplane depends upon the number of features. If the number of input
features is two, then the hyperplane is just a line. If the number of input features is three, then
the hyperplane becomes a 2-D plane.
Example: SVM can be understood with the example that we used for the KNN classifier. Suppose we see a
strange cat that also has some features of dogs, and we want a model that can accurately identify whether it
is a cat or a dog; such a model can be created using the SVM algorithm. We first train our model with
lots of images of cats and dogs so that it can learn their different features, and then we test it with this
strange creature. The SVM creates a decision boundary between the two classes (cat and dog) using the
extreme cases (support vectors), and on the basis of the support vectors it will classify the creature as a cat.

Types of SVM:
SVM can be of two types:
o Linear SVM: Linear SVM is used for linearly separable data, which means if a dataset can be classified
into two classes by using a single straight line, then such data is termed as linearly separable data, and
classifier is used called as Linear SVM classifier.
o Non-linear SVM: Non-Linear SVM is used for non-linearly separated data, which means if a dataset
cannot be classified by using a straight line, then such data is termed as non-linear data and classifier
used is called as Non-linear SVM classifier.

Hyperplane: Hyperplane is the decision boundary that is used to separate the data points of different
classes in a feature space.
Support Vectors: Support vectors are the data points closest to the hyperplane, which play a critical role
in deciding the hyperplane and the margin.

Advantages of SVM
• Effective in High Dimensions: Works well with high-dimensional data.
• Memory Efficient: Uses a subset of training points (support vectors) in the decision function.
• Versatile: Different kernel functions can be specified for the decision function, making SVM adaptable to
different data structures.

Disadvantages of SVM
• Computationally Intensive: Training can be slow, especially with large datasets.
• Parameter Tuning: Performance depends on the choice of kernel and parameters (like C and gamma).
• Not Probabilistic: Does not provide direct probability estimates but can be calibrated using methods like
Platt scaling.
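As an illustration of the points above, the scikit-learn sketch below fits both a linear and an RBF-kernel SVM on the built-in Iris data; the dataset and the C and gamma values are illustrative assumptions, and probability=True enables the Platt-scaling calibration mentioned above.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Linear SVM: a straight (hyper)plane decision boundary
linear_svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
linear_svm.fit(X_train, y_train)

# Non-linear SVM: RBF kernel; gamma controls the kernel width,
# probability=True adds Platt scaling so predict_proba becomes available
rbf_svm = make_pipeline(StandardScaler(),
                        SVC(kernel="rbf", C=1.0, gamma="scale", probability=True))
rbf_svm.fit(X_train, y_train)

print("Linear SVM accuracy:", linear_svm.score(X_test, y_test))
print("RBF SVM accuracy:", rbf_svm.score(X_test, y_test))
print("Calibrated probabilities for one sample:", rbf_svm.predict_proba(X_test[:1]))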
❖ Cross Validation:
• Cross validation is a technique used in machine learning to evaluate the performance of a model
on unseen data. It involves dividing the available data into multiple folds or subsets, using one of
these folds as a validation set, and training the model on the remaining folds. This process is
repeated multiple times, each time using a different fold as the validation set. Finally, the results
from each validation step are averaged to produce a more robust estimate of the model’s
performance.
• The primary purpose of cross-validation is to ensure that the model performs well on unseen
data, thereby avoiding overfitting and underfitting.
The basic steps of cross-validation are:
• Reserve a subset of the dataset as a validation set.
• Train the model using the training dataset.
• Evaluate model performance using the validation set. If the model performs well on the validation set, proceed to the next step; otherwise, check for issues.

Types of Cross-Validation
1. Holdout Validation: In holdout validation, we train on 50% of the given dataset and use the remaining 50% for testing. It is a simple and quick way to evaluate a model. The major drawback is that, since we train on only 50% of the dataset, the remaining 50% may contain important information that the model never sees, which can lead to higher bias.
2. LOOCV (Leave One Out Cross Validation): In this method, we train on the whole dataset except for a single data point, which is held out for testing, and we iterate over every data point. In LOOCV, the model is trained on (n – 1) samples and tested on the one omitted sample, repeating this process for each data point in the dataset.
3. K-Fold Cross Validation: In K-fold cross-validation, we split the dataset into k subsets (known as folds), train on k – 1 of the folds, and leave one fold out for evaluating the trained model. We iterate k times, with a different fold reserved for testing each time.
4. Stratified K-Fold Cross-Validation: Similar to K-Fold but ensures that each fold has
approximately the same percentage of samples of each target class. This is especially useful for
imbalanced datasets.
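A minimal scikit-learn sketch of these four schemes is shown below; the Iris dataset, the logistic-regression model and the fold counts are illustrative assumptions only.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (train_test_split, KFold, StratifiedKFold,
                                     LeaveOneOut, cross_val_score)

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 1. Holdout validation: a single 50/50 train/test split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
print("Holdout accuracy:", model.fit(X_tr, y_tr).score(X_te, y_te))

# 2. LOOCV: n folds, each leaving out exactly one sample
print("LOOCV accuracy:", cross_val_score(model, X, y, cv=LeaveOneOut()).mean())

# 3. K-fold: train on k - 1 folds, evaluate on the remaining fold, repeat k times
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
print("5-fold accuracy:", cross_val_score(model, X, y, cv=kfold).mean())

# 4. Stratified k-fold: each fold preserves the class proportions
skfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
print("Stratified 5-fold accuracy:", cross_val_score(model, X, y, cv=skfold).mean())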
Advantages:
1. Overcoming Overfitting: Cross validation helps to prevent overfitting by providing a more robust
estimate of the model’s performance on unseen data.
2. Model Selection: Cross validation can be used to compare different models and select the one
that performs the best on average.
3. Hyperparameter tuning: Cross validation can be used to optimize the hyperparameters of a
model, such as the regularization parameter, by selecting the values that result in the best
performance on the validation set.
4. Data Efficient: Cross validation allows the use of all the available data for both training and
validation, making it a more data-efficient method compared to traditional validation
techniques.
Disadvantages:
1. Computationally Expensive: Cross validation can be computationally expensive, especially when
the number of folds is large or when the model is complex and requires a long time to train.
2. Time-Consuming: Cross validation can be time-consuming, especially when there are many
hyperparameters to tune or when multiple models need to be compared.

❖ Random Forest Algorithm:
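Random Forest is an ensemble learning method that builds many decision trees, each trained on a bootstrap sample of the training data and using a random subset of features at every split, and then combines their predictions (majority vote for classification, averaging for regression). This reduces the variance of a single decision tree and usually improves generalization. A minimal scikit-learn sketch is given below; the dataset and hyperparameter values are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 100 trees, each fit on a bootstrap sample with random feature subsets at each split
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

print("Test accuracy:", forest.score(X_test, y_test))
print("Feature importances:", forest.feature_importances_)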

❖ Reinforcement learning:
• Reinforcement Learning is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and observing the results of those actions. For each good action, the agent gets positive feedback, and for each bad action, the agent gets negative feedback or a penalty.
• In Reinforcement Learning, the agent learns automatically from feedback, without any labeled data, unlike supervised learning.
• Since there is no labeled data, the agent is bound to learn from its own experience.
• RL solves a specific type of problem where decision making is sequential and the goal is long-term, such as game playing, robotics, etc.
• The agent interacts with the environment and explores it by itself. The primary goal of an agent in reinforcement learning is to maximize the total positive reward it receives.

Key Features of Reinforcement Learning
• In RL, the agent is not instructed about the environment and what actions need to be taken.
• It is based on a trial-and-error process.
• The agent takes the next action and changes states according to the feedback of the previous action.
• The agent may get a delayed reward.
• The environment is stochastic, and the agent needs to explore it in order to obtain the maximum positive reward.
Elements of Reinforcement Learning
1. Policy: The policy defines the learning agent's behaviour at a given time. It is a mapping from perceived states of the environment to the actions to be taken when in those states.
2. Reward function: The reward function defines the goal of a reinforcement learning problem. It provides a numerical score based on the state of the environment.
3. Value function: Value functions specify what is good in the long run. The value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state.
4. Model of the environment: A model mimics the behaviour of the environment and is used for planning, i.e., considering possible future situations before actions are actually taken.
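These elements can be seen working together in a small tabular Q-learning sketch. The corridor environment, the hyperparameters and the step() helper below are hypothetical and purely illustrative: the epsilon-greedy rule plays the role of the policy, step() supplies the reward, and the Q table is the learned value-function estimate.

import numpy as np

# Tiny illustrative environment: 5 states in a corridor, actions 0 = left, 1 = right.
# The agent starts in state 0 and gets a reward of +1 only when it reaches state 4.
n_states, n_actions = 5, 2
alpha, gamma, epsilon, episodes = 0.1, 0.9, 0.1, 500

Q = np.zeros((n_states, n_actions))   # action-value table (value-function estimate)
rng = np.random.default_rng(0)

def step(state, action):
    """Environment dynamics: move left or right; reward +1 at the terminal state."""
    next_state = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward, next_state == n_states - 1

for _ in range(episodes):
    state, done = 0, False
    while not done:
        # Epsilon-greedy policy: mostly exploit Q, occasionally explore (trial and error)
        action = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)
        # Q-learning update: move Q(s, a) toward reward + gamma * max_a' Q(s', a')
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state

print("Learned greedy policy (0 = left, 1 = right):", np.argmax(Q, axis=1))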
❖ Underfitting in Machine Learning:
A statistical model or a machine learning algorithm is said to be underfitting when the model is too simple to capture the complexities of the data. It represents the inability of the model to learn the training data effectively, resulting in poor performance on both the training and testing data. In simple terms, an underfit model's predictions are inaccurate, especially when applied to new, unseen examples. It mainly happens when we use a very simple model with overly simplified assumptions. To address underfitting, we need to use more complex models, with enhanced feature representations and less regularization.
Reasons for Underfitting
1. The model is too simple, so it may not be capable of representing the complexities in the data.
2. The input features used to train the model are not adequate representations of the underlying factors influencing the target variable.
3. The size of the training dataset is too small.
4. Excessive regularization is used to prevent overfitting, which constrains the model from capturing the data well.
5. Features are not scaled.
Techniques to Reduce Underfitting
1. Increase model complexity.
2. Increase the number of features, performing feature engineering.
3. Remove noise from the data.
4. Increase the number of epochs or increase the duration of training to get better results.
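The first two techniques can be illustrated with a minimal scikit-learn sketch on synthetic data (all values below are arbitrary illustrative choices): a plain straight line underfits a sine-shaped target, while adding polynomial features gives the model enough capacity.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic non-linear data: y = sin(x) + noise
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

# Underfit: a straight line is too simple to capture the sine shape
line = LinearRegression().fit(X, y)

# Remedy: increase model complexity by adding polynomial features
poly = make_pipeline(PolynomialFeatures(degree=5), LinearRegression()).fit(X, y)

print("R^2, straight line :", line.score(X, y))   # low -> underfitting
print("R^2, degree-5 model:", poly.score(X, y))   # noticeably higher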
❖ Overfitting in Machine Learning:
A statistical model is said to be overfitted when it makes accurate predictions on the training data but not on the testing data. When a model is trained with too much flexibility or for too long, it starts learning from the noise and inaccurate entries in the data set, so testing on new data results in high variance. The model then fails to categorize the data correctly because of too many details and too much noise. Overfitting is often caused by non-parametric and non-linear methods, because these types of machine learning algorithms have more freedom in building the model from the dataset and can therefore build unrealistic models. A solution to avoid overfitting is to use a linear algorithm if we have linear data, or to constrain parameters such as the maximal depth if we are using decision trees.

Reasons for Overfitting:
1. High variance and low bias.
2. The model is too complex.
3. The size of the training data is too small relative to the model's complexity.
Techniques to Reduce Overfitting
1. Improve the quality of the training data: focusing on meaningful patterns reduces overfitting by mitigating the risk of fitting noise or irrelevant features.
2. Increase the amount of training data: this can improve the model's ability to generalize to unseen data and reduce the likelihood of overfitting.
3. Reduce model complexity.
4. Use early stopping during the training phase (monitor the validation loss during training and stop as soon as it begins to increase).
5. Apply Ridge (L2) or Lasso (L1) regularization.
6. Use dropout for neural networks to tackle overfitting.
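The sketch below illustrates overfitting and one of the remedies (Ridge regularization) on synthetic data; the polynomial degree, the alpha value and the sample sizes are arbitrary illustrative choices, so exact numbers will vary.

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Small noisy dataset where a very flexible model can memorize the noise
rng = np.random.default_rng(1)
X = rng.uniform(0, 3, size=(30, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=30)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

# Overfit: a degree-15 polynomial with no regularization
overfit = make_pipeline(PolynomialFeatures(degree=15, include_bias=False),
                        StandardScaler(), LinearRegression()).fit(X_train, y_train)

# Remedy: the same features, but Ridge (L2) regularization shrinks the coefficients
ridge = make_pipeline(PolynomialFeatures(degree=15, include_bias=False),
                      StandardScaler(), Ridge(alpha=1.0)).fit(X_train, y_train)

print("Unregularized: train R^2 =", overfit.score(X_train, y_train),
      " test R^2 =", overfit.score(X_test, y_test))   # large train/test gap -> overfitting
print("Ridge:         train R^2 =", ridge.score(X_train, y_train),
      " test R^2 =", ridge.score(X_test, y_test))     # typically a much smaller gap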
