Machine Learning Note - Exam Note For ML
A subset of artificial intelligence known as machine learning focuses primarily on the creation of algorithms
that enable a computer to independently learn from data and previous experiences. Arthur Samuel first used
the term "machine learning" in 1959.
Machine Learning is the field of study that gives computers the capability to learn without being explicitly
programmed.
Machine learning is a branch of artificial intelligence that enables algorithms to uncover hidden patterns
within datasets, allowing them to make predictions on new, similar data without explicit programming for
each task.
1. Bagging (Bootstrap Aggregating): The main idea behind bagging is to reduce the variance in a dataset, ensuring that the model is robust and not influenced by specific samples in the dataset.
For this reason, bagging is mainly applied to tree-based machine learning models such as decision trees and random forests; a short code sketch is given after the list below.
Advantages:
a) Reduces overall variance
b) Increases models’ robustness to noise in the data
Disadvantages:
a) High number of weak models may reduce model interpretability
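A minimal sketch of bagging with scikit-learn, assuming a synthetic classification dataset; the estimator choice and parameter values here are illustrative, not prescriptive:

```python
# Illustrative sketch: bagging decision trees with scikit-learn (assumed available).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Each tree is trained on a bootstrap sample; predictions are aggregated by majority vote.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=42)
bagging.fit(X_train, y_train)
print("Bagging accuracy:", accuracy_score(y_test, bagging.predict(X_test)))
```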
2. Boosting: Boosting sequentially trains models, each trying to correct the errors of the previous models.
Each subsequent model gives more weight to the data points that were misclassified or had higher
errors by the previous models.
The main idea behind sequential training is to have each model correct the errors of its predecessor.
This continues until the predefined number of models has been trained or some other stopping criterion is met.
We first initialize data weights to the same value and then perform the following steps iteratively:
1. Train a model on all instances
2. Calculate the error on model output over all instances
3. Assign a weight to the model (high for good performance and vice-versa)
4. Update data weights: give higher weights to samples with high errors
5. Repeat the previous steps until the performance is satisfactory or another stopping condition is
met
Finally, we combine the models into the one we use for prediction.
Boosting generally improves the accuracy of a machine learning model by improving the performance
of weak learners. Commonly used boosting algorithms include XGBoost, CatBoost, and AdaBoost; a short sketch using one of them follows the list below.
Advantages:
a) Improves overall accuracy
b) Reduces overall bias by improving on the weakness of the previous model
Disadvantages:
a) Can be computationally expensive
b) Sensitive to noisy data
c) Model dependency may allow for replication of errors
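A minimal sketch of boosting using scikit-learn's AdaBoost on synthetic data; the number of estimators and learning rate are assumed values for illustration only:

```python
# Illustrative sketch: AdaBoost, where each new weak learner focuses on the samples
# the previous learners misclassified (scikit-learn assumed available).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

boost = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=0)
boost.fit(X_train, y_train)
print("AdaBoost accuracy:", accuracy_score(y_test, boost.predict(X_test)))
```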
3. Stacking (Stacked Generalization): Stacking involves training multiple base models and then training a
meta-model (also called a second-level model) to combine their predictions. The base models'
predictions are used as inputs to the meta-model. The base and meta-models don’t have to be of the
same type. For example, we can pair a decision tree with a support vector machine (SVM).
Here are the steps:
• Construct base models on different portions of the training data
• Train a meta-model on the predictions from the base models
Example: A common approach is to use different types of models (e.g., logistic regression, decision
trees, SVMs) as base models and a linear model or a more complex model as the meta-model.
Advantages:
a) Combines the benefits of different models into one
b) Increases overall accuracy
Disadvantages:
a) May take a longer time to train and aggregate the predictions of different types of models
b) Training several base models and a meta-model increases complexity
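A minimal stacking sketch, assuming scikit-learn and a synthetic dataset: a decision tree and an SVM serve as base models, and a logistic-regression meta-model learns how to combine their predictions.

```python
# Illustrative sketch: stacking heterogeneous base models under a logistic-regression meta-model.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

base_models = [("tree", DecisionTreeClassifier(max_depth=5)), ("svm", SVC(probability=True))]
# The meta-model is trained on the base models' predictions.
stack = StackingClassifier(estimators=base_models, final_estimator=LogisticRegression())
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))
```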
4. Voting: This is a simple ensemble technique where multiple models are trained, and their predictions
are combined by taking a majority vote (for classification problems) or averaging (for regression
problems). Different models can be weighted differently based on their individual performance.
Examples:
• Hard Voting: For classification, takes the majority class predicted by the ensemble.
• Soft Voting: For classification, averages the predicted probabilities and selects the class with
the highest average probability.
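A minimal sketch contrasting hard and soft voting with scikit-learn; the three base classifiers are arbitrary illustrative choices:

```python
# Illustrative sketch: hard vs. soft voting over three different classifiers.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=10, random_state=2)
models = [("lr", LogisticRegression(max_iter=1000)),
          ("dt", DecisionTreeClassifier()),
          ("nb", GaussianNB())]

hard = VotingClassifier(estimators=models, voting="hard").fit(X, y)  # majority class wins
soft = VotingClassifier(estimators=models, voting="soft").fit(X, y)  # average predicted probabilities
print(hard.predict(X[:5]), soft.predict(X[:5]))
```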
5. Blending: Blending is similar to stacking, but instead of using a meta-model to combine the base
models' predictions, it uses a simpler technique like averaging or a weighted average based on the
performance of each base model on a holdout set.
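A minimal blending sketch under stated assumptions: two base regressors are combined by a weighted average, with weights derived from each model's error on a holdout set (the weighting scheme is one simple possibility, not a standard recipe).

```python
# Illustrative sketch: blending base-model predictions with a holdout-based weighted average.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=1000, n_features=10, noise=10, random_state=3)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.3, random_state=3)

m1 = LinearRegression().fit(X_train, y_train)
m2 = DecisionTreeRegressor(max_depth=6).fit(X_train, y_train)

# Weight each model by its holdout performance (the better model gets a larger weight).
err1 = mean_squared_error(y_hold, m1.predict(X_hold))
err2 = mean_squared_error(y_hold, m2.predict(X_hold))
w1, w2 = 1 / err1, 1 / err2
blended = (w1 * m1.predict(X_hold) + w2 * m2.predict(X_hold)) / (w1 + w2)
print("Blended MSE:", mean_squared_error(y_hold, blended))
```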
Q. Discuss the impact of feature selection on the performance of machine learning models.
How can different feature selection techniques affect model accuracy and efficiency?
Feature selection plays a crucial role in the performance of machine learning models, as it can significantly
impact model accuracy, efficiency, and interpretability. Choosing the right set of features is essential because
irrelevant or redundant features can introduce noise and complexity, leading to overfitting, increased
computational cost, and poor generalization to new data.
Different feature selection techniques can have varying effects on model performance:
1. Filter Methods: Filter methods are computationally efficient and provide a good initial set of features.
However, they do not consider interactions between features, which may limit their effectiveness in
some cases.
2. Wrapper Methods: These methods often yield better performance because they consider feature
interactions and model performance. However, they are computationally expensive, especially with
large datasets and feature sets.
3. Embedded Methods: Embedded methods balance efficiency and performance by integrating feature
selection into the learning algorithm. They are particularly useful for high-dimensional datasets.
4. Dimensionality Reduction Techniques: These techniques can effectively reduce the feature space while
retaining most of the relevant information. However, they can make the resulting features less
interpretable.
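A minimal sketch of the first three families with scikit-learn on synthetic data; the choice of scoring function, model, and number of features kept are all illustrative assumptions:

```python
# Illustrative sketch: a filter method (SelectKBest), a wrapper method (RFE),
# and an embedded method (L1-regularised logistic regression).
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=30, n_informative=5, random_state=4)

# Filter: rank features by a univariate statistic, independent of any model.
X_filter = SelectKBest(score_func=f_classif, k=5).fit_transform(X, y)

# Wrapper: repeatedly fit a model and drop the weakest features.
X_wrapper = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit_transform(X, y)

# Embedded: the L1 penalty drives coefficients of irrelevant features to zero during training.
embedded = LogisticRegression(penalty="l1", solver="liblinear").fit(X, y)
print(X_filter.shape, X_wrapper.shape, (embedded.coef_ != 0).sum())
```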
❖ Supervised Learning:
Suppose we have a dataset of different types of shapes, which includes squares, rectangles, triangles, and
polygons. The first step is to train the model on each shape:
o If the given shape has four sides, and all the sides are equal, then it will be labelled as a square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides, then it will be labelled as a hexagon.
Now, after training, we test our model using the test set, and the task of the model is to identify the shape.
Types of Supervised Machine Learning Algorithms:
Supervised learning can be further divided into two types of problems:
1. Regression: Regression algorithms are used when there is a relationship between the input variables and
a continuous output variable. They are used for the prediction of continuous values, such as weather
forecasting, market trends, etc.
Some popular Regression algorithms which come under supervised learning:
• Linear Regression
• Regression Trees
• Non-Linear Regression
• Bayesian Linear Regression
• Polynomial Regression
2. Classification: Classification algorithms are used when the output variable is categorical, which means
the output belongs to one of a set of classes such as Yes-No, Male-Female, True-False, etc.
Examples: spam email filtering, predicting whether it will rain tomorrow, etc.
Some popular Classification algorithms which come under supervised learning:
• Random Forest
• Decision Trees
• Logistic Regression
• Support Vector Machines
❖ Unsupervised Learning:
Example: Suppose the unsupervised learning algorithm is given an input dataset containing images of
different types of cats and dogs. The algorithm is never trained on labelled data, which means it has no prior
knowledge of the features of the dataset. The task of the unsupervised learning algorithm is to identify the
image features on its own. It performs this task by clustering the image dataset into groups according to the
similarities between images.
Why use Unsupervised Learning?
Some main reasons which describe the importance of unsupervised learning:
o Unsupervised learning is helpful for finding useful insights from data.
o Unsupervised learning is similar to how a human learns to think from their own experiences, which makes
it closer to true AI.
o Unsupervised learning works on unlabeled and uncategorized data, which makes it especially important.
o In the real world we do not always have input data with corresponding outputs, so unsupervised learning
is needed to solve such cases.
Examples: (Supervised)
• Email Spam Detection: Using historical email data labeled as 'spam' or 'not spam' to train a model to
classify new emails.
• Image Recognition: Training a model with labeled images to recognize objects (e.g., cats vs. dogs).
• House Price Prediction: Using historical data on house prices and features (e.g., size, location) to
predict the price of new houses.
Examples: (Unsupervised)
• Document Clustering: Organizing a collection of documents into clusters based on content similarity.
• Principal Component Analysis (PCA): Reducing the number of features in a dataset while retaining
most of the variance, often used for visualization and noise reduction.
• t-Distributed Stochastic Neighbor Embedding (t-SNE): Visualizing high-dimensional data by reducing
it to two or three dimensions.
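A minimal dimensionality-reduction sketch, assuming scikit-learn and the built-in Iris dataset: PCA projects four features down to two components while reporting how much variance each component retains.

```python
# Illustrative sketch: PCA for dimensionality reduction on the Iris dataset.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)  # project 4 features down to 2 for visualization
print("Explained variance ratio:", pca.explained_variance_ratio_)
```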
Semi-supervised Learning vs Reinforcement Learning:
1. Semi-supervised learning is a combination of supervised and unsupervised learning; reinforcement learning is a different learning approach based on an agent interacting with an environment.
2. Semi-supervised learning uses a small amount of labeled data along with a large amount of unlabeled data; reinforcement learning uses a system of rewards and penalties to learn.
3. Semi-supervised learning utilizes labeled data to guide learning and unlabeled data to enhance accuracy; reinforcement learning is not dependent on labeled data and relies on reward signals from the environment.
4. The goal of semi-supervised learning is to improve learning accuracy with limited labeled data; the goal of reinforcement learning is to learn a policy that maximizes a cumulative reward signal.
5. In semi-supervised learning, a model is first trained on the labeled data and then further refined using the unlabeled data; in reinforcement learning, the agent takes actions in an environment, receives rewards or penalties, and updates its policy based on the feedback to maximize the expected cumulative reward.
6. Semi-supervised learning is useful when labeling data is expensive or time-consuming, such as in image classification, natural language processing, and speech recognition; reinforcement learning is suitable for sequential decision-making problems, such as game playing, robotics, and autonomous systems.
❖ Regression:
Regression analysis is a statistical method for modelling the relationship between a dependent (target)
variable and one or more independent (predictor) variables. More specifically, regression analysis helps us to
understand how the value of the dependent variable changes with respect to one independent variable while
the other independent variables are held fixed. It predicts continuous/real values such as temperature, age,
salary, price, etc.
Regression is a supervised learning technique which helps in finding the correlation between variables and
enables us to predict the continuous output variable based on one or more predictor variables. It is mainly
used for prediction, forecasting, time series modeling, and determining the cause-effect relationship
between variables.
Regression fits a line or curve through the data points on the target-predictor graph in such a way that the
vertical distances between the data points and the regression line are minimized. The size of these distances
indicates whether the model has captured a strong relationship or not.
Types of Regression:
1. Linear Regression:
• Linear regression is a statistical regression method which is used for predictive analysis.
• It is one of the simplest and easiest regression algorithms, and it shows the relationship between
continuous variables.
• It is used for solving the regression problem in machine learning.
• Linear regression shows the linear relationship between the independent variable (X-axis) and
the dependent variable (Y-axis), hence called linear regression.
• If there is only one input variable (x), then such linear regression is called simple linear
regression. And if there is more than one input variable, then such linear regression is
called multiple linear regression.
• As an example of the relationship between variables in a linear regression model, we might predict the
salary of an employee on the basis of years of experience by fitting a straight line through the
salary-versus-experience data points.
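A minimal sketch of the salary-versus-experience example, using small made-up numbers purely for demonstration (scikit-learn assumed available):

```python
# Illustrative sketch: simple linear regression of salary on years of experience.
import numpy as np
from sklearn.linear_model import LinearRegression

experience = np.array([[1], [2], [3], [4], [5], [6]])            # years of experience
salary = np.array([30000, 35000, 41000, 45000, 52000, 56000])    # hypothetical salaries

model = LinearRegression().fit(experience, salary)
print("Slope:", model.coef_[0], "Intercept:", model.intercept_)
print("Predicted salary for 7 years:", model.predict([[7]])[0])
```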
2. Logistic Regression:
• Logistic regression is used for classification; it passes a linear combination of the inputs through the
sigmoid (logistic) function to estimate the probability of a class.
• When we provide the input values (data) to this function, it produces an S-shaped curve.
• It uses the concept of a threshold: probability values above the threshold are rounded up to 1, and values
below the threshold are rounded down to 0.
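A minimal sketch of the sigmoid function and a 0.5 threshold, as described above; the input values are arbitrary examples:

```python
# Illustrative sketch: the sigmoid (logistic) function and a 0.5 decision threshold.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # maps any real number into (0, 1)

z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
probs = sigmoid(z)
labels = (probs >= 0.5).astype(int)   # above the threshold -> class 1, below -> class 0
print(probs, labels)
```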
3. Polynomial Regression:
• Polynomial Regression is a type of regression which models the non-linear dataset using a
linear model.
• It is similar to multiple linear regression, but it fits a non-linear curve between the value of x
and corresponding conditional values of y.
• Suppose there is a dataset which consists of datapoints which are present in a non-linear
fashion, so for such case, linear regression will not best fit to those datapoints. To cover such
datapoints, we need Polynomial regression.
• In Polynomial regression, the original features are transformed into polynomial features of
given degree and then modeled using a linear model. Which means the datapoints are best
fitted using a polynomial line.
• The equation for polynomial regression is also derived from the linear regression equation: the linear
regression equation Y = b0 + b1x is transformed into the polynomial regression equation
Y = b0 + b1x + b2x² + b3x³ + ... + bnxⁿ.
• Here Y is the predicted/target output, b0, b1,... bn are the regression coefficients. x is
our independent/input variable.
• The model is still linear because it remains linear in the coefficients, even though the features include
quadratic and higher-order terms.
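A minimal polynomial-regression sketch, assuming scikit-learn and synthetic non-linear data; the degree and noise level are illustrative choices:

```python
# Illustrative sketch: fitting a degree-3 polynomial with PolynomialFeatures + LinearRegression.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 0.5 * x.ravel() ** 3 - x.ravel() + np.random.RandomState(0).normal(0, 1, 50)  # non-linear data

# Features are expanded to [1, x, x^2, x^3]; the model stays linear in the coefficients.
poly_model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
poly_model.fit(x, y)
print("R^2 on training data:", poly_model.score(x, y))
```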
❖ KNN (K-Nearest Neighbours):
Applications:
• Data preprocessing: The KNN algorithm is used for the process of missing data imputation that
estimates the missing values
• Pattern recognition: The KNN algorithm is useful in identifying patterns in customer purchase
behavior.
• Stock price prediction: The KNN algorithm is useful in predicting the future value of stocks based on
historical data.
• Recommendation system: The KNN algorithm can be used in an online video streaming platform to
suggest content a user is more likely to watch by analyzing what similar users watch.
Advantages of KNN Algorithm:
• It is simple to implement.
• It is robust to the noisy training data
• It can be more effective if the training data is large.
Disadvantages of KNN Algorithm:
• The value of K always needs to be determined, which can sometimes be complex.
• The computation cost is high because the distance to all the training samples must be calculated for
each prediction.
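A minimal KNN classification sketch, assuming scikit-learn and the built-in Iris dataset; k = 5 is an illustrative choice:

```python
# Illustrative sketch: K-Nearest Neighbours classification on the Iris dataset with k = 5.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=5)

# Each test point is labelled by a majority vote of its 5 nearest training points.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("KNN accuracy:", knn.score(X_test, y_test))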
Bayes' Theorem:
• Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to determine the probability of
a hypothesis with prior knowledge. It depends on the conditional probability.
• The formula for Bayes' theorem is given as: P(A|B) = (P(B|A) × P(A)) / P(B)
Where,
P(A|B) is Posterior probability: the probability of event A occurring, given event B has occurred.
P(B|A) is Likelihood probability: the probability of event B occurring, given event A has occurred.
P(A) is Prior Probability: the probability of event A.
P(B) is Marginal Probability: the probability of event B.
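A minimal worked example of Bayes' theorem with made-up numbers, where the marginal probability P(B) is obtained via the law of total probability:

```python
# Illustrative sketch of Bayes' theorem with hypothetical probabilities.
p_a = 0.01             # prior probability of event A (hypothetical)
p_b_given_a = 0.9      # likelihood of observing B when A is true (hypothetical)
p_b_given_not_a = 0.05 # likelihood of observing B when A is false (hypothetical)

# Marginal probability of B via the law of total probability.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
p_a_given_b = p_b_given_a * p_a / p_b   # posterior probability P(A|B)
print("P(A|B) =", round(p_a_given_b, 4))
```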
❖ SVM:
• Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms,
which is used for Classification as well as Regression problems. However, primarily, it is used for
Classification problems in Machine Learning.
• The main objective of the SVM algorithm is to find the optimal hyperplane in an N-dimensional
space that can separate the data points in different classes in the feature space.
• The hyperplane is chosen so that the margin between the closest points of the different classes is as
large as possible.
• The dimension of the hyperplane depends upon the number of features. If the number of input
features is two, then the hyperplane is just a line. If the number of input features is three, then
the hyperplane becomes a 2-D plane.
Example: SVM can be understood with the example that we have used in the KNN classifier. Suppose we see a
strange cat that also has some features of dogs, so if we want a model that can accurately identify whether it
is a cat or dog, so such a model can be created by using the SVM algorithm. We will first train our model with
lots of images of cats and dogs so that it can learn about different features of cats and dogs, and then we test
it with this strange creature. The support vector machine creates a decision boundary between these two
classes (cat and dog) and chooses the extreme cases (support vectors), i.e. the most dog-like cats and the
most cat-like dogs. On the basis of these support vectors, it will classify the creature as a cat.
Types of SVM:
SVM can be of two types:
o Linear SVM: Linear SVM is used for linearly separable data, which means that if a dataset can be classified
into two classes by using a single straight line, then such data is termed linearly separable data, and the
classifier used is called a Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data, which means that if a dataset
cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier
used is called a Non-linear SVM classifier.
Hyperplane: Hyperplane is the decision boundary that is used to separate the data points of different
classes in a feature space.
Support Vectors: Support vectors are the data points closest to the hyperplane; they play a critical role
in deciding the hyperplane and the margin.
Advantages of SVM
• Effective in High Dimensions: Works well with high-dimensional data.
• Memory Efficient: Uses a subset of training points (support vectors) in the decision function.
• Versatile: Different kernel functions can be specified for the decision function, making SVM adaptable to
different data structures.
Disadvantages of SVM
• Computationally Intensive: Training can be slow, especially with large datasets.
• Parameter Tuning: Performance depends on the choice of kernel and parameters (like C and gamma, γ).
• Not Probabilistic: Does not provide direct probability estimates but can be calibrated using methods like
Platt scaling.
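A minimal SVM sketch, assuming scikit-learn and a synthetic dataset: a linear kernel for the linearly separable case and an RBF kernel for the non-linear case (C and gamma are illustrative defaults).

```python
# Illustrative sketch: linear vs. RBF-kernel SVM on a synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=6)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=6)

linear_svm = SVC(kernel="linear", C=1.0).fit(X_train, y_train)            # linearly separable case
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)   # non-linear case
print("Linear SVM:", linear_svm.score(X_test, y_test), "RBF SVM:", rbf_svm.score(X_test, y_test))
```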
❖ Cross Validation:
• Cross validation is a technique used in machine learning to evaluate the performance of a model
on unseen data. It involves dividing the available data into multiple folds or subsets, using one of
these folds as a validation set, and training the model on the remaining folds. This process is
repeated multiple times, each time using a different fold as the validation set. Finally, the results
from each validation step are averaged to produce a more robust estimate of the model’s
performance.
• The primary purpose of cross-validation is to ensure that the model performs well on unseen
data, thereby avoiding overfitting and underfitting.
The basic steps of cross-validations are:
• Reserve a subset of the dataset as a validation set.
• Provide the training to the model using the training dataset.
• Now, evaluate the model's performance using the validation set. If the model performs well on the
validation set, proceed to the next step; otherwise, check for issues.
Types of Cross-Validation
1. Holdout Validation: In holdout validation, we perform training on 50% of the given dataset and the
remaining 50% is used for testing. It's a simple and quick way to evaluate a model. The major drawback
of this method is that, because we train on only 50% of the dataset, the remaining 50% may contain
important information that the model never sees during training, i.e. higher bias.
2. LOOCV (Leave One Out Cross Validation): In this method, we perform training on the whole dataset
except for one data point, and iterate over every data point. In LOOCV, the model is trained on (n – 1)
samples and tested on the one omitted sample, repeating this process for each data point in the
dataset.
3. K-Fold Cross Validation: In K-Fold cross validation, we split the dataset into k subsets (known as folds),
then train on k – 1 of the subsets and leave one subset out for evaluating the trained model. We iterate
k times, reserving a different subset for testing each time.
4. Stratified K-Fold Cross-Validation: Similar to K-Fold but ensures that each fold has
approximately the same percentage of samples of each target class. This is especially useful for
imbalanced datasets.
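A minimal cross-validation sketch, assuming scikit-learn and the built-in Iris dataset; five folds and logistic regression are illustrative choices:

```python
# Illustrative sketch: 5-fold and stratified 5-fold cross-validation with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, KFold, StratifiedKFold

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

kfold_scores = cross_val_score(model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=7))
strat_scores = cross_val_score(model, X, y, cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=7))
print("K-Fold mean accuracy:", kfold_scores.mean())
print("Stratified K-Fold mean accuracy:", strat_scores.mean())
```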
Advantages:
1. Overcoming Overfitting: Cross validation helps to prevent overfitting by providing a more robust
estimate of the model’s performance on unseen data.
2. Model Selection: Cross validation can be used to compare different models and select the one
that performs the best on average.
3. Hyperparameter tuning: Cross validation can be used to optimize the hyperparameters of a
model, such as the regularization parameter, by selecting the values that result in the best
performance on the validation set.
4. Data Efficient: Cross validation allows the use of all the available data for both training and
validation, making it a more data-efficient method compared to traditional validation
techniques.
Disadvantages:
1. Computationally Expensive: Cross validation can be computationally expensive, especially when
the number of folds is large or when the model is complex and requires a long time to train.
2. Time-Consuming: Cross validation can be time-consuming, especially when there are many
hyperparameters to tune or when multiple models need to be compared.
❖ Reinforcement learning:
• Reinforcement Learning is a feedback-based Machine learning technique in which an agent
learns to behave in an environment by performing the actions and seeing the results of actions.
For each good action, the agent gets positive feedback, and for each bad action, the agent gets
negative feedback or penalty.
• In reinforcement learning, the agent learns automatically using feedback, without any labeled
data, unlike supervised learning.
• Since there is no labeled data, the agent is bound to learn from its experience only.
• RL solves a specific type of problem where decision making is sequential, and the goal is long-
term, such as game-playing, robotics, etc
• The agent interacts with the environment and explores it by itself. The primary goal of an agent
in reinforcement learning is to improve the performance by getting the maximum positive
rewards.
Key Features of Reinforcement Learning
• In RL, the agent is not instructed about the environment and what actions need to be taken.
• It is based on a trial-and-error process.
• The agent takes the next action and changes states according to the feedback of the previous action.
• The agent may get a delayed reward.
• The environment is stochastic, and the agent needs to explore it in order to obtain the maximum positive
reward.
Elements of Reinforcement Learning
1. Policy: A policy defines the learning agent's behaviour at a given time. It is a mapping from perceived
states of the environment to the actions to be taken when in those states.
2. Reward function: The reward function is used to define the goal in a reinforcement learning problem. It
provides a numerical score based on the state of the environment.
3. Value function: Value functions specify what is good in the long run. The value of a state is the total
amount of reward an agent can expect to accumulate over the future, starting from that state.
4. Model of the environment: A model mimics the behaviour of the environment and is used for planning.
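A minimal sketch tying these elements together with the tabular Q-learning update rule; the tiny 3-state chain environment, learning rate, and discount factor are all made-up assumptions for illustration:

```python
# Illustrative sketch: tabular Q-learning on a hypothetical 3-state chain environment.
import numpy as np

n_states, n_actions = 3, 2
Q = np.zeros((n_states, n_actions))   # value estimates for each state-action pair
alpha, gamma = 0.1, 0.9               # learning rate and discount factor (assumed values)

def step(state, action):
    # Hypothetical dynamics: action 1 moves right; reaching the last state earns a reward.
    next_state = min(state + action, n_states - 1)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

state = 0
for _ in range(1000):
    action = np.random.randint(n_actions)            # explore by acting randomly
    next_state, reward = step(state, action)
    # Q-learning update: move Q(s, a) toward reward + discounted best future value.
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = 0 if next_state == n_states - 1 else next_state
print(Q)
```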
Underfitting in Machine Learning:
A statistical model or machine learning algorithm is said to underfit when it is too simple to capture the
complexities of the data. Underfitting represents the model's inability to learn the training data effectively,
resulting in poor performance on both the training and the testing data. In simple terms, an underfit model is
inaccurate, especially when applied to new, unseen examples. It mainly happens when we use a very simple
model with overly simplified assumptions. To address underfitting, we need to use more complex models,
enhanced feature representations, and less regularization.
Reasons for Underfitting
1. The model is too simple, so it may not be capable of representing the complexities in the data.
2. The input features used to train the model are not an adequate representation of the underlying
factors influencing the target variable.
3. The size of the training dataset is not large enough.
4. Excessive regularization is used to prevent overfitting, which constrains the model and prevents it from
capturing the data well.
5. Features are not scaled.
Techniques to Reduce Underfitting
1. Increase model complexity.
2. Increase the number of features, performing feature engineering.
3. Remove noise from the data.
4. Increase the number of epochs or increase the duration of training to get better results.
Overfitting in Machine Learning
A statistical model is said to be overfitted when it does not make accurate predictions on testing data. When
a model is trained with too much detail, it starts learning from the noise and inaccurate entries in the data
set, and testing on unseen data then shows high variance. The model fails to categorize the data correctly
because of too many details and noise. Overfitting is commonly caused by non-parametric and non-linear
methods, because these types of machine learning algorithms have more freedom in building the model from
the dataset and can therefore build unrealistic models. Solutions to avoid overfitting include using a linear
algorithm if we have linear data, or constraining parameters such as the maximal depth if we are using
decision trees; a short sketch of the latter is given below.
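A minimal sketch of the decision-tree depth constraint mentioned above, assuming scikit-learn and synthetic data; the depth of 4 is an illustrative value, not a recommendation:

```python
# Illustrative sketch: limiting decision-tree depth to reduce overfitting.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=8)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=8)

deep = DecisionTreeClassifier(random_state=8).fit(X_train, y_train)                   # unconstrained: likely overfits
shallow = DecisionTreeClassifier(max_depth=4, random_state=8).fit(X_train, y_train)   # constrained depth

print("Deep tree    train/test:", deep.score(X_train, y_train), deep.score(X_test, y_test))
print("Shallow tree train/test:", shallow.score(X_train, y_train), shallow.score(X_test, y_test))
```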