Machine Learning 2
Dr. Soumi Dutta
Advantages of Machine Learning
Automation − With machine learning, tasks, especially repetitive ones, can be automated seamlessly, saving time and energy for
humans. For example, the deployment of chatbots has improved customer experience and reduced waiting times, while human
agents can focus on creative and complex problems.
Enhanced user experience and decision making − Machine learning models can analyze large
datasets and extract insights for decision making. Machine learning also allows products and services to be personalized, enhancing the
customer experience. For example, a recommendation algorithm analyzes customer preferences and past behavior to suggest products, improving both
retail outcomes and user experience.
Wide Applicability − This technology has a wide range of applications. From healthcare and finance to business and
marketing, machine learning is applied in almost all sectors to improve productivity.
Continuous Improvement − Machine learning algorithms are designed to keep learning, improving their
accuracy and efficiency. Every time the model is retrained on new data, its decisions improve.
Disadvantages of Machine Learning
Data acquisition − The most crucial and most difficult task in machine learning is collecting data. Every
machine learning algorithm requires data that is relevant, unbiased, and of good quality. Better data results in
better performance of the machine learning model.
Inaccurate Results − Another major challenge in machine learning is the credibility of the results
generated by the algorithm.
Chances of Error − Machine learning depends on two things: data and the algorithm. Any inaccuracy or bias in these
can result in errors and inaccurate outcomes. For example, if the training dataset is small, the algorithm
cannot fully learn the patterns, resulting in biased and irrelevant predictions.
Maintenance − Machine learning models must be continuously maintained and monitored to ensure that they
remain effective and accurate over time.
Machine Learning Algorithms Vs. Traditional Programming
What is a dataset?
A dataset is a collection of data arranged in some order. A dataset can contain anything from
an array to a database table.
A dataset can include different types of data, for example numerical and categorical values.
Ordinal data: These data are similar to categorical data but can be ranked on the basis of comparison.
Note: A real-world dataset is of huge size, which is difficult to manage and process at the
initial level. Therefore, to practice machine learning algorithms, we can use any dummy
dataset, like the small example sketched below.
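To make this concrete, here is a minimal sketch of a dummy tabular dataset built with pandas; the column names and values are illustrative assumptions, not taken from the original slides.
```python
import pandas as pd

# A tiny dummy dataset: rows are samples, columns are features.
# Column names ("Country", "Age", "Salary", "Purchased") are illustrative.
data = pd.DataFrame({
    "Country":   ["India", "US", "Germany", "India"],
    "Age":       [34, 28, 45, 23],
    "Salary":    [52000, 61000, 73000, 48000],
    "Purchased": ["Yes", "No", "Yes", "No"],
})

print(data)         # inspect the table
print(data.dtypes)  # mixed numerical and categorical columns
```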
Types of datasets
Machine learning spans different domains, each requiring specific types of datasets. A few common types of
datasets used in machine learning include:
Image Datasets: Image datasets contain a collection of images and are typically used in computer vision
tasks such as image classification, object detection, and image segmentation (see the loading sketch after the examples).
Examples :
◦ ImageNet
◦ CIFAR-10
◦ MNIST
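As a minimal sketch of loading an image dataset, the example below uses scikit-learn's small bundled digits dataset, which is similar in spirit to MNIST but ships with the library, so no download is needed:
```python
from sklearn.datasets import load_digits

# Load a small bundled dataset of 8x8 grayscale digit images.
digits = load_digits()

print(digits.images.shape)  # (1797, 8, 8): 1797 images of 8x8 pixels
print(digits.target[:10])   # class labels 0-9 for the first ten images
```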
Types of datasets
Text Datasets:
Text datasets consist of textual information, such as articles, books, or social media posts. These datasets are
used in NLP tasks like sentiment analysis, text classification, and machine translation (see the sketch after the examples).
Examples :
◦ Project Gutenberg dataset
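As a minimal sentiment-analysis sketch, the toy corpus and labels below are made up for illustration; a bag-of-words Naive Bayes classifier is one simple approach:
```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# A made-up toy corpus with sentiment labels (1 = positive, 0 = negative).
texts  = ["great product, loved it", "terrible, waste of money",
          "really happy with this", "awful quality, very disappointed"]
labels = [1, 0, 1, 0]

# Turn raw text into bag-of-words counts, then fit a Naive Bayes classifier.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
model = MultinomialNB().fit(X, labels)

print(model.predict(vectorizer.transform(["loved the quality"])))  # likely [1]
```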
Time Series Datasets:
Time series datasets consist of data points collected over time. They are commonly used in
forecasting, anomaly detection, and trend analysis (a small sketch follows the examples).
Examples :
◦ Stock market data
◦ Weather data
◦ Sensor readings
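As a minimal anomaly-detection sketch on a made-up sensor series (both the readings and the deviation threshold are illustrative assumptions):
```python
import pandas as pd

# Made-up sensor readings with one obvious spike at index 6.
readings = pd.Series([20.1, 20.3, 19.9, 20.2, 20.0, 20.4, 35.0, 20.1, 19.8, 20.2])

# Compare each point against a centered rolling mean of its neighbors.
rolling_mean = readings.rolling(window=3, center=True).mean()
deviation = (readings - rolling_mean).abs()

# Flag points that deviate by more than an arbitrary illustrative threshold.
anomalies = readings[deviation > 5.0]
print(anomalies)  # should flag only the spike at index 6
```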
Types of datasets
Tabular Datasets:
Tabular datasets are structured data organized in tables or spreadsheets. They contain rows representing
instances or samples and columns representing features or attributes. Tabular datasets are used for tasks like
regression and classification. The dummy dataset sketched earlier is an example of a tabular dataset.
Need of Dataset
Properly prepared and pre-processed datasets are crucial for machine learning projects.
They provide the foundation for training accurate and reliable models. However, working with
large datasets can present challenges in terms of management and processing.
To address these challenges, efficient data management techniques and processing
algorithms are required.
Data Pre-processing
Data pre-processing is a fundamental step in preparing datasets for machine learning. It involves
transforming raw data into a format suitable for model training. Common pre-processing
techniques include data cleaning to remove inconsistencies or errors, normalization to scale
data within a specific range, feature scaling to ensure features have comparable ranges, and
handling missing values through imputation or removal (a short sketch of these steps follows).
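As a minimal sketch of these steps with scikit-learn and pandas (the toy values and the choice of mean imputation plus min-max scaling are illustrative assumptions):
```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler

# Toy data with a missing value and features on very different scales.
X = pd.DataFrame({"age": [25, 32, np.nan, 51],
                  "salary": [40000, 52000, 61000, 90000]})

# Handle missing values by imputing the column mean.
X_imputed = SimpleImputer(strategy="mean").fit_transform(X)

# Normalize each feature into the [0, 1] range.
X_scaled = MinMaxScaler().fit_transform(X_imputed)

print(X_scaled)  # all features now share a comparable range
```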
During the development of an ML project, developers rely heavily on the datasets. In building
ML applications, a dataset is divided into two parts:
Training dataset
Test Dataset
Training Dataset and Test Dataset:
In machine learning, datasets are typically partitioned into two parts: the training dataset and the
test dataset. The training dataset is used to train the machine learning model, while the test
dataset is used to evaluate the model's performance. This split assesses the model's ability to
generalize to unseen data. It is essential to ensure that the datasets are representative of the
problem domain and appropriately split to avoid bias or overfitting (a split sketch follows below).
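As a minimal sketch using scikit-learn's train_test_split (the 80/20 ratio and fixed random seed are common but illustrative choices):
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load a small bundled dataset, then hold out 20% of it for testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
```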
Popular sources for Machine Learning datasets
Kaggle is one of the best sources of datasets
for data scientists and machine learning practitioners.
It allows users to find, download, and publish
datasets in an easy way. It also provides the
opportunity to work with other machine learning
engineers and solve difficult data science
tasks.
Government datasets are another popular source. The goal of providing these datasets is to increase the transparency of government work among the people and
to encourage innovative uses of the data. Below are some government dataset portals:
• Indian Government dataset
• US Government Dataset
Machine learning models can be categorized mainly into four types: supervised, unsupervised, semi-supervised, and reinforcement learning.
Supervised Learning
Supervised learning trains models on labeled datasets. Following are some commonly used supervised learning algorithms (a training sketch follows the list) −
a) Linear Regression
b) Logistic Regression
c) Decision Trees
d) Random Forest
e) K-nearest Neighbor
f) Naive Bayes
g) Neural Networks
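As a minimal supervised-learning sketch, using scikit-learn's bundled iris dataset and two of the listed algorithms (the specific models and split are illustrative choices):
```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train two of the listed supervised algorithms on labeled data.
for model in (LogisticRegression(max_iter=1000), DecisionTreeClassifier()):
    model.fit(X_train, y_train)
    print(type(model).__name__, "accuracy:", model.score(X_test, y_test))
```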
Unsupervised Learning
Unsupervised learning is a type of machine learning that uses unlabeled datasets to discover patterns without any explicit guidance
or instruction. For example, customer segmentation, i.e., dividing a company's customers into groups that reflect similarity.
Unsupervised learning algorithms can be further classified into three types: clustering, association, and dimensionality reduction.
Following are some commonly used unsupervised learning algorithms (a clustering sketch follows the list) −
a) K-Means Clustering
b) Hierarchical Clustering
c) DBSCAN Clustering
d) Agglomerative Clustering
e) Apriori Algorithm
f) Autoencoder
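As a minimal clustering sketch, using synthetic blob data from scikit-learn (the number of clusters and blob parameters are illustrative):
```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate unlabeled synthetic data with three natural groups.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# K-Means discovers the groups without being given any labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(kmeans.cluster_centers_)  # coordinates of the 3 discovered centers
print(kmeans.labels_[:10])      # cluster assignment of the first 10 points
```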
Reinforcement Learning
Reinforcement learning is a type of machine learning in which an agent learns by interacting with an environment and receiving rewards. Following are some commonly used reinforcement learning algorithms (a Q-learning sketch follows the list) −
a) Q-learning
b) SARSA
c) DQN
d) DDPG
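As a minimal tabular Q-learning sketch on a made-up one-dimensional corridor environment (the environment, reward scheme, and hyperparameters are all illustrative assumptions):
```python
import random

# A made-up 1-D corridor: states 0..4, reward only at the right end (state 4).
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = [-1, +1]                     # 0 = move left, 1 = move right
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration

random.seed(0)
for episode in range(500):
    state = 0
    while state != GOAL:
        # Epsilon-greedy action choice, breaking exact ties at random.
        if random.random() < epsilon or Q[state][0] == Q[state][1]:
            a = random.randrange(2)
        else:
            a = 0 if Q[state][0] > Q[state][1] else 1
        next_state = min(max(state + ACTIONS[a], 0), N_STATES - 1)
        reward = 1.0 if next_state == GOAL else 0.0
        # Q-learning update: nudge Q toward reward + discounted best next value.
        Q[state][a] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][a])
        state = next_state

for s, (left, right) in enumerate(Q[:-1]):
    print(f"state {s}: Q(left)={left:.2f}  Q(right)={right:.2f}")
# Q(right) should dominate in every state, encoding "always move right".
```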
A Few Important Terms
a) Data
b) Feature
c) Model
d) Training
e) Testing
f) Overfitting
g) Underfitting
Data
Data is the foundation of machine learning. Without data, there would be nothing for the
algorithm to learn from. Data can come in many forms, including structured data (such as
spreadsheets and databases) and unstructured data (such as text and images). The quality
and quantity of the data used to train the machine learning algorithm are crucial factors that
determine its performance.
Feature
A feature is an individual measurable property or characteristic of the data.
The goal is to select the most relevant and informative features that will allow the algorithm
to make accurate predictions or decisions. Feature selection is a crucial step in the machine
learning process because the performance of the algorithm is heavily dependent on the
features chosen.
Model
A model is a representation of the relationship between the input data (features) and the
output (predictions or decisions). The model is created using a
training dataset and then evaluated using a separate validation dataset. The goal is to create
a model that generalizes well to new data.
Training
Training is the process by which the algorithm learns to make
predictions or decisions. This is done by providing the algorithm with a large dataset and
allowing it to learn from the patterns and relationships in the data. During training, the
algorithm adjusts its internal parameters to minimize the difference between its predicted
outputs and the actual outputs.
Testing
Testing is the process of evaluating the trained
algorithm on a separate dataset that it has not seen before. The goal is to
determine how well the algorithm generalizes to new, unseen data. If the
algorithm performs well on the test dataset, it is considered a successful
model.
Overfitting
Overfitting occurs when a machine learning model is too complex and fits the training
data too closely. This can lead to poor performance on new, unseen data because the
model has learned the noise in the training data rather than the true underlying patterns.
Underfitting
Underfitting occurs when a model is too simple to capture the patterns in the data,
leading to poor performance on both the training data and new data.
It is important to note that preventing underfitting is a balancing act between model complexity
and the amount of data available. Increasing model complexity can help prevent underfitting, but
if there is not enough data to support the increased complexity, overfitting may occur instead.
Therefore, it is important to monitor the model's performance and adjust the complexity as
necessary, as in the sketch below.
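As a minimal sketch of this complexity trade-off, using a decision tree whose depth controls complexity (the synthetic dataset and depth values are illustrative choices):
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, slightly noisy classification data.
X, y = make_classification(n_samples=400, n_features=20, flip_y=0.1,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (1, 4, None):  # too simple, balanced, unconstrained
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(f"depth={depth}: train={tree.score(X_train, y_train):.2f}, "
          f"test={tree.score(X_test, y_test):.2f}")
# depth=1 tends to underfit (both scores low); depth=None tends to overfit
# (train near 1.0, test lower); a moderate depth balances the two.
```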
Machine Learning Vs. Deep Learning
Deep learning is a sub-field of machine learning. The key difference between the two is the way
the algorithm learns.
In machine learning, computers learn from large datasets using algorithms to perform tasks like
prediction and recommendation, whereas deep learning uses a complex structure of algorithms
modeled on the human brain.
Deep learning models are more effective than traditional machine learning models for complex
problems. For example, autonomous vehicles are usually developed using deep learning:
the vehicle can identify a U-turn signboard using image segmentation, whereas with a machine learning
model, features of the signboard would first have to be selected by hand and then identified using a classifier
algorithm.
Machine Learning Vs. Generative AI
Machine learning and Generative AI are different branches with different applications. While
machine learning is used for predictive analysis and decision-making, Generative AI focuses on
creating new content, such as realistic images and videos, based on patterns in existing data.