MLT Unit 1
Terminology:
In supervised learning, models are trained using a labelled dataset, from which
the model learns about each type of data. Once the training process is
complete, the model is tested on test data and then predicts the
output.
Suppose we have a dataset of different types of shapes, including squares,
rectangles, triangles, and polygons. The first step is to train
the model on each shape.
If the given shape has four sides and all the sides are equal, then it will
be labelled as a square.
If the given shape has three sides, then it will be labelled as a triangle.
If the given shape has six equal sides, then it will be labelled as a hexagon.
Now, after training, we test our model using the test set, and the task of
the model is to identify the shape.
The machine is already trained on all types of shapes, and when it finds a
new shape, it classifies the shape on the basis of the number of sides and
predicts the output.
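The shape example can be sketched in code. This is a deliberately simple lookup-table classifier, assumed here for illustration and not an algorithm the notes prescribe: each labelled shape is reduced to two features (number of sides, whether all sides are equal), "training" records the most common label for each feature pair, and prediction looks the pair up. The function names and the side lengths are hypothetical.

```python
from collections import Counter, defaultdict

def shape_features(side_lengths):
    """Reduce a shape to the features used in the text:
    (number of sides, whether all sides are equal)."""
    return (len(side_lengths), len(set(side_lengths)) == 1)

def train(labelled_shapes):
    """'Training': count the labels seen for each feature pair and keep
    the most common one."""
    votes = defaultdict(Counter)
    for sides, label in labelled_shapes:
        votes[shape_features(sides)][label] += 1
    return {feat: counts.most_common(1)[0][0] for feat, counts in votes.items()}

def predict_shape(model, sides):
    """Classify a new shape; returns None if its features never appeared in training."""
    return model.get(shape_features(sides))

# Hypothetical labelled training set (side lengths, label).
training_set = [
    ([2, 2, 2, 2], "square"),
    ([5, 5, 5, 5], "square"),
    ([3, 4, 3, 4], "rectangle"),
    ([3, 4, 5], "triangle"),
    ([2, 2, 2], "triangle"),
    ([1, 1, 1, 1, 1, 1], "hexagon"),
]
model = train(training_set)

print(predict_shape(model, [7, 7, 7, 7]))  # square
print(predict_shape(model, [6, 8, 10]))    # triangle
```

A shape unlike anything in the training set, such as a five-sided polygon, gets None: the model can only reproduce what the labelled data taught it.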
1. Regression
Linear Regression
Regression Trees
Non-Linear Regression
Bayesian Linear Regression
Polynomial Regression
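As an illustration of the first regression method listed, the sketch below fits simple (one-feature) linear regression by ordinary least squares, using slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept = ȳ - slope·x̄. The data is hypothetical and chosen to lie exactly on y = 2x + 1; real data would include noise.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one feature. Returns (slope, intercept)
    minimising the sum of squared errors sum((y - (slope*x + intercept))**2)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)       # 2.0 1.0
print(slope * 6 + intercept)  # 13.0 (prediction for an unseen x = 6)
```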
2. Classification
Random Forest
Decision Trees
Logistic Regression
Support Vector Machines
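For the classification side, here is a minimal sketch of logistic regression with a single feature. It models p(y = 1 | x) = sigmoid(w·x + b) and learns w and b by batch gradient descent on the average log-loss. The learning rate, epoch count, and the already-centred toy data are assumptions for illustration, not tuned values.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic_regression(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y = 1 | x) = sigmoid(w*x + b) for one feature using
    batch gradient descent on the average log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # d(log-loss)/dz for one example
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict_class(w, b, x, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(w * x + b) >= threshold else 0

# Hypothetical, already-centred feature values with binary class labels.
xs = [-4, -3, -2, -1, 1, 2, 3, 4]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic_regression(xs, ys)
print([predict_class(w, b, x) for x in (-2.5, 0.5, 3.5)])  # [0, 1, 1]
```

The output is a probability, which is then thresholded into a class. This is what separates logistic regression (classification) from linear regression (a continuous output).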
With the help of supervised learning, the model can predict the output on
the basis of prior experiences.
In supervised learning, we can have an exact idea about the classes of
objects.
Supervised learning models help us to solve various real-world problems
such as fraud detection, spam filtering, etc.
Supervised learning models are not suitable for handling complex
tasks.
Supervised learning cannot predict the correct output if the test data is
different from the training dataset.
Training requires a lot of computation time.
In supervised learning, we need enough knowledge about the classes of
objects.
Unsupervised learning is helpful for finding useful insights from the data.
Unsupervised learning is similar to how a human learns to think from their
own experiences, which makes it closer to real AI.
Unsupervised learning works on unlabelled and uncategorized data, which
makes unsupervised learning more important.
In the real world, we do not always have input data with the corresponding
output, so to solve such cases, we need unsupervised learning.
2. Association
K-means clustering
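KNN (k-nearest neighbors; strictly a supervised method, although nearest-neighbour search is also used in unsupervised tasks)
Hierarchical clustering
Anomaly detection
Neural Networks
Principal Component Analysis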
Independent Component Analysis
Apriori algorithm
Singular value decomposition
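As an example of the first algorithm in the list, the sketch below implements k-means clustering (Lloyd's algorithm) on unlabelled 2-D points. It alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. Using the first k points as initial centroids is a simplifying assumption; practical implementations usually use random or k-means++ initialisation. The points are hypothetical.

```python
import math

def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm. points: list of (x, y) tuples.
    Initial centroids are simply the first k points (an assumption)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [points[i] for i in range(k)]
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members)
                                           for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep old centroid
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical unlabelled data: two visible groups, no labels given.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

No labels are supplied. The grouping comes entirely from distances between points, which is what makes this unsupervised.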
3. Reinforcement Learning
Types of RL
Task
Performance Measure
Experience
1. Image Recognition:
2. Speech Recognition:
3. Traffic Prediction:
If we want to visit a new place, we use Google Maps, which shows us
the correct path with the shortest route and predicts the traffic conditions.
It predicts traffic conditions using two sources:
Real-time location of the vehicle from the Google Maps app and sensors
Average time taken on past days at the same time.
Everyone who uses Google Maps is helping to make the app better. It takes
information from the user and sends it back to its database to improve its
performance.
The major issue that arises while using machine learning algorithms is
the lack of both quality and quantity of data. Although data plays a vital
role in the processing of machine learning algorithms, many data
scientists report that inadequate, noisy, and unclean data make it
extremely difficult for machine learning algorithms to perform well.
For example, a simple task may require thousands of samples, while an
advanced task such as speech or image recognition may need millions of
examples.
Further, data quality is also important for the algorithms to work well,
yet poor-quality data is common in machine learning
applications. Data quality can be affected by several factors, as follows:
1.5-Types of Data
Almost anything can be turned into data. Building a deep understanding of the
different data types is a crucial prerequisite for Exploratory Data Analysis
(EDA) and feature engineering for machine learning models.
Numerical Data-These are numbers, and they can be split into two categories.
A-Discrete Data
Numbers that are limited to integers. Example: the number of cars passing by.
Values of this type have no proper meaning if expressed in decimal format
(2.5 cars makes no sense). Their values can be counted.
B-Continuous Data
Numbers that can take infinitely many values. Example: the price of an item, the size of an item.
These are numerical measures that can take any value within a certain range, so
values of this type keep a true meaning when expressed in decimal format.
These are data types that cannot be expressed as numbers. They describe
categories or groups and are hence known as categorical data types.
A-Categorical Data
These are values that cannot be ranked against each other. Example:
colour values or any Yes/No values.
B-Structured Data
This type of data consists of either numbers or words. It can take numerical values,
but mathematical operations cannot be performed on it. This type of data is
expressed in a tabular format.
C-Unstructured Data
This type of data does not have a predefined format and is thus known as unstructured
data. It comprises textual data, sounds, images, etc.
D-Ordinal Data
These are like categorical data, but they can be ranked against each other. Example: ratings such as low, medium, and high.
E-Nominal Data
This data has no measurable order between its values. This data type also belongs to the categorical type.
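Because these data types matter mostly at the feature-engineering stage, here is a minimal sketch of how each might be turned into numeric model inputs. Discrete and continuous values are kept as numbers. Ordinal values are replaced by their rank, which preserves their order. Nominal/categorical values are one-hot encoded, so no order is implied. The records, field names, and the low/medium/high ordering are hypothetical.

```python
# Hypothetical records mixing the data types described above.
records = [
    {"cars_passing": 12, "price": 19.99, "colour": "red",   "rating": "medium"},
    {"cars_passing": 7,  "price": 5.50,  "colour": "blue",  "rating": "low"},
    {"cars_passing": 20, "price": 42.00, "colour": "green", "rating": "high"},
]

# Assumed order of the ordinal levels, lowest first.
RATING_LEVELS = ["low", "medium", "high"]

def encode_ordinal(value, levels=RATING_LEVELS):
    """Ordinal data: replace each level by its rank, preserving the order."""
    return levels.index(value)

def encode_one_hot(value, categories):
    """Nominal/categorical data: one 0/1 indicator per category, no implied order."""
    return [1 if value == c else 0 for c in categories]

def encode_record(record, colour_categories):
    """Build a numeric feature vector from one record."""
    discrete = record["cars_passing"]  # discrete: a counted integer
    continuous = record["price"]       # continuous: decimals are meaningful
    return ([discrete, continuous, encode_ordinal(record["rating"])]
            + encode_one_hot(record["colour"], colour_categories))

colour_categories = sorted({r["colour"] for r in records})  # ['blue', 'green', 'red']
for r in records:
    print(encode_record(r, colour_categories))
```

Ranking colours as 0, 1, 2 would invent an order that nominal data does not have. That is why the sketch uses one-hot indicators for colour and ranks only for the ordinal rating.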
1.6-Data remediation
Data remediation is the process of cleansing, organizing, and migrating
data so that it’s properly protected and best serves its intended purpose.
There is a misconception that data remediation simply means deleting
business data that is no longer needed.
It is important to remember that the key word “remediation” derives from
the word “remedy,” which means to correct a mistake. Since the core initiative
is to correct data, the data remediation process typically involves
replacing, modifying, cleansing, or deleting any “dirty” data.
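The four actions named above can be seen together in a small clean-up pass. The sketch below assumes hypothetical customer records with name, email, and country fields. It cleanses stray whitespace, modifies email addresses to lower case, replaces a missing country with a placeholder, and deletes records whose email is invalid or duplicated. Real remediation rules depend entirely on the data and its intended purpose.

```python
def remediate(records):
    """Hypothetical clean-up pass over dicts with 'name', 'email', 'country'."""
    cleaned = []
    seen_emails = set()
    for rec in records:
        name = (rec.get("name") or "").strip()            # cleanse: trim stray whitespace
        email = (rec.get("email") or "").strip().lower()  # modify: normalise case
        country = (rec.get("country") or "").strip() or "UNKNOWN"  # replace: fill a missing value
        if not email or "@" not in email or email in seen_emails:
            continue  # delete: drop records with an invalid or duplicate email
        seen_emails.add(email)
        cleaned.append({"name": name, "email": email, "country": country})
    return cleaned

raw = [
    {"name": "  Asha ", "email": "ASHA@Example.com", "country": "IN"},
    {"name": "Ben", "email": "ben@example.com", "country": None},
    {"name": "Asha", "email": "asha@example.com ", "country": "IN"},  # duplicate once normalised
    {"name": "Cara", "email": "not-an-email", "country": "UK"},       # invalid email
]
for rec in remediate(raw):
    print(rec)
```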
Data remediation terminology
As you explore the data remediation process, you will come across unique
terminology. These are common terms related to data remediation that you
should get acquainted with.