Machine Learning
(MCA - 36)
Practical File
2021-22
Contents
1. Linear Regression and Logistic Regression
2. Clustering Algorithms
3. Classification Algorithms
4. Neural Networks
5. Ranking
6. Feature Selection
1. Linear Regression and Logistic Regression
Types of Regression:
Several types of regression are used in data science and machine learning. Each type has its own importance in different scenarios, but at the core, all regression methods analyze the effect of the independent variables on a dependent variable. Some important types of regression are given below:
● Linear Regression
● Logistic Regression
● Polynomial Regression
● Support Vector Regression
● Decision Tree Regression
● Random Forest Regression
● Ridge Regression
● Lasso Regression
1. Linear Regression:
Linear regression is a supervised learning algorithm that models the relationship between a continuous dependent variable and one or more independent variables by fitting a straight line, y = a₀ + a₁x, that minimizes the error between the predicted and actual values.
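Below is a minimal linear regression sketch using scikit-learn on synthetic data; the data and coefficients are illustrative only:

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y = 2.5x + 1 plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))               # one independent variable
y = 2.5 * X.ravel() + 1.0 + rng.normal(0, 1, 100)   # continuous target

model = LinearRegression().fit(X, y)
print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("prediction for x = 4:", model.predict([[4.0]])[0])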
2. Logistic Regression:
● Logistic regression is another supervised learning algorithm, used to solve classification problems. In classification problems, the dependent variable is in a binary or discrete format, such as 0 or 1.
● The logistic regression algorithm works with categorical variables such as 0 or 1, Yes or No, True or False, Spam or Not Spam, etc.
● It is a predictive analysis algorithm that works on the concept of probability.
● Logistic regression is a type of regression, but it differs from the linear regression algorithm in how it is used.
● Logistic regression uses the sigmoid function, also called the logistic function, to model the data. The function can be represented as:
f(x) = 1 / (1 + e^(-x))
When we provide the input values (data) to the function, it gives an S-shaped curve.
● It uses the concept of threshold levels: values above the threshold level are rounded up to 1, and values below the threshold level are rounded down to 0, as in the sketch below.
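A minimal logistic regression sketch using scikit-learn on synthetic data; the single feature and the default 0.5 threshold are illustrative assumptions:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary data: label is 1 when the (noisy) feature is positive
rng = np.random.default_rng(1)
X = rng.normal(0, 1, size=(200, 1))
y = (X.ravel() + rng.normal(0, 0.5, 200) > 0).astype(int)

clf = LogisticRegression().fit(X, y)
print("P(y=1 | x=0.8):", clf.predict_proba([[0.8]])[0, 1])   # sigmoid output
print("class at 0.5 threshold:", clf.predict([[0.8]])[0])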
2. Clustering Algorithms
Clustering:
Clustering is a method of grouping objects into clusters such that objects with the most similarities remain in the same group and have few or no similarities with objects in other groups. Cluster analysis finds the commonalities between data objects and categorizes them according to the presence or absence of those commonalities.
1. Partitioning Clustering
In this type of clustering, the algorithm subdivides the data into a subset of k groups. These k
groups or clusters are to be pre-defined. It divides the data into clusters by satisfying these
two requirements – Firstly, Each group should consist of at least one point. Secondly, each
point must belong to exactly one group. K-Means Clustering is the most popular type of
partitioning clustering method.
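A minimal K-Means sketch with scikit-learn, assuming toy blob data; note that k (n_clusters) must be chosen up front:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)   # toy data
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print("first ten labels:", kmeans.labels_[:10])
print("cluster centers:\n", kmeans.cluster_centers_)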
2. Hierarchical Clustering
The basic notion behind this type of clustering is to create a hierarchy of clusters. As opposed
to Partitioning Clustering, it does not require pre-definition of clusters upon which the
model is to be built. There are two ways to perform Hierarchical Clustering. The first
approach is a bottom-up approach, also known as Agglomerative Approach and the second
approach is the Divisive Approach which moves hierarchy of clusters in a top-down
approach. As a result of this type of clustering, we obtain a tree-like representation known as
a dendogram.
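A minimal agglomerative (bottom-up) sketch using SciPy on toy data; the linkage output is the merge hierarchy that the dendrogram draws:

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=30, centers=3, random_state=0)
Z = linkage(X, method="ward")   # bottom-up merge history
dendrogram(Z)                   # tree-like representation of the clusters
plt.show()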
3. Density-Based Models
In these type of clusters, there are dense areas present in the data space that are separated
from each other by sparser areas. These type of clustering algorithms play a crucial role in
evaluating and finding non-linear shape structures based on density. The most popular
density-based algorithm is DBSCAn which allows spatial clustering of data with noise. It
makes use of two concepts – Data Reachability and Data Connectivity.
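A minimal DBSCAN sketch with scikit-learn; eps (the reachability distance) and min_samples (the density needed to form a cluster) are illustrative choices:

from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaved half-moons: a non-linear shape k-means would struggle with
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
db = DBSCAN(eps=0.3, min_samples=5).fit(X)
n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)   # -1 = noise
print("clusters found:", n_clusters)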
4. Model-Based Clustering
In this type of clustering technique, the data observed arises from a distribution consisting
of a mixture of two or more cluster components. Furthermore, each component cluster has a
density function having an associated probability or weight in this mixture.
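A minimal model-based sketch using a Gaussian mixture in scikit-learn; each component has its own density and a mixture weight:

from sklearn.mixture import GaussianMixture
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=2, random_state=7)
gmm = GaussianMixture(n_components=2, random_state=7).fit(X)
print("mixture weights:", gmm.weights_)             # component probabilities
print("soft assignments:\n", gmm.predict_proba(X[:3]))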
5. Fuzzy Clustering
In this type of clustering, data points can belong to more than one cluster. Each data point has a membership coefficient for each cluster that corresponds to its degree of belonging to that cluster. Fuzzy clustering is also known as a soft method of clustering.
3. Classification Algorithms
Below are five of the most common classification algorithms in machine learning.
● Logistic Regression
● Naive Bayes
● K-Nearest Neighbors
● Decision Tree
● Support Vector Machines
Logistic Regression:
Independent variables are analyzed to determine a binary outcome, with the results falling into one of two categories. The independent variables can be categorical or numeric, but the dependent variable is always categorical. Written like this:
P(Y=1|X) or P(Y=0|X)
This can be used to calculate the probability that a word has a positive or negative connotation (0, 1, or on a scale between), or to determine the object contained in a photo (tree, flower, grass, etc.), with each object given a probability between 0 and 1.
Naive Bayes:
Naive Bayes calculates the probability that a data point does or does not belong to a certain category. In text analysis, it can be used to categorize words or phrases as belonging to a preset “tag” (classification) or not. For example, to decide whether or not a phrase should be tagged as “sports,” you need to calculate, using Bayes’ theorem:
P(sports | phrase) = P(phrase | sports) × P(sports) / P(phrase)
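A minimal Naive Bayes sketch for the sports-tagging example; the tiny training phrases here are illustrative only:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

phrases = ["a great game", "the election was over", "very clean match",
           "a clean but forgettable game", "it was a close election"]
tags = ["sports", "not sports", "sports", "sports", "not sports"]

# Count word frequencies, then apply Bayes' theorem per tag
clf = make_pipeline(CountVectorizer(), MultinomialNB()).fit(phrases, tags)
print(clf.predict(["a very close game"]))   # likely ['sports']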
K-nearest Neighbors:
K-nearest neighbors (k-NN) is a pattern recognition algorithm that uses the training dataset to find the k examples most similar (“nearest”) to a new example. When k-NN is used for classification, a data point is placed in the category of its nearest neighbors: if k = 1, it is assigned to the class of its single nearest neighbor; for larger k, the class is decided by a plurality vote of its k nearest neighbors.
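A minimal k-NN sketch with scikit-learn on the Iris dataset; k = 5 is an illustrative choice:

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)   # plurality vote of 5 neighbors
print("predicted class:", knn.predict([[5.1, 3.5, 1.4, 0.2]])[0])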
Decision Tree:
A decision tree is a supervised learning algorithm that is well suited to classification problems, as it is able to split classes down to a precise level. It works like a flow chart, separating data points into two categories at a time, from the “tree trunk” to “branches” to “leaves,” where the categories become increasingly specific. This creates categories within categories, allowing for organic classification with limited human supervision.
To continue with the sports example, a decision tree would repeatedly split phrases on their features until each one ends in a “sports” or “not sports” leaf, as in the sketch below.
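A minimal decision tree sketch with scikit-learn; export_text prints the learned flow-chart of splits from trunk to leaves:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree))   # if/else split structure of the fitted tree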
Random Forest
The random forest algorithm is an extension of the decision tree: a multitude of decision trees is first constructed from the training data, and new data is then classified by aggregating the predictions of all the trees in the “forest” (for example, by majority vote). Averaging over many trees remedies the single decision tree’s problem of “forcing” data points into a category unnecessarily.
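A minimal random forest sketch with scikit-learn; the 100-tree setting is illustrative:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("majority-vote prediction:", forest.predict([[5.1, 3.5, 1.4, 0.2]])[0])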
Support Vector Machines:
A support vector machine (SVM) trains on labeled data and classifies new points by finding the boundary that best separates the classes, going a degree beyond simple X/Y prediction. For a simple visual explanation, we’ll use two tags, red and blue, with two data features, X and Y, and train our classifier to output an X/Y coordinate as either red or blue.
The SVM then assigns a hyperplane that best separates the tags. In two dimensions this is simply a line: anything on one side of the line is red, and anything on the other side is blue. In sentiment analysis, for example, these would be positive and negative. The best hyperplane is the one with the largest margin, that is, the greatest distance to the nearest points of each tag.
However, as data sets become more complex, it may not be possible to draw a single straight line that separates the data into two camps, for example when one class forms a circle inside the other. Imagine such data lifted into three dimensions by adding a Z-axis: a flat hyperplane can then separate the classes, and mapped back to two dimensions the decision boundary becomes a circle. This kernel approach is what makes SVMs accurate on complex, multidimensional data.
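A minimal SVM sketch with scikit-learn on circular, non-linearly-separable data; the RBF kernel plays the role of the added dimension described above:

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# One class forms a circle inside the other: no straight line separates them
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
clf = SVC(kernel="rbf").fit(X, y)   # kernel trick: implicit extra dimension
print("training accuracy:", clf.score(X, y))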
4. Neural Networks
Neural networks, also known as artificial neural networks (ANNs) or simulated neural
networks (SNNs), are a subset of machine learning and are at the heart of deep learning
algorithms. Their name and structure are inspired by the human brain, mimicking the way
that biological neurons signal to one another.
Artificial neural networks (ANNs) are composed of node layers: an input layer, one or more hidden layers, and an output layer. Each node, or artificial neuron, connects to others and has an associated weight and threshold. If the output of an individual node is above the specified threshold value, that node is activated, sending data to the next layer of the network; otherwise, no data is passed along to the next layer.
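A minimal sketch of a single artificial neuron, assuming illustrative weights and a zero threshold:

import numpy as np

def neuron(inputs, weights, bias, threshold=0.0):
    activation = np.dot(inputs, weights) + bias   # weighted sum of the inputs
    return 1 if activation > threshold else 0     # activate (fire) or not

# Two inputs with hypothetical weights
print(neuron(np.array([0.5, 0.3]), np.array([0.8, -0.2]), bias=0.1))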
Neural networks rely on training data to learn and improve their accuracy over time. Once these learning algorithms are fine-tuned for accuracy, they are powerful tools in computer science and artificial intelligence, allowing us to classify and cluster data at high velocity. Tasks in speech recognition or image recognition can take minutes rather than the hours they would take human experts to perform manually. One of the most well-known neural networks is Google’s search algorithm.
Neural networks have several use cases across many industries, such as the following:
Computer vision
Computer vision is the ability of computers to extract information and insights from images
and videos. With neural networks, computers can distinguish and recognize images similar
to humans. Computer vision has several applications, such as the following:
● Visual recognition in self-driving cars so they can recognize road signs and other road
users
● Content moderation to automatically remove unsafe or inappropriate content from
image and video archives
● Facial recognition to identify faces and recognize attributes like open eyes, glasses,
and facial hair
● Image labeling to identify brand logos, clothing, safety gear, and other image details
Speech recognition
Neural networks can analyze human speech despite varying speech patterns, pitch, tone, language, and accent. Virtual assistants like Amazon Alexa and automatic transcription software use speech recognition to transcribe spoken words and act on them.
Natural language processing
Natural language processing (NLP) is the ability to process natural, human-created text. Neural networks help computers gather insights and meaning from text data and documents. NLP has several use cases, such as chatbots, document classification, and sentiment analysis.
Recommendation engines
Neural networks can track user activity to develop personalized recommendations. They can
also analyze all user behavior and discover new products or services that interest a specific
user. For example, Curalate, a Philadelphia-based startup, helps brands convert social media
posts into sales. Brands use Curalate’s intelligent product tagging (IPT) service to automate
the collection and curation of user-generated social content. IPT uses neural networks to
automatically find and recommend products relevant to the user’s social media activity.
Consumers don't have to hunt through online catalogs to find a specific product from a
social media image. Instead, they can use Curalate’s auto product tagging to purchase the
product with ease.
5. Ranking
Ranking:
Ranking is a machine learning technique for ordering items by relevance.
Ranking is useful for many applications in information retrieval, such as e-commerce, social networks, recommendation systems, and so on. For example, a user searches for an article or an item to buy online. For a recommendation system, it is important that relevant articles or items appear to the user so that the user clicks or purchases them. A simple regression model can predict the probability that a user clicks an article or buys an item. However, it is more practical to use a ranking technique that orders the articles or items so as to maximize the chances of a click or purchase, since the prioritization of the articles or items influences the decisions of users.
The ranking technique directly ranks items by training a model to predict the ranking of one item over another. In the trained model, items can be ranked one over the other by assigning a "score" to each item: higher-ranked items have higher scores and lower-ranked items have lower scores. Using these scores, a model is built to predict which item ranks higher than another.
● Output – For a query-document input xᵢ = (q, dᵢ), we assume there exists a true relevance score yᵢ; the model outputs a predicted score sᵢ.
All Learning to Rank models use a base machine learning model (e.g. a decision tree or neural network) to compute s = f(x). The choice of the loss function is the distinctive element of a Learning to Rank model. In general, there are three approaches, depending on how the loss is computed.
● Pointwise Methods – The total loss is computed as the sum of loss terms defined on each document dᵢ (hence pointwise), measuring the distance between the predicted score sᵢ and the ground truth yᵢ, for i = 1…n. By doing this, we transform our task into a regression problem.
● Pairwise Methods – The total loss is computed as the sum of loss terms defined on each pair of documents dᵢ, dⱼ (hence pairwise), for i, j = 1…n. The model is trained to predict whether yᵢ > yⱼ or not, i.e. which of two documents is more relevant. By doing this, we transform our task into a binary classification problem (see the sketch after this list).
● Listwise Methods – The loss is directly computed on the whole list of documents
(hence listwise) with corresponding predicted ranks. In this way, ranking metrics can
be more directly incorporated into the loss.
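A minimal pairwise (RankNet-style) loss sketch, assuming predicted scores s and true relevances y for a small list of documents:

import numpy as np

def pairwise_loss(s, y):
    # Sum the logistic loss over every pair where document i truly outranks j
    loss = 0.0
    for i in range(len(s)):
        for j in range(len(s)):
            if y[i] > y[j]:   # i should be ranked above j
                loss += np.log1p(np.exp(-(s[i] - s[j])))
    return loss

print(pairwise_loss(np.array([2.0, 1.0, 0.5]), np.array([2, 1, 0])))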
6. Feature Selection
Feature Selection:
Feature selection is the method of reducing the input variables to your model by using only relevant data and getting rid of noise in the data.
It is the process of automatically choosing relevant features for your machine learning
model based on the type of problem you are trying to solve. We do this by including or
excluding important features without changing them. It helps in cutting down the noise in
our data and reducing the size of our input data.
● Supervised Models: Supervised feature selection refers to methods that use the output label class for feature selection. They use the target variables to identify variables that can increase the efficiency of the model.
We can further divide the supervised models into three:
1. Filter Method: In this method, features are dropped based on their relation to the output, i.e. how strongly they correlate with the output. We use correlation to check whether features are positively or negatively correlated with the output labels and drop features accordingly. Eg: Information Gain, Chi-Square Test, Fisher’s Score, etc. (see the sketch below).
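A minimal filter-method sketch with scikit-learn, scoring each Iris feature against the labels with a chi-square test and keeping the top two:

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)
selector = SelectKBest(score_func=chi2, k=2).fit(X, y)
print("chi-square scores:", selector.scores_)
print("kept feature indices:", selector.get_support(indices=True))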
2. Wrapper Method: We split our data into feature subsets and train a model using them. Based on the output of the model, we add or remove features and train the model again. It forms the subsets using a greedy approach and evaluates the accuracy of all the possible combinations of features. Eg: Forward Selection, Backwards Elimination, etc. (see the sketch below).
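A minimal wrapper-method sketch using recursive feature elimination (a form of backwards elimination) from scikit-learn:

from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
# Repeatedly retrain the model, dropping the weakest feature each round
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2).fit(X, y)
print("selected feature mask:", rfe.support_)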
3. Intrinsic Method: This method combines the qualities of both the filter and wrapper methods to create the best subset. It performs feature selection as part of the model training process itself, which keeps the computation cost to a minimum. Eg: Lasso and Ridge Regression (see the sketch below).
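A minimal intrinsic-method sketch: Lasso's L1 penalty drives the weights of irrelevant features toward zero during training itself (the synthetic data here is illustrative):

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] - 2 * X[:, 1]         # only the first two features matter
lasso = Lasso(alpha=0.1).fit(X, y)
print("coefficients:", lasso.coef_)   # irrelevant features get ~0 weight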