Module 5
Perceptron
A Perceptron is an algorithm for supervised learning of binary classifiers. This algorithm
enables neurons to learn and processes elements in the training set one at a time.
Types of Perceptron
Single layer: A single-layer perceptron can learn only linearly separable patterns.
Multilayer: A multilayer perceptron has two or more layers and therefore greater
processing power.
The perceptron model is also considered one of the simplest and most fundamental types
of artificial neural network. Being a supervised learning algorithm for binary
classifiers, it can be viewed as a single-layer neural network with four main
parameters: input values, weights and bias, net sum, and an activation function.
This step function, or activation function, is vital in ensuring that the output is
mapped to (0, 1) or (-1, 1). Note that the weight of an input indicates that node's
strength, while the bias gives the ability to shift the activation function curve up or down.
Step 1: Multiply each input value by its corresponding weight and add the products to
calculate the weighted sum. The mathematical expression is:
∑wi*xi = x1*w1 + x2*w2 + x3*w3 + ... + xn*wn
Add a term called bias ‘b’ to this weighted sum to improve the model’s performance.
Step 2: An activation function f is applied to the above weighted sum, giving us an
output either in binary form or as a continuous value:
Y = f(∑wi*xi + b)
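As a concrete illustration, here is a minimal Python sketch of these two steps (NumPy assumed); the AND-gate weights and bias are hand-picked for the example rather than learned:

import numpy as np

def step(z):
    # Step activation: maps the net sum to a binary output {0, 1}
    return np.where(z >= 0, 1, 0)

def perceptron_output(x, w, b):
    z = np.dot(w, x) + b   # Step 1: weighted sum plus bias
    return step(z)         # Step 2: apply the activation function

# Hand-picked weights and bias so the perceptron acts as a logical AND gate
w = np.array([1.0, 1.0])
b = -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", perceptron_output(np.array(x), w, b))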
The network is feed-forward in that none of the weights cycles back to an input unit or to an
output unit of a previous layer. It is fully connected in that each unit provides input to each
unit in the next forward layer. Multilayer feed-forward neural networks are able to model the
class prediction as a nonlinear combination of the inputs.
Backpropagation
Backpropagation learns by iteratively processing a data set of training tuples, comparing the
network’s prediction for each tuple with the actual known target value. The target value may
be the known class label of the training tuple (for classification problems) or a continuous
value (for prediction). For each training tuple, the weights are modified so as to minimize the
mean squared error between the network’s prediction and the actual target value. These
modifications are made in the “backwards” direction, that is, from the output layer, through
each hidden layer down to the first hidden layer.
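To make the procedure concrete, here is a minimal NumPy sketch of backpropagation for a network with one hidden layer, using sigmoid activations and XOR as a toy task; the constant factor from the mean-squared-error derivative is folded into the learning rate, and the layer sizes are illustrative:

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: XOR is not linearly separable, so a hidden layer is required
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer with 4 units
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 0.5

for epoch in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)        # hidden activations
    y_hat = sigmoid(h @ W2 + b2)    # network prediction

    # Backward pass: propagate the error from the output layer backwards
    err_out = (y_hat - y) * y_hat * (1 - y_hat)   # error at the output units
    err_hid = (err_out @ W2.T) * h * (1 - h)      # error pushed back through W2

    # Weight updates, made in the "backwards" direction
    W2 -= lr * h.T @ err_out
    b2 -= lr * err_out.sum(axis=0)
    W1 -= lr * X.T @ err_hid
    b1 -= lr * err_hid.sum(axis=0)

print(np.round(y_hat, 2))   # predictions should approach [0, 1, 1, 0]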
Introduction to Deep Learning
Deep learning is a sub-field of machine learning dealing with algorithms inspired by the
structure and function of the brain, called artificial neural networks. In other words, it
mirrors the functioning of our brains. Deep learning architectures resemble the way the
nervous system is structured, where each neuron is connected to the others and passes
information along.
A deep neural network is simply a shallow neural network with more than one hidden layer.
Each neuron in the hidden layer is connected to many others. Each arrow has a weight
property attached to it, which controls how much that neuron's activation affects the others
attached to it.
The word 'deep' in deep learning refers to these hidden layers, and the approach derives
its effectiveness from them. Selecting the number of hidden layers depends on the nature
of the problem and the size of the data set. The following figure shows a deep neural
network with two hidden layers.
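As an illustration, a network like the one described can be written in a few lines of Keras (assuming TensorFlow is installed; the input size of 10 features and the hidden-layer widths of 16 and 8 are arbitrary example choices):

import tensorflow as tf

# A feed-forward network with two hidden layers; all sizes are illustrative
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),                     # input layer: 10 features
    tf.keras.layers.Dense(16, activation="relu"),    # first hidden layer
    tf.keras.layers.Dense(8, activation="relu"),     # second hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output for binary classification
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()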
CNN
CNN stands for Convolutional Neural Network. CNNs are very useful as they minimise human
effort by automatically detecting features. They are a class of deep neural networks that
can recognize and classify particular features from images and are widely used for
analyzing visual images. Their applications range from image and video recognition and
image classification to medical image analysis, computer vision, and natural language processing.
The term ‘convolution’ in CNN denotes the mathematical function of convolution, which is a
special kind of linear operation wherein two functions are multiplied to produce a third
function that expresses how the shape of one function is modified by the other.
There are two main parts to a CNN architecture:
A convolution tool that separates and identifies the various features of the image for
analysis, in a process called Feature Extraction. The feature extraction network consists
of many pairs of convolutional and pooling layers.
A fully connected layer that utilizes the output from the convolution process and
predicts the class of the image based on the features extracted in the previous stages.
The feature extraction part of this CNN model aims to reduce the number of features
present in a dataset. It creates new features that summarise the existing features
contained in the original set. There are many CNN layers, as shown in the CNN
architecture diagram.
Convolution Layers
Three types of layers make up a CNN: convolutional layers, pooling layers, and
fully connected (FC) layers. When these layers are stacked, a CNN architecture is formed.
1. Convolutional Layer
This layer is the first layer that is used to extract the various features from the input images.
In this layer, the mathematical operation of convolution is performed between the input
image and a filter of a particular size MxM. By sliding the filter over the input image, the dot
product is taken between the filter and the parts of the input image with respect to the size of
the filter (MxM).
The output is termed the feature map, which gives us information about the image such as
the corners and edges. Later, this feature map is fed to other layers to learn several
other features of the input image.
The convolution layer passes the result to the next layer after applying the convolution
operation to the input. A major benefit of convolutional layers is that they keep the
spatial relationship between the pixels intact.
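The sliding-window dot product can be sketched directly in NumPy. Strictly speaking this computes cross-correlation, which is what most CNN libraries implement under the name "convolution"; the 5x5 image and the vertical-edge filter are toy examples:

import numpy as np

def conv2d(image, kernel):
    # Slide an MxM filter over the image, taking the dot product at each position
    m, n = kernel.shape
    h, w = image.shape
    out = np.zeros((h - m + 1, w - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + m, j:j + n] * kernel)
    return out

# Toy 5x5 "image" with a vertical edge between columns 1 and 2
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)   # simple vertical-edge filter
print(conv2d(image, kernel))   # nonzero values in the feature map mark the edge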
2. Pooling Layer
In most cases, a Convolutional Layer is followed by a Pooling Layer. The primary aim of
this layer is to decrease the size of the convolved feature map in order to reduce
computational costs. This is performed by decreasing the connections between layers, and
pooling operates independently on each feature map. Depending upon the method used, there
are several types of pooling operations; each basically summarises the features generated
by a convolution layer.
In Max Pooling, the largest element is taken from each section of the feature map.
Average Pooling calculates the average of the elements in a predefined-size image
section, while Sum Pooling computes the total sum of the elements in that section. The
Pooling Layer usually serves as a bridge between the Convolutional Layer and the FC Layer.
Pooling generalises the features extracted by the convolution layer and helps the network
recognise those features independently of their exact location. It also reduces the
amount of computation in the network.
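A minimal NumPy sketch of max pooling follows; a 2x2 window with stride 2 is assumed, and swapping max() for mean() or sum() gives average or sum pooling:

import numpy as np

def max_pool(fmap, size=2, stride=2):
    # Summarise each size x size window of the feature map by its maximum
    out_h = (fmap.shape[0] - size) // stride + 1
    out_w = (fmap.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = fmap[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max()  # .mean() -> average pooling, .sum() -> sum pooling
    return out

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 7, 8],
                 [3, 2, 1, 0],
                 [1, 2, 3, 4]], dtype=float)
print(max_pool(fmap))   # 4x4 map reduced to 2x2: [[6. 8.] [3. 4.]]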
3. Fully Connected Layer
The Fully Connected (FC) layer consists of the weights and biases along with the neurons
and is used to connect the neurons between two different layers. These layers are usually
placed before the output layer and form the last few layers of a CNN Architecture.
Here, the output of the previous layers is flattened and fed to the FC layer. The
flattened vector then passes through a few more FC layers, where the usual mathematical
operations take place; it is at this stage that the classification process begins. Two or
more FC layers are connected because stacked fully connected layers generally perform
better than a single one.
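Putting the pieces together, a typical architecture of this kind can be sketched in Keras (assuming TensorFlow; the 28x28 grayscale input, filter counts, and layer widths are illustrative choices, not prescribed by the text):

import tensorflow as tf

# Convolution/pooling pairs for feature extraction, then Flatten + FC layers
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),                       # flatten feature maps to a vector
    tf.keras.layers.Dense(64, activation="relu"),    # first fully connected layer
    tf.keras.layers.Dense(10, activation="softmax"), # second FC layer: class scores
])
model.summary()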
4. Activation Functions
Finally, one of the most important components of the CNN model is the activation function.
Activation functions are used to learn and approximate any kind of continuous and complex
relationship between the variables of the network. In simple words, the activation
function decides which information should fire forward through the network and which
should not.
It adds non-linearity to the network. There are several commonly used activation
functions, such as the ReLU, Softmax, tanh, and Sigmoid functions, each with a specific
usage. For a binary classification CNN model, sigmoid and softmax functions are
preferred, and for multi-class classification, softmax is generally used.
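These commonly used functions are simple to write down in NumPy (a minimal sketch; the softmax subtracts the maximum input for numerical stability):

import numpy as np

def relu(z):
    return np.maximum(0, z)            # keeps positive values, zeroes out negatives

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))    # squashes values into (0, 1)

def softmax(z):
    e = np.exp(z - z.max())            # subtract max for numerical stability
    return e / e.sum()                 # outputs sum to 1, like probabilities

z = np.array([2.0, -1.0, 0.5])
print(relu(z), sigmoid(z), np.tanh(z), softmax(z))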
RNN
RNN stands for Recurrent Neural Network. RNNs are a very important variant of neural
networks heavily used in Natural Language Processing. They are a class of neural networks
that allow previous outputs to be used as inputs while having hidden states.
RNNs have a concept of "memory" which retains information about what has been calculated
up to time step t. RNNs are called recurrent because they perform the same task for every
element of a sequence, with the output depending on the previous computations.
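A single recurrent step can be sketched in NumPy as follows; the 3-dimensional inputs and 5-dimensional hidden state are arbitrary illustrative sizes, and the same weight matrices are reused at every time step, which is what makes the network recurrent:

import numpy as np

rng = np.random.default_rng(0)
Wx = rng.normal(size=(5, 3))   # input-to-hidden weights
Wh = rng.normal(size=(5, 5))   # hidden-to-hidden weights (the "memory" path)
b = np.zeros(5)

def rnn_step(x_t, h_prev):
    # The hidden state h carries information about everything seen up to step t
    return np.tanh(Wx @ x_t + Wh @ h_prev + b)

h = np.zeros(5)                       # initial hidden state
for x_t in rng.normal(size=(4, 3)):   # a toy sequence of 4 inputs
    h = rnn_step(x_t, h)
print(h)                              # final state summarising the whole sequence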
Architecture of Recurrent Neural Network: