Module 5
Perceptron
A Perceptron is an algorithm for supervised learning of binary classifiers. This algorithm
enables neurons to learn and processes elements in the training set one at a time.
Types of Perceptron
Single layer: A single-layer perceptron can learn only linearly separable patterns.
Multilayer: A multilayer perceptron has two or more layers and therefore greater
processing power.
The perceptron model is also considered one of the simplest and most fundamental types
of artificial neural network. Being a supervised learning algorithm for binary
classifiers, it can be viewed as a single-layer neural network with four main
parameters: input values, weights and bias, net sum, and an activation function.
This step function, or activation function, is vital in ensuring that the output is
mapped to (0, 1) or (-1, 1). Note that the weight of an input indicates that node's
strength, while the bias gives the ability to shift the activation function curve up or down.
Step 1: Multiply each input value by its corresponding weight and add the products to
calculate the weighted sum. The mathematical expression is:
∑wi*xi = x1*w1 + x2*w2 + x3*w3 + ... + xn*wn
Add a term called bias ‘b’ to this weighted sum to improve the model’s performance.
Step 2: An activation function f is applied to the above weighted sum, giving us an
output either in binary form or as a continuous value:
Y = f(∑wi*xi + b)
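As a concrete illustration, here is a minimal Python sketch of these two steps (NumPy assumed); the AND-gate weights and bias are hand-picked for the example rather than learned:

import numpy as np

def step(z):
    # Step activation: maps the net sum to a binary output {0, 1}
    return np.where(z >= 0, 1, 0)

def perceptron_output(x, w, b):
    z = np.dot(w, x) + b   # Step 1: weighted sum plus bias
    return step(z)         # Step 2: apply the activation function

# Hand-picked weights and bias so the perceptron acts as a logical AND gate
w = np.array([1.0, 1.0])
b = -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", perceptron_output(np.array(x), w, b))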
The network is feed-forward in that none of the weights cycles back to an input unit or to an
output unit of a previous layer. It is fully connected in that each unit provides input to each
unit in the next forward layer. Multilayer feed-forward neural networks are able to model the
class prediction as a nonlinear combination of the inputs.
Backpropagation
Backpropagation learns by iteratively processing a data set of training tuples, comparing the
network’s prediction for each tuple with the actual known target value. The target value may
be the known class label of the training tuple (for classification problems) or a continuous
value (for prediction). For each training tuple, the weights are modified so as to minimize the
mean squared error between the network’s prediction and the actual target value. These
modifications are made in the “backwards” direction, that is, from the output layer, through
each hidden layer down to the first hidden layer.
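To make the procedure concrete, here is a minimal NumPy sketch of backpropagation for a network with one hidden layer, using sigmoid activations and XOR as a toy task; the constant factor from the mean-squared-error derivative is folded into the learning rate, and the layer sizes are illustrative:

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: XOR is not linearly separable, so a hidden layer is required
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer with 4 units
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 0.5

for epoch in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)        # hidden activations
    y_hat = sigmoid(h @ W2 + b2)    # network prediction

    # Backward pass: propagate the error from the output layer backwards
    err_out = (y_hat - y) * y_hat * (1 - y_hat)   # error at the output units
    err_hid = (err_out @ W2.T) * h * (1 - h)      # error pushed back through W2

    # Weight updates, made in the "backwards" direction
    W2 -= lr * h.T @ err_out
    b2 -= lr * err_out.sum(axis=0)
    W1 -= lr * X.T @ err_hid
    b1 -= lr * err_hid.sum(axis=0)

print(np.round(y_hat, 2))   # predictions should approach [0, 1, 1, 0]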
Introduction to Deep Learning
Deep learning is a sub-field of machine learning dealing with algorithms inspired by the
structure and function of the brain, called artificial neural networks. In other words, it
mirrors the functioning of our brains. Deep learning architectures resemble the way the
nervous system is structured, where each neuron is connected to the others and passes
information along.
A deep neural network is simply a shallow neural network with more than one hidden layer.
Each neuron in the hidden layer is connected to many others. Each arrow has a weight
property attached to it, which controls how much that neuron's activation affects the others
attached to it.
The word 'deep' in deep learning refers to these hidden layers, and the approach derives
its effectiveness from them. Selecting the number of hidden layers depends on the nature
of the problem and the size of the data set. The following figure shows a deep neural
network with two hidden layers.
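As an illustration, a network like the one described can be written in a few lines of Keras (assuming TensorFlow is installed; the input size of 10 features and the hidden-layer widths of 16 and 8 are arbitrary example choices):

import tensorflow as tf

# A feed-forward network with two hidden layers; all sizes are illustrative
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),                     # input layer: 10 features
    tf.keras.layers.Dense(16, activation="relu"),    # first hidden layer
    tf.keras.layers.Dense(8, activation="relu"),     # second hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output for binary classification
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()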
CNN
CNN stands for Convolutional Neural Network. CNNs are very useful as they minimise human
effort by automatically detecting features. They are a class of deep neural networks that
can recognize and classify particular features from images and are widely used for
analyzing visual images. Their applications range from image and video recognition and
image classification to medical image analysis, computer vision, and natural language processing.
The term ‘convolution’ in CNN denotes the mathematical function of convolution, which is a
special kind of linear operation wherein two functions are multiplied to produce a third
function that expresses how the shape of one function is modified by the other.
There are two main parts to a CNN architecture:
A convolution tool that separates and identifies the various features of the image for
analysis, in a process called Feature Extraction. The feature extraction network consists
of many pairs of convolutional and pooling layers.
A fully connected layer that utilizes the output from the convolution process and
predicts the class of the image based on the features extracted in the previous stages.
The feature extraction part of this CNN model aims to reduce the number of features
present in a dataset. It creates new features that summarise the existing features
contained in the original set. There are many CNN layers, as shown in the CNN
architecture diagram.
Convolution Layers
Three types of layers make up a CNN: convolutional layers, pooling layers, and
fully connected (FC) layers. When these layers are stacked, a CNN architecture is formed.
1. Convolutional Layer
This layer is the first layer that is used to extract the various features from the input images.
In this layer, the mathematical operation of convolution is performed between the input
image and a filter of a particular size MxM. By sliding the filter over the input image, the dot
product is taken between the filter and the parts of the input image with respect to the size of
the filter (MxM).
The output is termed the feature map, which gives us information about the image such as
the corners and edges. Later, this feature map is fed to other layers to learn several
other features of the input image.
The convolution layer passes the result to the next layer after applying the convolution
operation to the input. A major benefit of convolutional layers is that they keep the
spatial relationship between the pixels intact.
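The sliding-window dot product can be sketched directly in NumPy. Strictly speaking this computes cross-correlation, which is what most CNN libraries implement under the name "convolution"; the 5x5 image and the vertical-edge filter are toy examples:

import numpy as np

def conv2d(image, kernel):
    # Slide an MxM filter over the image, taking the dot product at each position
    m, n = kernel.shape
    h, w = image.shape
    out = np.zeros((h - m + 1, w - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + m, j:j + n] * kernel)
    return out

# Toy 5x5 "image" with a vertical edge between columns 1 and 2
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)   # simple vertical-edge filter
print(conv2d(image, kernel))   # nonzero values in the feature map mark the edge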
2. Pooling Layer
In most cases, a Convolutional Layer is followed by a Pooling Layer. The primary aim of
this layer is to decrease the size of the convolved feature map in order to reduce
computational costs. This is performed by decreasing the connections between layers, and
pooling operates independently on each feature map. Depending upon the method used, there
are several types of pooling operations; each basically summarises the features generated
by a convolution layer.
In Max Pooling, the largest element is taken from each section of the feature map.
Average Pooling calculates the average of the elements in a predefined-size image
section, while Sum Pooling computes the total sum of the elements in that section. The
Pooling Layer usually serves as a bridge between the Convolutional Layer and the FC Layer.
Pooling generalises the features extracted by the convolution layer and helps the network
recognise those features independently of their exact location. It also reduces the
amount of computation in the network.
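A minimal NumPy sketch of max pooling follows; a 2x2 window with stride 2 is assumed, and swapping max() for mean() or sum() gives average or sum pooling:

import numpy as np

def max_pool(fmap, size=2, stride=2):
    # Summarise each size x size window of the feature map by its maximum
    out_h = (fmap.shape[0] - size) // stride + 1
    out_w = (fmap.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = fmap[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max()  # .mean() -> average pooling, .sum() -> sum pooling
    return out

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 7, 8],
                 [3, 2, 1, 0],
                 [1, 2, 3, 4]], dtype=float)
print(max_pool(fmap))   # 4x4 map reduced to 2x2: [[6. 8.] [3. 4.]]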
3. Fully Connected Layer
The Fully Connected (FC) layer consists of the weights and biases along with the neurons
and is used to connect the neurons between two different layers. These layers are usually
placed before the output layer and form the last few layers of a CNN Architecture.
Here, the output of the previous layers is flattened and fed to the FC layer. The
flattened vector then passes through a few more FC layers, where the usual mathematical
operations take place; it is at this stage that the classification process begins. Two or
more FC layers are connected because stacked fully connected layers generally perform
better than a single one.
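Putting the pieces together, a typical architecture of this kind can be sketched in Keras (assuming TensorFlow; the 28x28 grayscale input, filter counts, and layer widths are illustrative choices, not prescribed by the text):

import tensorflow as tf

# Convolution/pooling pairs for feature extraction, then Flatten + FC layers
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),                       # flatten feature maps to a vector
    tf.keras.layers.Dense(64, activation="relu"),    # first fully connected layer
    tf.keras.layers.Dense(10, activation="softmax"), # second FC layer: class scores
])
model.summary()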
4. Activation Functions
Finally, one of the most important components of the CNN model is the activation function.
Activation functions are used to learn and approximate any kind of continuous and complex
relationship between the variables of the network. In simple words, the activation
function decides which information should fire forward through the network and which
should not.
It adds non-linearity to the network. There are several commonly used activation
functions, such as the ReLU, Softmax, tanh, and Sigmoid functions, each with a specific
usage. For a binary classification CNN model, sigmoid and softmax functions are
preferred, and for multi-class classification, softmax is generally used.
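These commonly used functions are simple to write down in NumPy (a minimal sketch; the softmax subtracts the maximum input for numerical stability):

import numpy as np

def relu(z):
    return np.maximum(0, z)            # keeps positive values, zeroes out negatives

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))    # squashes values into (0, 1)

def softmax(z):
    e = np.exp(z - z.max())            # subtract max for numerical stability
    return e / e.sum()                 # outputs sum to 1, like probabilities

z = np.array([2.0, -1.0, 0.5])
print(relu(z), sigmoid(z), np.tanh(z), softmax(z))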
RNN
RNN stands for Recurrent Neural Network. RNNs are a very important variant of neural
networks heavily used in Natural Language Processing. They are a class of neural networks
that allow previous outputs to be used as inputs while having hidden states.
RNNs have a concept of "memory" which retains information about what has been calculated
up to time step t. RNNs are called recurrent because they perform the same task for every
element of a sequence, with the output depending on the previous computations.
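A single recurrent step can be sketched in NumPy as follows; the 3-dimensional inputs and 5-dimensional hidden state are arbitrary illustrative sizes, and the same weight matrices are reused at every time step, which is what makes the network recurrent:

import numpy as np

rng = np.random.default_rng(0)
Wx = rng.normal(size=(5, 3))   # input-to-hidden weights
Wh = rng.normal(size=(5, 5))   # hidden-to-hidden weights (the "memory" path)
b = np.zeros(5)

def rnn_step(x_t, h_prev):
    # The hidden state h carries information about everything seen up to step t
    return np.tanh(Wx @ x_t + Wh @ h_prev + b)

h = np.zeros(5)                       # initial hidden state
for x_t in rng.normal(size=(4, 3)):   # a toy sequence of 4 inputs
    h = rnn_step(x_t, h)
print(h)                              # final state summarising the whole sequence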
Architecture of Recurrent Neural Network: