DL Mod3

The document provides an overview of Convolutional Neural Networks (CNNs), detailing their architecture, components, and functionality in image classification. It explains key concepts such as convolution, pooling, padding, stride, and activation functions, along with various CNN architectures like LeNet-5, AlexNet, ZFNet, and VGG. The advantages and disadvantages of CNNs are also discussed, highlighting their effectiveness in feature detection and challenges like computational expense and overfitting.

Uploaded by

v3319763

CONVOLUTIONAL NEURAL NETWORK
MODULE 3
INTRODUCTION
• A Convolutional Neural Network (CNN) is a type of Deep
Learning neural network architecture commonly used in
Computer Vision.
• Computer vision is a field of Artificial Intelligence that
enables a computer to understand and interpret the
image or visual data.
• We use CNN for image classification.
• CNNs are a special type of ANN that accepts images
as inputs.
• Why CNN?
• Grayscale images have pixel values from 0 to 255, i.e. 8-bit pixel
values. If the size of the image is NxM, the input vector has N*M
entries; for an RGB image it has N*M*3. An RGB image of size 30x30
therefore requires 2,700 input neurons, and an RGB image of size
256x256 requires 196,608.
• The number of weights (parameters) for a 224x224x3 input is very high.
• A single fully connected neuron reading this input has 224x224x3 = 150,528
weights coming into it.
• This requires more computation, memory, and data.
• In a CNN, each convolutional layer performs a convolution on its input.
• For an RGB image, the CNN takes the input as an image volume (height x width x 3).
• Basically, an image is taken as input and a kernel/filter is applied to it
to get the output.
• CNN also enables parameter sharing between the output neurons which
means that a feature detector (for example horizontal edge detector) that’s
useful in one part of the image is probably useful in another part of the image.
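The arithmetic behind the "Why CNN?" points can be checked with a minimal sketch. The function names and the 3x3 filter size are illustrative assumptions, not something taken from the slides.

```python
def flattened_size(height, width, channels=1):
    """Number of input values when an image is flattened into a vector."""
    return height * width * channels


def conv_filter_weights(kernel_size, in_channels):
    """Weights in one square convolutional filter (bias not counted)."""
    return kernel_size * kernel_size * in_channels


print(flattened_size(30, 30, 3))    # 2700
print(flattened_size(256, 256, 3))  # 196608
print(flattened_size(224, 224, 3))  # 150528 weights into one dense neuron
print(conv_filter_weights(3, 3))    # 27 weights, reused at every position
```

A fully connected neuron needs one weight per input value, while a convolutional filter reuses the same small set of weights across the whole image.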
• Convolutions
• Every output neuron is connected to a small neighborhood in the
input through a weight matrix, also referred to as a kernel or filter.
We can define multiple kernels for every convolution layer, each
giving rise to one output. Each filter is moved around the input
image. The outputs corresponding to each filter are stacked, giving rise
to an output volume.
• Padding
• Padded convolution is used when we want to preserve the dimensions of
the input matrix; it also helps us keep more of the information at the
border of an image.
• Without padding, convolution reduces the size of the feature map.
• To keep the feature map the same size as the input map, we
pad the rows and columns with zeros.
• Padding P=(F-1)/2, for a stride of 1 and an odd kernel size
• F is the size of the kernel matrix
• Stride
• Stride is the number of pixels the kernel filter moves at each step.
• A stride of 2 means the kernel moves 2 pixels between successive
convolution operations.
• With a stride of 1, the kernel filter slides over the input matrix one
pixel at a time. With a stride of 2, it moves two pixels at a time, so
every other position is skipped.
• The output feature map is about 4 times smaller in area when the stride is
increased from 1 to 2, because each side is roughly halved.
• The dimension of the output feature map is floor((N-F)/S) + 1 without
padding, or floor((N-F+2P)/S) + 1 with padding P.
• Pooling
• Pooling provides a degree of translational invariance by subsampling: it
reduces the size of the feature maps. The two commonly used pooling
techniques are max pooling and average pooling.
• For example, 2x2 pooling with stride 2 divides a 4x4 matrix into four 2x2
matrices and picks from each the greatest value (max pooling) or the average
of the four values (average pooling).
• This reduces the size of the feature maps, which in turn reduces the
computation and the number of parameters in later layers while keeping
the most important information.
• One thing to note here is that the pooling operation reduces the
Nx and Ny values of the input feature map but does not reduce
the value of Nc (number of channels).
• The hyperparameters involved in the pooling operation are the
filter dimension, stride, and type of pooling (max or avg).
• There is no parameter for gradient descent to learn.
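The 4x4 example can be sketched in plain Python as follows. The function name `pool2d` and the sample values are assumptions made for illustration; this is not a library API.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Pool a 2D list of numbers with a square window."""
    rows, cols = len(feature_map), len(feature_map[0])
    output = []
    for r in range(0, rows - size + 1, stride):
        out_row = []
        for c in range(0, cols - size + 1, stride):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            if mode == "max":
                out_row.append(max(window))
            else:  # average pooling
                out_row.append(sum(window) / len(window))
        output.append(out_row)
    return output


example = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [3, 4, 1, 8],
]
max_pooled = pool2d(example, mode="max")
avg_pooled = pool2d(example, mode="avg")
print(max_pooled)  # [[6, 4], [7, 9]]
print(avg_pooled)  # [[3.75, 2.25], [4.0, 4.5]]
```

The function contains no learnable weights, only the window size, the stride and the pooling type.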
Output Feature Map
• The size of the output feature map or volume depends on:
• Size of the input feature map
• Kernel size (Kw, Kh)
• Zero padding
• Stride (Sw, Sh)
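How these four quantities combine can be sketched with a small helper that is applied once per axis (width with Kw and Sw, height with Kh and Sh). The function names are hypothetical; the formula is the standard floor((N - F + 2P) / S) + 1.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output length along one axis: floor((n - f + 2p) / s) + 1."""
    return (n - f + 2 * p) // s + 1


def same_padding(f):
    """Padding that preserves the size at stride 1 for an odd kernel size f."""
    return (f - 1) // 2


print(conv_output_size(6, 3))                     # 4: no padding shrinks the map
print(conv_output_size(6, 3, p=same_padding(3)))  # 6: size preserved
print(conv_output_size(7, 3, s=2))                # 3: stride 2 roughly halves it
```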
CNN architecture
• A Convolutional Neural Network consists of multiple layers: the
input layer, convolutional layers, pooling layers, and fully connected
layers.
• CNN is very useful as it minimises human effort by
automatically detecting the features. For example, for
apples and mangoes, it would automatically detect the
distinct features of each class on its own.
BASIC ARCHITECTURE
• There are two main parts to a CNN architecture:
• A convolution tool that separates and identifies the various features of the
image for analysis, in a process called feature extraction.
• The feature extraction network consists of many pairs of convolutional and
pooling layers.
• A fully connected layer that uses the output of the convolution process
and predicts the class of the image based on the features extracted in
previous stages.
• Feature extraction aims to reduce the number of features present in a
dataset. It creates new features which summarise the existing features
contained in the original set of features.
• Three types of layers make up the CNN: convolutional
layers, pooling layers, and fully-connected (FC) layers.
When these layers are stacked, a CNN architecture is
formed. In addition to these three layers, there are two
more important components: the dropout layer and the
activation function.
• 1. Convolutional Layer
• This layer is the first layer that is used to extract the various features
from the input images. In this layer, the mathematical operation of
convolution is performed between the input image and a filter of a
particular size MxM. By sliding the filter over the input image, the dot
product is taken between the filter and the parts of the input image
with respect to the size of the filter (MxM).
• The output is termed as the Feature map which gives us information
about the image such as the corners and edges. Later, this feature
map is fed to other layers to learn several other features of the input
image.
• The convolution layer in a CNN passes its result to the next layer after
applying the convolution operation to the input.
• 2. Pooling Layer
• In most cases, a Convolutional Layer is followed by a Pooling Layer. The
primary aim of this layer is to decrease the size of the convolved feature
map to reduce the computational costs.
• Depending upon the method used, there are several types of pooling
operations.
• Pooling basically summarizes the features generated by a convolution layer.
• In Max Pooling, the largest element is taken from each region of the feature
map. Average Pooling calculates the average of the elements in a section of
predefined size. Sum Pooling computes the total sum of the elements in the
predefined section. The Pooling Layer usually serves as a bridge
between the Convolutional Layer and the FC Layer.
• The pooling layer generalises the features extracted by the convolution
layer and helps the network recognise the features independently of their
exact position.
• This also reduces the computation in the network.
• 3. Fully Connected Layer
• The Fully Connected (FC) layer consists of neurons with their weights and
biases and is used to connect the neurons between two different layers.
These layers are usually placed before the output layer
and form the last few layers of a CNN architecture.
• Here, the feature maps from the previous layers are flattened and fed to
the FC layer. The flattened vector then passes through a few more FC layers,
where the mathematical operations usually take place. In this
stage, the classification process begins. Two fully connected layers are
often used because they tend to perform better than a single one. These
layers in a CNN reduce the need for human supervision.
• 4. Dropout
• Usually, when all the features are connected to the FC layer, it can cause
overfitting on the training dataset. Overfitting occurs when a model fits
the training data so closely that its performance on new data suffers.
• To overcome this problem, a dropout layer is used, in which a few neurons
are randomly dropped from the neural network during training, so that each
training step uses a smaller network. With a dropout rate of 0.3, 30% of
the nodes are dropped out randomly from the neural network.
• Dropout improves the performance of a machine learning model because it
prevents overfitting by making the network effectively simpler during
training. All neurons are used again at test time.
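A minimal sketch of dropout on a list of activations is given below. It assumes the common "inverted dropout" variant, where the surviving values are rescaled during training so that nothing needs to change at test time. The slides do not specify this detail, and the function name and `rng` parameter are illustrative.

```python
import random


def dropout(activations, rate, training=True, rng=random):
    """Inverted dropout: zero each value with probability `rate` during
    training and rescale the survivors by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
train_out = dropout([1.0] * 10, rate=0.3, rng=rng)
test_out = dropout([1.0] * 10, rate=0.3, training=False)
print(train_out)  # a mix of 0.0 and 1/0.7 values
print(test_out)   # unchanged: all neurons are used at test time
```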
• 5. Activation Functions
• Finally, one of the most important components of the CNN model is the
activation function.
• Activation functions are used to learn and approximate any kind of
continuous and complex relationship between variables of the network.
• In simple words, an activation function decides which information should
be passed forward through the network and which should not.
• It adds non-linearity to the network.
• There are several commonly used activation functions such as ReLU,
Softmax, tanh and Sigmoid.
• Each of these functions has a specific usage.
• For a binary classification CNN model, the sigmoid or softmax function is
preferred in the output layer, and for multi-class classification, softmax
is generally used.
• In simple terms, activation functions in a CNN model determine whether a
neuron should be activated or not.
• They decide, using mathematical operations, whether the input to the neuron
is important for the prediction.
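The four activation functions named above can be written in a few lines of plain Python. This is a sketch of their standard definitions; deep learning libraries provide their own vectorized versions.

```python
import math


def relu(x):
    """Pass positive values through and clamp negative values to zero."""
    return max(0.0, x)


def sigmoid(x):
    """Squash a value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Squash a value into the range (-1, 1)."""
    return math.tanh(x)


def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    shifted = [s - max(scores) for s in scores]  # for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([1.0, 2.0, 3.0])
print(relu(-2.0), sigmoid(0.0), tanh(0.0))
print(probabilities)
```

Sigmoid yields a single probability, which suits binary classification. Softmax yields one probability per class, which suits multi-class classification.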
LeNet-5 CNN Architecture
• In 1998, the LeNet-5 architecture was introduced in a research paper titled
“Gradient-Based Learning Applied to Document Recognition” by Yann LeCun,
Leon Bottou, Yoshua Bengio, and Patrick Haffner. It is one of the earliest and
most basic CNN architectures.
• It has 5 layers with learnable weights (3 convolutional and 2 fully connected),
plus 2 pooling layers. The input is an image with dimensions of 32×32. The first
layer convolves it with 6 filters of size 5×5, resulting in a dimension of
28x28x6. The second layer is a pooling operation with filter size 2×2 and stride
of 2. Hence the resulting image dimension will be 14x14x6.
• Similarly, the third layer is a convolution operation with 16 filters of
size 5×5, giving 10x10x16, followed by a fourth pooling layer with the same
filter size of 2×2 and stride of 2. Thus, the resulting image dimension will be
reduced to 5x5x16.
• Once the image dimension is reduced, the fifth layer is a fully connected
convolutional layer with 120 filters each of size 5×5. Each of the
120 units in this layer is connected to the 400 (5x5x16) units from the
previous layer. The sixth layer is also a fully connected layer, with 84 units.
• The final seventh layer is a softmax output layer with ‘n’ possible classes,
depending upon the number of classes in the dataset.
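The dimensions quoted for LeNet-5 can be traced with the output-size formula floor((N - F) / S) + 1. In this sketch the stage names C1, S2, C3, S4 and C5 follow the naming of the original paper rather than the slides, and the helper functions are illustrative.

```python
def conv_out(n, f, s=1, p=0):
    """Output length along one axis: floor((n - f + 2p) / s) + 1."""
    return (n - f + 2 * p) // s + 1


def lenet5_shapes(input_size=32):
    """Return (name, height/width, channels) after each LeNet-5 stage."""
    shapes = []
    n = conv_out(input_size, 5)   # C1: 6 filters of 5x5
    shapes.append(("C1", n, 6))
    n = conv_out(n, 2, s=2)       # S2: 2x2 pooling, stride 2
    shapes.append(("S2", n, 6))
    n = conv_out(n, 5)            # C3: 16 filters of 5x5
    shapes.append(("C3", n, 16))
    n = conv_out(n, 2, s=2)       # S4: 2x2 pooling, stride 2
    shapes.append(("S4", n, 16))
    n = conv_out(n, 5)            # C5: 120 filters of 5x5
    shapes.append(("C5", n, 120))
    return shapes


for name, size, channels in lenet5_shapes():
    print(f"{name}: {size}x{size}x{channels}")
```

The fifth stage produces a 1x1x120 output, which is why it behaves like a fully connected layer over the 5x5x16 = 400 values coming out of the fourth stage.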
• Advantages of Convolutional Neural Networks (CNNs):
1. Good at detecting patterns and features in images, videos, and
audio signals.
2. Reasonably robust to translation; robustness to rotation and scaling is
more limited and usually relies on data augmentation.
3. End-to-end training, no need for manual feature extraction.
4. Can handle large amounts of data and achieve high accuracy.
• Disadvantages of Convolutional Neural Networks (CNNs):
1. Computationally expensive to train and require a lot of memory.
2. Can be prone to overfitting if not enough data or proper
regularization is used.
CASE STUDIES OF CONVOLUTIONAL
ARCHITECTURE
• ALEXNET
• AlexNet comprises 8 layers, of which 5 are
convolutional and 3 are fully connected. A few more
layers were stacked onto LeNet-5 to form AlexNet.
This architecture was one of the first to use the ReLU
activation function at scale, and it utilized Dropout
layers. The architecture has about 60 million
parameters.
• Advantages:
1. The model performed classification of images efficiently.
2. The computation performed is fast.
3. More computation and memory efficiency
4. It is robust
• Disadvantages:
1. It is difficult to apply to high-resolution images
• The first two fully-connected layers have 4096 nodes
each. After the last max-pooling layer, we have a total
of 6*6*256, i.e. 9216, nodes or features, and each of
these nodes is connected to each of the nodes in the
first fully-connected layer. So the number of
connections in this case is 9216*4096 = 37,748,736.
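The connection count can be extended to all three fully connected layers with a short calculation. The 1000-unit output layer is an assumption (the ImageNet class count) that is not stated in this part of the slides, and biases are not counted.

```python
fc_layers = [
    (6 * 6 * 256, 4096),  # flattened pooling output -> first FC layer
    (4096, 4096),         # first FC layer -> second FC layer
    (4096, 1000),         # second FC layer -> 1000-class output (assumed)
]
fc_weights = [n_in * n_out for n_in, n_out in fc_layers]
total_fc_weights = sum(fc_weights)
print(fc_weights)        # [37748736, 16777216, 4096000]
print(total_fc_weights)  # 58621952
```

Under these assumptions the fully connected layers alone account for about 58.6 million weights, so they hold most of the network's parameters.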
ZFNET
• ZFNet came into the limelight with a significant
improvement over AlexNet.
• The ZFNet paper is a starting point for many concepts
such as deep feature visualization, feature invariance,
feature evolution, and feature importance.
• Our input is 224x224x3 images.
• Next, 96 convolutions of 7x7 with a stride of 2 are performed, followed
by ReLU activation, 3x3 max pooling with stride 2 and local contrast
normalization.
• Next come 256 filters of 5x5 each, whose output is again local contrast
normalized and pooled.
• The third and fourth layers are identical with 384 kernels of 3x3 each.
• The fifth layer has 256 filters of 3x3, followed by 3x3 max pooling with
stride 2 and local contrast normalization.
• The sixth and seventh layers house 4096 dense units each.
• Finally, we feed into a Dense layer of 1000 neurons i.e. the number of classes
in ImageNet.
• Local normalization tends to make the mean and variance of
an image uniform within a local neighborhood. This is especially useful
for correcting uneven illumination or shading artifacts.
• Local Contrast Normalization is a type of normalization that performs
local subtraction and division normalizations, enforcing a sort of local
competition between adjacent features in a feature map, and
between features at the same spatial location in different feature
maps.
VGG
• VGG was developed to increase the depth of such CNNs
in order to increase the model performance.
• VGG stands for Visual Geometry Group; it is a standard
deep Convolutional Neural Network (CNN) architecture
with multiple layers. The “deep” refers to the number of
layers, with VGG-16 and VGG-19 having 16 and 19
weight layers respectively.
VGG-16
• The VGG16 model was developed by the Visual Geometry
Group (VGG). It comprises 13 convolutional and 3
fully connected layers and carries over the ReLU
activation from AlexNet. More and more layers are
stacked on AlexNet to get the VGG model.
VGG 19
• The concept of the VGG19 model (also VGGNet-19) is
the same as that of VGG16, except that it has 19
weight layers.
• The “16” and “19” stand for the number of weight
layers in the model (convolutional plus fully connected layers).
• VGG19 has three more convolutional
layers than VGG16 (16 instead of 13).
• VGG19 is thus a variant of the VGG model with 19 weight layers (16
convolutional layers and 3 fully connected layers), together with 5 MaxPool
layers and 1 SoftMax layer.
• Advantages:
1. The model is efficient in performing transfer learning and small classification
tasks.
2. It is more robust
• Disadvantages:
1. Due to the large number of parameters (about 138 million for VGG16 and about
144 million for VGG19), the computation cost is high.
2. The plain stacked design does not scale well to much deeper networks: the
deeper it goes, the more prone it is to the vanishing gradient problem.
• Input: The VGGNet takes in an image input size of 224×224.
• Convolutional Layers: VGG’s convolutional layers leverage a
minimal receptive field, i.e., 3×3, the smallest possible size
that still captures up/down and left/right.
• Hidden Layers: All the hidden layers in the VGG network use
ReLU. VGG does not usually leverage Local Response
Normalization (LRN) as it increases memory consumption
and training time. Moreover, it makes no improvements to
overall accuracy.
• Fully-Connected Layers: The VGGNet has three fully
connected layers. Out of the three layers, the first two
have 4096 channels each, and the third has 1000
channels, 1 for each class.
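The layer and parameter counts of VGG can be reproduced from the layer widths. In the sketch below the channel lists are the standard 16-layer and 19-layer configurations from the VGG paper, which is an assumption since the slides do not list them. "M" marks a 2x2 max-pooling layer, all convolutions are 3x3, and the function name is illustrative.

```python
VGG16 = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
         512, 512, 512, "M", 512, 512, 512, "M"]
VGG19 = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
         512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]
FC_SIZES = [4096, 4096, 1000]


def summarize(config, in_channels=3, image_size=224):
    """Return (weight layers, parameters incl. biases) for a VGG config."""
    conv_layers, params, channels, size = 0, 0, in_channels, image_size
    for item in config:
        if item == "M":
            size //= 2  # 2x2 max pooling with stride 2 halves each side
        else:
            params += 3 * 3 * channels * item + item  # weights + biases
            channels = item
            conv_layers += 1
    features = size * size * channels  # flattened input to the first FC layer
    for units in FC_SIZES:
        params += features * units + units
        features = units
    return conv_layers + len(FC_SIZES), params


print(summarize(VGG16))  # (16, 138357544)
print(summarize(VGG19))  # (19, 143667240)
```

Most of the parameters sit in the first fully connected layer, which reads the 7x7x512 output of the last pooling layer.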
ResNet-50
• ResNet stands for Residual Network and is a specific type of
convolutional neural network (CNN) introduced in 2015.
• ResNet-50 is a 50-layer convolutional neural network (49
convolutional layers and one fully connected layer, plus one MaxPool
layer and one average pool layer, which have no weights).
• Residual neural networks are a type of artificial neural network (ANN)
that forms networks by stacking residual blocks.
• The ResNet architecture was developed in response to a
surprising observation in deep learning research: adding more
layers to a neural network was not always improving the results.
• This was unexpected because adding a layer to a network
should allow it to learn at least what the previous network
learned, plus additional information.
• To address this issue, the ResNet team, led by Kaiming He,
developed a novel architecture that incorporated skip
connections.
• These connections allowed the preservation of information from
earlier layers, which helped the network learn better
representations of the input data. With the ResNet architecture,
they were able to train networks with as many as 152 layers.
• ResNet-50 Architecture
• ResNet-50 consists of 50 layers that are divided into 5
stages: an initial convolution and max-pooling stage
followed by four stages of residual blocks. The
residual blocks allow for the preservation of information
from earlier layers, which helps the network to learn
better representations of the input data.
• 1. Convolutional Layers
• The first layer of the network is a convolutional layer that performs
convolution on the input image. This is followed by a max-pooling layer that
downsamples the output of the convolutional layer. The output of the max-
pooling layer is then passed through a series of residual blocks.

• 2. Residual Blocks
• In ResNet-50, each residual block is a bottleneck block of three convolutional
layers (1x1, 3x3, 1x1), each followed by a batch normalization layer; the first
two are also followed by a rectified linear unit (ReLU) activation function.
The output of the last convolutional layer is added to the input of the
residual block, and the sum is passed through another ReLU activation function.
The output of the residual block is then passed on to the next block. Shallower
ResNets such as ResNet-18 and ResNet-34 use blocks of two 3x3 convolutional
layers instead.
• 3. Fully Connected Layer
• The final layer of the network is a fully connected layer that takes the output of
the last residual block and maps it to the output classes. The number of
neurons in the fully connected layer is equal to the number of output classes.

Concept of Skip Connection
• Skip connections, also known as identity connections, are a key feature of
ResNet-50.
• They allow for the preservation of information from earlier layers, which helps
the network to learn better representations of the input data.
• Skip connections are implemented by adding the output of an earlier layer to
the output of a later layer.
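The addition step of a skip connection can be sketched on plain lists. Here `transform` is a hypothetical stand-in for the convolution and batch-normalization layers inside a block; the sketch shows only how the input is added back before the final ReLU.

```python
def relu(values):
    """Apply ReLU element-wise to a list of numbers."""
    return [max(0.0, v) for v in values]


def residual_block(x, transform):
    """Compute relu(transform(x) + x); `transform` stands in for the
    block's convolution and batch-normalization layers."""
    fx = transform(x)
    return relu([f + xi for f, xi in zip(fx, x)])


def zero_transform(x):
    """A transform that has learned nothing: it outputs all zeros."""
    return [0.0 for _ in x]


def half_transform(x):
    """A toy transform that scales its input by 0.5."""
    return [0.5 * v for v in x]


x = [1.0, -2.0, 3.0]
print(residual_block(x, zero_transform))  # [1.0, 0.0, 3.0]
print(residual_block(x, half_transform))  # [1.5, 0.0, 4.5]
```

Even when the transform contributes nothing, the input still passes through the block. This is the sense in which skip connections preserve information from earlier layers.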
• Advantages:
1. Converges faster than a plain network of the same depth
2. Minimized vanishing gradient problem
3. It can train deeper networks
• The ResNet architecture has several advantages over other networks. One of
the main advantages is its ability to train very deep networks with
hundreds of layers.
• This is made possible by the use of residual blocks and skip connections,
which allow for the preservation of information from earlier layers.
• Another advantage of ResNet-50 is its ability to achieve state-of-the-art
results in a wide range of image-related tasks such as object detection,
image classification, and image segmentation.

• Disadvantages:
1. Training a very deep ResNet takes a large amount of time and compute, which
can make it impractical for resource-constrained real-world applications.
CONVOLUTION OPERATION
• Given an input image, we compute each output value as the
weighted sum of a pixel and the pixels around it.
• The matrix of weights is referred to as the Kernel or Filter.
• Suppose we compute the value at one pixel using that pixel and 3
of its neighbors, with the weights w, x, y, z associated with these
4 pixels respectively.
• In this case, we have a kernel of size 2x2.
• We place the 2x2 filter over the first 2x2 portion of the image, covering
pixels a, b, e, f, and take the weighted sum; this gives the first output value.
• The output of this operation would be: (aw + bx + ey
+ fz)
• Then we move the filter horizontally by one and place it
over the next 2x2 portion of the input; in this case the
pixels of interest are b, c, f, g, and the same technique
gives: (bw + cx + fy + gz)
• We then move the kernel/filter by 1 again in the horizontal
direction and take the weighted sum, and so on to the end
of the row, which gives the first row of the output.
• Then we move the kernel down by 1 in the vertical direction and
repeat. In general: we start at the top-left portion of the image,
move the filter in the horizontal direction until the row is covered
completely, then move the filter in the vertical direction (by the
stride, relative to the top-left portion of the image), again slide it
horizontally through the entire row, and continue like this. In
essence, we move the kernel left to right, top to bottom.
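The left-to-right, top-to-bottom sliding described above can be sketched in plain Python. The function name and the sample numbers are illustrative assumptions; the comments map the first values onto the pixel names a, b, e, f and the kernel weights w, x, y, z used in the text.

```python
def slide_kernel(image, kernel, stride=1):
    """Slide `kernel` over `image` (2D lists), left to right and top to
    bottom, returning the weighted sum at each position."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    output = []
    for r in range(0, len(image) - k_rows + 1, stride):
        out_row = []
        for c in range(0, len(image[0]) - k_cols + 1, stride):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


image = [
    [1, 2, 3, 4],     # a b c d
    [5, 6, 7, 8],     # e f g h
    [9, 10, 11, 12],
]
kernel = [
    [1, 2],           # w x
    [3, 4],           # y z
]
feature_map = slide_kernel(image, kernel)
print(feature_map)  # [[44, 54, 64], [84, 94, 104]]
```

The first output value is aw + bx + ey + fz = 1*1 + 2*2 + 5*3 + 6*4 = 44, and the 3x4 input with a 2x2 kernel gives a 2x3 output.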
• EXPLAIN
• PADDING
• STRIDE
• Pooling
Motivation for using convolution networks
Convolution leverages three important ideas to improve ML systems:
1. Sparse interactions
2. Parameter sharing
3. Equivariant representations
Convolution also allows for working with inputs of variable size
SPARSE INTERACTION
• Traditional neural network layers use matrix multiplication by a matrix of
parameters describing the interaction between each input unit and each output unit.
• This means that every output unit interacts with every input unit.
• However, convolutional neural networks have sparse interactions.
• This is achieved by making the kernel smaller than the input, e.g., an image
can have thousands or millions of pixels, but a kernel that covers only tens
or hundreds of pixels can still detect meaningful features such as edges.
• This means that we need to store fewer parameters, which not only reduces
the memory requirement of the model but also improves its statistical
efficiency.
SHARED PARAMETERS
• If computing one feature at a spatial point (x1, y1) is useful then it
should also be useful at some other spatial point say (x2, y2).
• It means that for a single two-dimensional slice i.e., for creating
one activation map, neurons are constrained to use the same set
of weights.
• In a traditional neural network, each element of the weight matrix
is used once and then never revisited, while convolution network
has shared parameters i.e., for getting output, weights applied to
one input are the same as the weight applied elsewhere.
Equivariant representations
• Due to parameter sharing, the layers of a convolutional
neural network have the property of equivariance to
translation.
• This means that if we shift the input, the output
shifts in the same way.
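Equivariance to translation can be demonstrated on a 1D signal: shifting the input and then filtering gives the same result as filtering and then shifting. The helper names are illustrative. The signal is zero near its edges so that boundary effects do not interfere with the comparison.

```python
def correlate1d(signal, kernel):
    """Valid 1D sliding-window weighted sum."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def shift_right(signal, steps=1):
    """Shift a signal right, filling the vacated positions with zeros."""
    return [0] * steps + signal[:len(signal) - steps]


signal = [0, 0, 1, 2, 1, 0, 0, 0]
kernel = [1, -1]

shift_then_filter = correlate1d(shift_right(signal), kernel)
filter_then_shift = shift_right(correlate1d(signal, kernel))
print(shift_then_filter)  # [0, 0, -1, -1, 1, 1, 0]
print(filter_then_shift)  # [0, 0, -1, -1, 1, 1, 0]
```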
• Q. What happens if the stride of a convolution layer increases? What can be the
maximum stride? Justify.
• Stride is a parameter of the filter in a convolutional neural network that
controls how far the filter moves over the image or video at each step.
For example, if the stride is set to 1, the filter will move one pixel,
or unit, at a time. Because the filter moves in whole pixels, the stride is
a positive integer rather than a fraction or decimal.
• Naturally, as the stride, or movement, is increased, the resulting output will be
smaller.
• The choice of stride is important because it affects the tensor shape after the convolution, and hence the
whole network. The general rule is to use stride=1 in usual convolutions and preserve the spatial size
with padding, and use stride=2 when you want to downsample the image.
• Tensor
• Tensors represent deep learning data. They are multidimensional
arrays used to store data with several dimensions. Each
dimension is called an axis. For example, a cube storing data along
X, Y, and Z axes is represented as a 3-dimensional tensor.
• Tensors are multi-dimensional arrays with a uniform
data type, used to represent different features of the data.
Kernel Flipping
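• Mathematical convolution flips the kernel (rotates it by 180 degrees)
before sliding it over the input; without the flip, the operation is
called cross-correlation.
• In a CNN the filter weights are learned, so flipping makes no practical
difference: the network simply learns the flipped weights. Many deep
learning libraries therefore implement cross-correlation and still call it
convolution. The ability to detect the same feature regardless of its
position comes from sliding the same filter over the whole image, not
from the flip.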
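In the mathematical definition, flipping the kernel is what distinguishes convolution from cross-correlation. The sketch below, with illustrative function names and sample values, shows that convolving with a kernel equals cross-correlating with the same kernel rotated by 180 degrees.

```python
def flip(kernel):
    """Rotate a 2D kernel by 180 degrees (reverse rows and columns)."""
    return [row[::-1] for row in kernel[::-1]]


def cross_correlate(image, kernel):
    """Slide the kernel over the image without flipping it."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(k_rows) for j in range(k_cols))
             for c in range(len(image[0]) - k_cols + 1)]
            for r in range(len(image) - k_rows + 1)]


def convolve(image, kernel):
    """True convolution: cross-correlation with the flipped kernel."""
    return cross_correlate(image, flip(kernel))


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 2], [3, 4]]
print(cross_correlate(image, kernel))  # [[37, 47], [67, 77]]
print(convolve(image, kernel))         # [[23, 33], [53, 63]]
```

Because the kernel weights in a CNN are learned, either operation can represent the same set of filters.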
• Downsampling
• A downsampling layer helps to reduce the dimensionality of the
features. This helps save computation. Average pooling, max
pooling, and global average pooling are some examples of downsampling
layers. An alternative is to use strides in the convolution layer
to downsample.
POOLING
• The pooling operation involves sliding a two-dimensional
filter over each channel of feature map and summarising
the features lying within the region covered by the filter.
• A common CNN model architecture is to have a number of convolution
and pooling layers stacked one after the other.
• Pooling layers are used to reduce the dimensions of the feature maps.
Thus, it reduces the number of parameters to learn and the amount of
computation performed in the network.
• The pooling layer summarises the features present in a region of the
feature map generated by a convolution layer.
• Types of Pooling Layers:
• Max Pooling
• Max pooling is a pooling operation that selects the maximum element
from the region of the feature map covered by the filter. Thus, the
output after max-pooling layer would be a feature map containing the
most prominent features of the previous feature map.
• Average Pooling
• Average pooling computes the average of the elements present in the
region of feature map covered by the filter. Thus, while max pooling
gives the most prominent feature in a particular patch of the feature
map, average pooling gives the average of features present in a patch.
• Global Pooling
• Global pooling reduces each channel in the feature map to a single
value. Thus, an nh x nw x nc feature map is reduced to 1 x 1 x nc
feature map. This is equivalent to using a filter of dimensions nh x nw
i.e. the dimensions of the feature map.
• Further, it can be either global max pooling or global average pooling.
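Global pooling can be sketched as follows. The channel-first layout `[channel][row][col]` and the function name are assumptions made for this example.

```python
def global_pool(feature_map, mode="avg"):
    """Reduce a feature map stored as [channel][row][col] to one value
    per channel."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(max(values) if mode == "max"
                      else sum(values) / len(values))
    return pooled


feature_map = [
    [[1, 2], [3, 4]],  # channel 0
    [[0, 8], [4, 0]],  # channel 1
]
print(global_pool(feature_map, mode="avg"))  # [2.5, 3.0]
print(global_pool(feature_map, mode="max"))  # [4, 8]
```

A 2 x 2 x 2 feature map is reduced to one value per channel, i.e. 1 x 1 x 2.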
• In convolutional neural networks (CNNs), the pooling layer is a common type of layer that is
typically added after convolutional layers. The pooling layer is used to reduce the spatial
dimensions (i.e., the width and height) of the feature maps, while preserving the depth (i.e.,
the number of channels).

• The pooling layer works by dividing the input feature map into a set of non-overlapping
regions, called pooling regions. Each pooling region is then transformed into a single output
value, which represents the presence of a particular feature in that region. The most common
types of pooling operations are max pooling and average pooling.
• In max pooling, the output value for each pooling region is simply the maximum value of the
input values within that region. This has the effect of preserving the most salient features in
each pooling region, while discarding less relevant information. Max pooling is often used in
CNNs for object recognition tasks, as it helps to identify the most distinctive features of an
object, such as its edges and corners.
• In average pooling, the output value for each pooling region is the average of the input values
within that region. This has the effect of preserving more information than max pooling, but
may also dilute the most salient features. Average pooling is often used in CNNs for tasks such
as image segmentation and object detection, where a more fine-grained representation of the
input is required.
• Advantages of Pooling Layer:

• Dimensionality reduction: The main advantage of pooling layers is that they help in
reducing the spatial dimensions of the feature maps. This reduces the computational
cost and also helps in avoiding overfitting by reducing the number of parameters in the
model.
• Translation invariance: Pooling layers are also useful in achieving translation invariance
in the feature maps. This means that the position of an object in the image does not
affect the classification result, as the same features are detected regardless of the
position of the object.
• Feature selection: Pooling layers can also help in selecting the most important features
from the input, as max pooling selects the most salient features and average pooling
preserves more information.
STRUCTURED OUTPUT
• Convolutional networks can be trained to output high-dimensional
structured output rather than just a classification score.
• To produce an output map of the same size as the input map, only same-
padded convolutions can be stacked.
• The output of the first labelling stage can be refined successively by
another convolutional model.
• If the models use tied parameters, this gives rise to a type of recursive
model.
DATATYPES
• The data used with a convolutional network usually consist of several
channels, each channel being the observation of a different quantity at
some point in space or time.
• The data can be single channel or multichannel, with different
dimensionalities (1-D, 2-D, or 3-D).
• When the output is variable sized, no extra design change needs to be
made.
• When the output requires a fixed size (classification), a pooling stage with
kernel size proportional to the input size needs to be used.
2D convolution is a fundamental operation in image processing and computer vision. It
involves convolving a 2D input image matrix with a 2D kernel (also known as a filter or mask) to produce a 2D
output image. The convolution operation is used for various tasks such as edge detection, blurring, sharpening,
and feature extraction in images.
Applications of Convolutional Neural Networks
Convolutional Neural Networks (CNNs) excel in image and
video analysis tasks due to their hierarchical feature
extraction. They find applications in image recognition,
object detection, facial recognition, medical image
analysis, self-driving cars, and more. CNNs leverage their
convolutional and pooling layers to automatically learn
relevant features, making them pivotal in visual data
processing tasks.
Image Classification – Search Engines, Social Media, Recommender Systems
The major use of convolutional neural networks is image recognition and classification.
CNN image classification serves the following purposes:

Deconstruct an image and find its distinguishing features. The system employs a supervised machine learning classification
algorithm for this purpose.
Reduce the description of the image to its essential characteristics. This is done with the help of an unsupervised machine learning algorithm.
This method is used in the following fields:
Image tagging

The most basic type of image classification algorithm is image tagging. The image tag is a term or a phrase that describes the
images and makes them easier to find. This method is used by big companies like Facebook, Google, and Amazon. It is also one of
the fundamental elements of visual search. Tagging involves recognition of objects and even sentiment analysis of the image
tone.

Visual Search

This method involves comparing an input image to the images in an available database. The visual search evaluates the image and
searches for other photos that have similar characteristics.
Medical Image Computing – Predictive Analytics, Healthcare Data Science

The most fascinating image recognition CNN use case is medical image computing.

The medical image includes a whole lot of further data analysis that arises from initial
image recognition.

CNN medical image classification detects anomalies in X-ray and MRI images, in some
cases more accurately than the human eye.

These systems can display the series of photos as well as the differences between them.
This feature lays the groundwork for future predictive analytics.

Medical image classification is based on massive datasets such as public health records.

These datasets, together with patients’ confidential data and test results, serve as the
training basis for the algorithms.

They work together to create an analytical platform that monitors the current status of the
patient and forecasts results.
Health Risk Assessment Using Predictive Analytics

In healthcare, saving lives is a top priority. And it is always advantageous to have
the ability to predict the future. Because when it comes to patient care, you must
be prepared for anything. The health risk assessment is an excellent
demonstration.

Convolutional Neural Network Predictive Analytics is used in this field.

Here’s how CNN Health Risk Assessment works:

CNN processes data with a grid topology, i.e. a set of spatial
correlations between data points.
The grid is two-dimensional in the case of images.
The grid is one-dimensional in the case of time series or text data.
The convolution algorithm is then used to identify some aspects of the input.
It takes into account the variations of the input.
It determines variable interactions.
Drug Discovery Using Predictive Analytics

The problem is that drug discovery and development is a time-consuming and costly process. In drug
discovery, scalability and cost-effectiveness are critical.

The process of developing new drugs lends itself well to the implementation of neural networks.
During the development of a new drug, there is a large amount of data to consider.

The following stages are involved in the drug discovery process:

-- The first stage is a clustering and classification problem involving the analysis of observed medical effects.
-- Machine learning anomaly detection may be useful in hit discovery.
-- The algorithm searches the compound database for new activities that can be used for specific
purposes.
-- Then, using the Hit to Lead process, the results are narrowed down to the most relevant; this is
where dimensionality reduction and regression come in.
-- Then there is Lead Optimization, which is the process of combining and testing lead compounds to
find the best approaches to them.
-- These stages involve the examination of the chemical and physical effects on the organism.
