
What is VGG16

A convolutional neural network, also known as a ConvNet, is a kind of artificial neural network. A convolutional neural network has an input layer, an output layer, and various hidden layers. VGG16 is a type of CNN (Convolutional Neural Network) that is considered to be one of the best computer vision models to date. The creators of this model evaluated networks of increasing depth using an architecture with very small (3 × 3) convolution filters, which showed a significant improvement over the prior-art configurations. They pushed the depth to 16–19 weight layers, giving the network approximately 138 million trainable parameters.
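The 138-million figure is easy to verify with a quick tally of weights and biases in the 16-layer configuration; the sketch below (pure Python, with channel sizes taken from the architecture described later in this document) adds them up:

```python
# Rough parameter count for VGG16: 13 conv layers + 3 FC layers.
# Each 3x3 conv layer has 3*3*in_ch*out_ch weights plus out_ch biases.

conv_channels = [
    (3, 64), (64, 64),                    # block 1
    (64, 128), (128, 128),                # block 2
    (128, 256), (256, 256), (256, 256),   # block 3
    (256, 512), (512, 512), (512, 512),   # block 4
    (512, 512), (512, 512), (512, 512),   # block 5
]
conv_params = sum(3 * 3 * cin * cout + cout for cin, cout in conv_channels)

# After five poolings, the 224x224 input is 7x7x512 = 25088 features.
fc_sizes = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]
fc_params = sum(nin * nout + nout for nin, nout in fc_sizes)

total = conv_params + fc_params
print(total)  # 138,357,544 -> approximately 138 million
```

Notice that the three fully connected layers alone account for roughly 124 million of the 138 million parameters.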

What is VGG16 used for

VGG16 is an object detection and classification algorithm that can classify images into 1000 different categories with 92.7% top-5 test accuracy on ImageNet. It is one of the popular algorithms for image classification and is easy to use with transfer learning.

VGG16 Architecture
 The 16 in VGG16 refers to the 16 layers that have weights. VGG16 contains thirteen convolutional layers, five max-pooling layers, and three dense layers, which sum up to 21 layers, but only the sixteen layers with learnable parameters count as weight layers.

 VGG16 takes an input tensor of size 224 × 224 with 3 RGB channels.

 The most unique thing about VGG16 is that, instead of having a large number of hyper-parameters, its designers focused on convolution layers with 3 × 3 filters and stride 1, always using the same padding, and max-pool layers with 2 × 2 filters and stride 2.

 The convolution and max-pool layers are arranged consistently throughout the whole architecture.

 The Conv-1 block has 64 filters, Conv-2 has 128 filters, Conv-3 has 256 filters, and Conv-4 and Conv-5 have 512 filters each.

 Three fully-connected (FC) layers follow the stack of convolutional layers: the first two have 4096 channels each, and the third performs 1000-way ILSVRC classification and thus contains 1000 channels (one for each class). The final layer is the soft-max layer.

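Since all convolutions use same padding and only the five max-pool layers change the resolution, the shape of the feature maps can be traced by hand; a minimal sketch:

```python
# Trace the feature-map shape through VGG16's five blocks.
# 'Same'-padded 3x3 convs keep H and W; each 2x2/stride-2 max-pool halves them.

blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]  # (num convs, out channels)
h = w = 224   # input resolution
channels = 3  # RGB input

for num_convs, out_channels in blocks:
    channels = out_channels      # convs change the channel count, not the spatial size
    h, w = h // 2, w // 2        # max-pool at the end of each block
    print(f"after block: {h}x{w}x{channels}")

flattened = h * w * channels
print(flattened)  # 7 * 7 * 512 = 25088 features fed into the first FC layer
```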
What are the Challenges of VGG16:

 It is very slow to train (the original VGG model was trained on the Nvidia Titan GPU for
2–3 weeks).

 The VGG16 weights trained on ImageNet take up 528 MB of storage, which costs quite a lot of disk space and bandwidth and makes the model inefficient to deploy.

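The 528 MB figure follows directly from the parameter count: storing roughly 138 million 32-bit floats gives

```python
# Each parameter of VGG16 is stored as a 32-bit (4-byte) float.
params = 138_357_544            # total trainable parameters of VGG16
size_bytes = params * 4
size_mb = size_bytes / (1024 ** 2)
print(round(size_mb, 1))  # ~527.8 MB, matching the ~528 MB weight file
```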
VGG Blocks
The basic building block of CNNs is a sequence of the following: (i) a convolutional layer with padding to maintain the resolution, (ii) a nonlinearity such as a ReLU, (iii) a pooling layer such as max-pooling to reduce the resolution. One of the problems with this approach is that the spatial resolution decreases quite rapidly. In particular, since each pooling stage halves the resolution, this imposes a hard limit of log₂(d) convolutional layers on the network before all spatial dimensions (of size d) are used up. For instance, in the case of ImageNet, it would be impossible to have more than 8 convolutional layers in this way.
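For ImageNet-sized inputs this limit works out as follows (assuming d = 224 and one resolution-halving pooling stage per convolutional layer):

```python
import math

# Each pooling stage halves the spatial resolution, so at most
# ceil(log2(d)) halvings are possible before the dimension is used up.
d = 224  # ImageNet input resolution
limit = math.ceil(math.log2(d))
print(limit)  # 8: no more than 8 resolution-halving stages for ImageNet
```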

VGG Network
VGGNet consists of 16 weight layers (13 convolutional and 3 fully connected) and is very appealing because of its very uniform architecture. It is similar to AlexNet but uses only 3 × 3 convolutions, with lots of filters. It was trained on 4 GPUs for 2–3 weeks. It is currently one of the most preferred choices in the community for extracting features from images. The weight configuration of the VGGNet is publicly available and has been used in many other applications and challenges as a baseline feature extractor. However, VGGNet contains 138 million parameters, which can be a bit challenging to handle.

Like AlexNet and LeNet, the VGG Network can be partitioned into two parts: the first consisting mostly of convolutional and pooling layers and the second consisting of fully connected layers that are identical to those in AlexNet. The key difference is that the convolutional layers are grouped into blocks of nonlinear transformations that leave the dimensionality unchanged.
Figure: from AlexNet to VGG. The key difference is that VGG consists of blocks of layers, whereas AlexNet's layers are all designed individually.

The convolutional part of the network connects several VGG blocks from the figure above (also defined in the vgg_block function) in succession. This grouping of convolutions is a pattern that has remained almost unchanged over the past decade, although the specific choice of operations has undergone considerable modifications. The variable arch consists of a list of tuples (one per block), where each contains two values: the number of convolutional layers and the number of output channels, which are precisely the arguments required to call the vgg_block function.

The original VGG network had five convolutional blocks, among which the first two have one
convolutional layer each and the latter three contain two convolutional layers each. The first
block has 64 output channels and each subsequent block doubles the number of output
channels, until that number reaches 512. Since this network uses eight convolutional layers and
three fully connected layers, it is often called VGG-11. We halve height and width at each block,
finally reaching a height and width of 7 before flattening the representations for processing by
the fully connected part of the network.
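The arch bookkeeping described above can be sketched in a few lines; vgg_block itself is omitted, and only the layer counts it implies are tallied (the tuple values below follow the VGG-11 description in this section):

```python
# VGG-11: one tuple per block, (number of conv layers, output channels).
arch = [(1, 64), (1, 128), (2, 256), (2, 512), (2, 512)]

conv_layers = sum(num_convs for num_convs, _ in arch)
fc_layers = 3                      # two 4096-wide FC layers plus the classifier
weight_layers = conv_layers + fc_layers
print(conv_layers, weight_layers)  # 8 conv layers + 3 FC layers = VGG-11

# Each block halves height and width: 224 -> 112 -> 56 -> 28 -> 14 -> 7.
resolution = 224 // (2 ** len(arch))
print(resolution)  # 7
```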
