Convolutional Neural Networks
Na Lu
Xi’an Jiaotong University
Intuition of CNN
• In the previous section, we dealt with images that were relatively low in resolution, such
as small image patches and small images of hand-written digits.
• In the sparse autoencoder, full connection of all the hidden units to all the input
units was employed.
– On the relatively small images that we were working with (e.g., 8x8
patches, 28x28 images for the MNIST dataset), it was computationally
feasible to learn features on the entire image.
– However, with larger images (e.g., 96x96 images), learning features that
span the entire image (fully connected networks) is very computationally
expensive: about 10^4 input units, and assuming you want to learn 100
features, you would have on the order of 10^6 parameters to learn. The
feedforward and backpropagation computations would also be about
10^2 times slower, compared to 28x28 images. (A rough count is sketched below.)
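As a rough check of these orders of magnitude (my arithmetic, not part of the original slides):

```latex
\underbrace{96 \times 96}_{\text{input units}} = 9216 \approx 10^{4},
\qquad
\underbrace{9216 \times 100}_{\text{weights for 100 learned features}} \approx 10^{6}
```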
Intuition of CNN
• In this section, we will develop methods which
will allow us to scale up these methods to more
realistic datasets that have larger images.
Convolutional Neural Networks
• Key ingredients
– Locally connected networks
– Convolutions
– Pooling
– Local receptive field
– Weight sharing
Locally Connected Networks
• One simple solution to this problem is to restrict the connections between
the hidden units and the input units, allowing each hidden unit to connect to
only a small subset of the input units.
• Specifically, each hidden unit will connect to only a small contiguous region
of pixels in the input.
• For input modalities other than images, there is often also a natural way
to select "contiguous groups" of input units to connect to a single hidden
unit; for example, for audio, a hidden unit might be connected to only
the input units corresponding to a certain time span of the input audio clip.
Locally Connected Networks
• The idea of having locally connected networks also
draws inspiration from how the early visual system is
wired up in biology. Specifically, neurons in the visual
cortex have localized receptive fields (i.e., they respond
only to stimuli in a certain location).
Convolutions
• Natural images have the property of being stationary,
meaning that the statistics of one part of the image are
the same as any other part.
• This suggests that the features that we learn at one part
of the image can also be applied to other parts of the
image, and we can use the same features at all locations.
Convolutions
• Example:
– Suppose we have learned features over small (say 8x8) patches
sampled randomly from the larger image. We can then apply this
learned 8x8 feature detector anywhere in the image.
– Specifically, we can take the learned 8x8 features and
convolve them with the larger image, thus obtaining a different
feature activation value at each location in the image.
Convolutions
• Suppose you have learned features on 8x8 patches sampled from a
96x96 image.
• Suppose further this was done with an autoencoder that has 100
hidden units.
• To get the convolved features, for every 8x8 region of the 96x96
image, run it through the trained sparse autoencoder to get the
feature activations. This results in 100 sets of 89x89 convolved
features.
Convolutions
• Formal illustration
– Given some large r×c images x_large.
– We first train a sparse autoencoder on
small m×n patches x_small sampled from
these images.
– This learns k features f = σ(W^(1) x_small + b^(1))
(where σ is the sigmoid function), given
by the weights W^(1) and biases b^(1) from
the visible units to the hidden units.
– For every m×n patch x_s in the large
image, we compute f_s = σ(W^(1) x_s + b^(1)),
giving us f_convolved, a k×(r−m+1)×(c−n+1)
array of convolved features (see the sketch below).
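A minimal sketch of this step in NumPy, under the assumption that the learned weights are stored as a k×(m·n) matrix W1 with bias vector b1 (names are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convolve_features(x_large, W1, b1, m, n):
    """Apply k learned features to every m x n patch of an r x c image.
    W1: (k, m*n) visible-to-hidden weights, b1: (k,) hidden biases.
    Returns an array of shape (k, r - m + 1, c - n + 1)."""
    r, c = x_large.shape
    k = W1.shape[0]
    f = np.zeros((k, r - m + 1, c - n + 1))
    for i in range(r - m + 1):
        for j in range(c - n + 1):
            patch = x_large[i:i + m, j:j + n].reshape(-1)  # flatten the m x n patch
            f[:, i, j] = sigmoid(W1 @ patch + b1)          # k feature activations
    return f

# Example: 100 features learned on 8x8 patches, applied to a 96x96 image
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(100, 64)), np.zeros(100)
feats = convolve_features(rng.random((96, 96)), W1, b1, 8, 8)
print(feats.shape)  # (100, 89, 89)
```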
Pooling
• Problem
– In theory, one could use all the extracted features with a
classifier such as a softmax classifier, but this can be
computationally challenging.
– Consider images of size 96x96 pixels, and suppose we have
learned 400 features over 8x8 inputs.
– Each convolution results in an output of size (96−8+1)×(96−8+1)
= 7,921, and since we have 400 features, this results in a vector
of 89×89×400 = 3,168,400 features per example.
– Learning a classifier with inputs having 3+ million features can
be unwieldy, and can also be prone to over-fitting.
Pooling
• Recall that we decided to obtain convolved features because
images have the "stationarity" property, which implies that features
that are useful in one region are also likely to be useful for other
regions.
• To describe a large image, one natural approach is to aggregate
statistics of these features at various locations. For example, one
could compute the mean (or max) value of a particular feature over
a region of the image.
• The aggregation operation is called pooling, or sometimes mean
pooling or max pooling (depending on the pooling operation
applied).
Pooling
• Formal description
– After obtaining the convolved features, we decide the
size of the region to pool the convolved features over.
– Then, divide the convolved features into disjoint
regions, and take the mean (or maximum) feature
activation over these regions to obtain the pooled
convolved features.
– These pooled features can then be used for
classification (a minimal pooling sketch follows below).
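A minimal mean/max-pooling sketch over disjoint regions, continuing the hypothetical arrays from the convolution sketch above:

```python
import numpy as np

def pool_features(f_convolved, pool_size, op=np.mean):
    """Pool a (k, H, W) array of convolved features over disjoint
    pool_size x pool_size regions with `op` (np.mean or np.max)."""
    k, H, W = f_convolved.shape
    ph, pw = H // pool_size, W // pool_size
    pooled = np.zeros((k, ph, pw))
    for i in range(ph):
        for j in range(pw):
            region = f_convolved[:,
                                 i * pool_size:(i + 1) * pool_size,
                                 j * pool_size:(j + 1) * pool_size]
            pooled[:, i, j] = op(region.reshape(k, -1), axis=1)
    return pooled

# e.g. pool_features(feats, 11, op=np.max) turns (100, 89, 89) into (100, 8, 8)
```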
Pooling
• Pooling for Invariance
– If one chooses the pooling regions to be contiguous areas in the
image and only pools features generated from the same
(replicated) hidden units, then these pooling units will be translation
invariant.
– This means that the same (pooled) feature will be active even
when the image undergoes (small) translations.
Weight Sharing
• A constraint on the network weights, closely tied to convolution.
• For different local regions, use the same weights to detect
the same feature.
The replicated feature approach
(currently the dominant approach for neural networks)
• Use many different copies of the same
feature detector with different positions.
(In the figure, the red connections all have the same weight.)
– Could also replicate across scale and
orientation (tricky and expensive)
– Replication greatly reduces the number of free
parameters to be learned.
• Use several different feature types, each
with its own map of replicated detectors.
– Allows each patch of image to be represented
in several ways.
Backpropagation with weight
constraints
• It’s easy to modify the
backpropagation algorithm to
incorporate linear constraints
between the weights.
• We compute the gradients as usual,
and then modify the gradients so
that they satisfy the constraints.
– So if the weights started off
satisfying the constraints, they
will continue to satisfy them (see the worked example below).
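As a worked statement of this recipe for two tied weights (ε denotes the learning rate; the notation is mine, the recipe is the one described above):

```latex
% To keep w_1 = w_2, we need \Delta w_1 = \Delta w_2.
% Compute both gradients as usual, then apply their sum to both weights:
\Delta w_1 = \Delta w_2
  = -\varepsilon \left( \frac{\partial E}{\partial w_1}
                      + \frac{\partial E}{\partial w_2} \right)
```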
What does replicating the feature detectors
achieve?
• Equivariant activities: Replicated features do not make the neural activities
invariant to translation. The activities are equivariant.
[Figure: a translated image produces a correspondingly translated
representation by the active neurons.]
• Invariant knowledge: If a feature is useful in some locations during training,
detectors for that feature will be available in all locations during testing.
Pooling the outputs of replicated
feature detectors
• Get a small amount of translational invariance at each level by averaging
four neighboring replicated detectors to give a single output to the next
level.
– This reduces the number of inputs to the next layer of feature
extraction, thus allowing us to have many more different feature maps.
– Taking the maximum of the four works slightly better.
• Problem: After several levels of pooling, we have lost information about
the precise positions of things.
– This makes it impossible to use the precise spatial relationships
between high-level parts for recognition.
Question
Convolutional Neural Networks
• Compared to standard feedforward neural
networks with similarly sized layers:
– CNNs have far fewer connections and parameters
– So they are easier to train
– While their theoretically best performance is likely to
be only slightly worse.
Convolutional Neural Networks
• One successful application of CNNs:
LeNet
LeNet
• Yann LeCun and his collaborators developed a really good
recognizer for handwritten digits by using backpropagation in a
feedforward net with:
– Many hidden layers
– Many maps of replicated units in each layer.
– Pooling of the outputs of nearby replicated units.
– A wide net that can cope with several characters at once even if
they overlap.
– A clever way of training a complete system, not just a recognizer.
• This net was used for reading ~10% of the checks in North America.
• See the impressive demos of LeNet at [Link]
The architecture of LeNet5
[Figure: LeNet-5 architecture, with layers C1, S2, C3, S4, F5, F6, F7]
LeNet
• LeNet-5 has seven layers.
• Input: 32×32 pixel image. The largest character is 20×20. (Note that all
important information should be in the center of the receptive field of the
highest level feature detectors.)
• C1 and C3 are convolutional layers
• S2 and S4 are subsampling layers
• F5, F6 and F7 (the output layer) are fully connected layers
LeNet
• C1: Convolutional layer with 6 feature maps of size 28×28: C1_k (k = 1…6)
• Each unit of C1 has a 5×5 receptive field in the input layer
• Properties: local connection, convolution, and shared weights
• C1 layer has (5×5+1)×6 = 156 parameters to learn
• C1 layer has 28×28×(5×5+1)×6 = 122,304 connections
• If it were fully connected, there would be (32×32+1)×(28×28)×6 = 4,821,600 parameters
LeNet
• S2: Subsampling layer with 6 feature maps of size 14×14
• Each unit in S2 has a 2×2 non-overlapping receptive field in C1 as input
• Operation: add the 2×2 inputs, then apply a weight and a bias, followed
by a sigmoid function
• Layer S2 has 6×2 = 12 trainable parameters
• Number of connections: 14×14×(2×2+1)×6 = 5,880
LeNet
• C3: Convolutional layer with 16 feature maps of size 10×10
• Each unit in each feature map is connected to several 5×5 neighborhoods
at identical locations in a subset of S2’s feature maps
• Layer C3 has 1516 trainable parameters
• 151,600 connections
LeNet
• S4: Subsampling layer with 16 feature maps of size 5×5
• Each unit in S4 is connected to the corresponding 2×2 receptive field in C3
• Layer S4 has 16×2=32 trainable parameters
• 5×5×(2×2+1) ×16=2000 connections
LeNet
• F5: Convolutional layer with 120 feature maps of size 1×1
• Each unit in F5 is connected to all 16 5×5 receptive fields in S4
• Layer F5 has 120×(16×25+1)=48120 trainable parameters and
connections
• Note that Layer F5 is fully connected
LeNet
• F6: Fully connected layer with 84 units
• Each unit in F6 is fully connected to all units in F5
• Layer F6 has 84×(120+1) = 10,164 trainable parameters and connections
• Output layer: 10 RBF (radial basis function) units, one for each digit
• Weight update: backpropagation (the quoted parameter counts are re-derived in the sketch below)
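A quick sketch that re-derives the trainable-parameter counts quoted on the preceding slides. The C3 figure assumes the partial S2→C3 connection table from LeCun et al. (1998), where six maps see 3 of the S2 maps, nine see 4, and one sees all 6:

```python
# Trainable-parameter counts per layer, matching the numbers quoted above
c1 = (5 * 5 + 1) * 6                                                   # 156
s2 = 6 * 2                                                             # 12 (one coefficient + one bias per map)
c3 = 6 * (3 * 5 * 5 + 1) + 9 * (4 * 5 * 5 + 1) + 1 * (6 * 5 * 5 + 1)   # 1516
s4 = 16 * 2                                                            # 32
f5 = 120 * (16 * 5 * 5 + 1)                                            # 48120
f6 = 84 * (120 + 1)                                                    # 10164
print(c1, s2, c3, s4, f5, f6)
```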
Performance
• 60,000 original training examples:
Test error: 0.95%
• 540,000 artificially distorted examples
+ 60,000 original training examples:
Test error: 0.8%
The 82 errors
made by LeNet5
Notice that most of the
errors are cases that
people find quite easy.
The human error rate is
probably 20 to 30 errors
but nobody has had the
patience to measure it.
Priors and Prejudice
• We can put our prior knowledge about the task into
the network by designing appropriate:
– Connectivity.
– Weight constraints.
– Neuron activation functions.
• This is less intrusive than hand-designing the features.
– But it still prejudices the network towards the particular
way of solving the problem that we had in mind.
• Alternatively, we can use our prior knowledge to create a
whole lot more training data.
– This may require a lot of work (Hofman & Tresp, 1993).
– It may make learning take much longer.
• This allows optimization to discover clever ways of using
the multi-layer network that we did not think of.
– And we may never fully understand how it does it.
The brute force approach
• LeNet uses knowledge about the invariances to design:
– the local connectivity
– the weight-sharing
– the pooling.
• This achieves about 80 errors.
– This can be reduced to about 40 errors by using many
different transformations of the input and other tricks
(Ranzato 2008).
• Ciresan et al. (2010) inject knowledge of invariances by
creating a huge amount of carefully designed extra
training data:
– For each training image, they produce many new training
examples by applying many different transformations.
– They can then train a large, deep, dumb net on a GPU
without much overfitting.
• They achieve about 35 errors.
The errors made by the Ciresan et al.
net
The top printed digit is the
right answer. The bottom two
printed digits are the
network’s best two guesses.
The right answer is almost
always in the top 2 guesses.
With model averaging they
can now get about 25 errors.
How to detect a significant drop in the
error rate
• Is 30 errors in 10,000 test cases significantly better than 40 errors?
– It all depends on the particular errors!
– The McNemar test uses the particular errors and can be much more
powerful than a test that just uses the number of errors.
Example A (significant difference):
                  model 1 wrong   model 1 right
  model 2 wrong        29               1
  model 2 right        11             9959

Example B (not convincing, same totals):
                  model 1 wrong   model 1 right
  model 2 wrong        15              15
  model 2 right        25             9945
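A sketch of the exact (binomial) McNemar test applied to the discordant cells of tables like those above; SciPy is assumed, and the function name is mine:

```python
from scipy.stats import binom

def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar test on the discordant pairs.
    b: cases model 1 got right but model 2 got wrong
    c: cases model 1 got wrong but model 2 got right"""
    n, k = b + c, min(b, c)
    return min(1.0, 2 * binom.cdf(k, n, 0.5))

print(mcnemar_exact_p(1, 11))   # Example A: p ~ 0.006, clearly significant
print(mcnemar_exact_p(15, 25))  # Example B: p ~ 0.15, not convincing
```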
From hand-written digits to 3-D objects
• Recognizing real objects in color photographs
downloaded from the web is much more complicated than
recognizing hand-written digits:
– A hundred times as many classes (1000 vs 10)
– A hundred times as many pixels (256 x 256 color vs 28
x 28 gray)
– Two-dimensional images of three-dimensional scenes.
– Cluttered scenes requiring segmentation.
– Multiple objects in each image.
• Will the same type of convolutional neural network work?
ImageNet
• 15M images
• 22K categories
• Images collected from Web
• Human labelers (Amazon’s Mechanical Turk crowd
sourcing)
• RGB images
• Variable resolution
ImageNet
• Classification goals:
– Make 1 guess about the label (Top-1 error)
– Make 5 guesses about the label (Top-5 error)
The ILSVRC-2012 competition on
ImageNet
• The dataset has 1.2 million high-resolution training images.
• The classification task:
– Get the “correct” class in your top 5 bets. There are 1000
classes.
• The localization task:
– For each bet, put a box around the object. Your box
must have at least 50% overlap with the correct box.
• Some of the best existing computer vision methods were
tried on this dataset by leading computer vision groups from
Oxford, INRIA, XRCE, …
– These computer vision systems use complicated multi-stage
pipelines.
– The early stages are typically hand-tuned by
optimizing a few parameters.
ImageNet Large Scale Visual Recognition Challenge
Examples from the test set (with the network’s guesses)
[Figure: sample test images with the network’s top guesses]
Error rates on the ILSVRC-2012 competition
                                                   classification   classification & localization
• University of Toronto (Alex Krizhevsky)               16.4%             34.1%
• University of Tokyo                                   26.1%             53.6%
• Oxford University Computer Vision Group               26.9%             50.0%
• INRIA (French national research institute in CS)
  + XRCE (Xerox Research Center Europe)                 27.0%
• University of Amsterdam                               29.5%
A neural network for ImageNet
• Alex Krizhevsky (NIPS 2012) developed a very deep
convolutional neural net of the type pioneered by Yann
LeCun. Its architecture was:
– 7 hidden layers not counting some max pooling layers.
– The early layers were convolutional.
– The last two layers were globally connected.
• The activation functions were:
– Rectified linear units in every hidden layer. These train much
faster and are more expressive than logistic units.
– Competitive normalization to suppress hidden activities when
nearby units have stronger activities. This helps with
variations in intensity.
The Architecture
• Typical nonlinearities: f(x) = tanh(x) or f(x) = (1 + e^(-x))^(-1)
• However, Rectified Linear Units (ReLU) are used: f(x) = max(0, x)
• Empirical observation: Deep convolutional neural networks with
ReLUs train several times faster than their equivalents with tanh
units.
• [Figure: a four-layer convolutional neural network with ReLUs (solid line)
reaches a 25% training error rate on CIFAR-10 six times faster than
an equivalent network with tanh neurons (dashed line).]
• Note: The CIFAR-10 dataset consists of 60,000 32x32 colour
images in 10 classes, with 6,000 images per class. There
are 50,000 training images and 10,000 test images.
The Architecture
• The first convolutional layer filters the 224×224×3 input image with 96
kernels of size 11×11×3 with a stride of 4 pixels, which is the distance
between the receptive field centers of neighboring neurons in the kernel
map.
• The pooling layer performs a form of non-linear down-sampling. Max-pooling partitions
the input image into a set of rectangles and, for each such sub-region,
outputs the maximum value.
The Architecture
• Trained with stochastic gradient descent
• On two NVIDIA GTX 580 3GB GPUs
• For about a week
• 650,000 neurons
• 60,000,000 parameters
• 630,000,000 connections
• 5 convolutional layers, 3 fully connected layers
• Final feature layer is 4096 dimensional
Data Augmentation
• The easiest and most common method to reduce
overfitting on image data is to artificially enlarge the
dataset using label-preserving transformations.
• Two forms of image transformation have been
employed (a crop/flip sketch follows the list):
– Image translations and horizontal reflections
– Changing RGB intensities
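A minimal sketch of the first form (random crops as translations plus horizontal reflections), assuming images stored as (H, W, 3) NumPy arrays; the RGB-intensity (PCA) perturbation is left out:

```python
import numpy as np

def augment(image, crop=224, rng=None):
    """image: (H, W, 3) array with H, W >= crop.
    Returns a randomly translated (cropped) and possibly mirrored patch."""
    rng = rng or np.random.default_rng()
    H, W, _ = image.shape
    top = rng.integers(0, H - crop + 1)
    left = rng.integers(0, W - crop + 1)
    patch = image[top:top + crop, left:left + crop]
    if rng.random() < 0.5:          # horizontal reflection
        patch = patch[:, ::-1]
    return patch

# e.g. augment a 256x256x3 image into a 224x224x3 training example
```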
Dropout
• Combining different models can be very useful (mixture of experts,
majority voting, boosting, etc.)
• However, training many different models could be very time
consuming.
• The solution:
– Dropout: set the output of each hidden neuron to zero with probability 0.5
Dropout
• Dropout: set the output of each hidden neuron to zero with probability 0.5 (a minimal sketch follows after this list)
• The neurons which are dropped out do not contribute to the forward pass and do
not participate in backpropagation.
• So every time an input is presented, the neural network samples a different
architecture, but all these architectures share weights.
• This technique reduces complex co-adaptations of neurons, since a neuron
cannot rely on the presence of particular other neurons.
• It is forced to learn more robust features that are useful in conjunction with many
different random subsets of the other neurons.
• Without dropout, this network exhibits substantial overfitting.
• Dropout roughly doubles the number of iterations required to converge.
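A minimal sketch of dropout in the forward pass, assuming NumPy activations; the test-time branch applies the “halve the outgoing weights” rule discussed later (scaling the activations by 1 − p is equivalent). The names are illustrative:

```python
import numpy as np

def dropout_forward(h, p=0.5, train=True, rng=None):
    """h: hidden activations. Training: zero each unit with probability p.
    Test: scale by (1 - p), i.e. the 'halve the outgoing weights' rule."""
    if train:
        rng = rng or np.random.default_rng()
        mask = rng.random(h.shape) >= p   # keep each unit with probability 1 - p
        return h * mask
    return h * (1.0 - p)
```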
Tricks that significantly improve
generalization
• Train on random 224x224 patches from the 256x256
images to get more data. Also use left-right reflections of the
images.
• At test time, combine the opinions from ten different
patches: the four 224x224 corner patches plus the central
224x224 patch, plus the reflections of those five patches.
• Use “dropout” to regularize the weights in the globally
connected layers (which contain most of the parameters).
– Dropout means that half of the hidden units in a layer
are randomly removed for each training example.
– This stops hidden units from relying too much on other
hidden units.
Some more examples
of how well the deep
net works for object
recognition.
Results on the test data:
Top-1 error rate: 37.5%
Top-5 error rate: 17.0%
ILSVRC-2012 competition (Top-5 error rates):
This net: 15.3%
2nd best team: 26.2%
The first convolutional layer
• 96 convolutional kernels of size 11×11×3 learned by the first
convolutional layer on the 224×224×3 input images.
• The top 48 kernels were learned on GPU 1 while the bottom 48
kernels were learned on GPU 2.
• They look like Gabor wavelets, ICA filters, …
The hardware required for Alex’s net
• He uses a very efficient implementation of convolutional nets
on two Nvidia GTX 580 Graphics Processor Units (over 1000
fast little cores)
– GPUs are very good for matrix-matrix multiplies.
– GPUs have very high bandwidth to memory.
– This allows him to train the network in a week.
– It also makes it quick to combine results from 10 patches at test
time.
• We can spread a network over many cores if we can
communicate the states fast enough.
• As cores get cheaper and datasets get bigger, big neural nets
will improve faster than old-fashioned (i.e. pre Oct 2012)
computer vision systems.
Finding roads in high-resolution images
• Vlad Mnih (ICML 2012) used a non-convolutional
net with local fields and multiple layers of rectified
linear units to find roads in cluttered aerial images.
– It takes a large image patch and predicts a binary road
label for the central 16x16 pixels.
– There is lots of labeled training data available for this task.
• The task is hard for many reasons:
– Occlusion by buildings, trees and cars.
– Shadows, lighting changes
– Minor viewpoint changes
• The worst problems are incorrect labels:
– Badly registered maps
– Arbitrary decisions about what counts as a road.
• Big neural nets trained on big image patches with millions
of examples are the only hope.
The best road-finder
on the planet?
Two ways to average models
• MIXTURE: We can combine models by averaging their
output probabilities:
Model A:   .3   .2   .5
Model B:   .1   .8   .1
Combined:  .2   .5   .3
• PRODUCT: We can combine models by taking the geometric
means of their output probabilities:
Model A:   .3   .2   .5
Model B:   .1   .8   .1
Combined: (.03  .16  .05) / sum
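A quick numerical check of the two rules above (NumPy, my own variable names; the product rule multiplies the probabilities and renormalizes):

```python
import numpy as np

a = np.array([0.3, 0.2, 0.5])   # Model A
b = np.array([0.1, 0.8, 0.1])   # Model B

mixture = (a + b) / 2           # [0.2, 0.5, 0.3]
product = a * b                 # [0.03, 0.16, 0.05]
product /= product.sum()        # renormalize: ~[0.125, 0.667, 0.208]
print(mixture, product)
```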
Dropout: An efficient way to average
many large neural nets
([Link])
• Consider a neural net with one
hidden layer.
• Each time we present a training
example, we randomly omit
each hidden unit with
probability 0.5.
• So we are randomly sampling
from 2^H different architectures.
– All architectures share
weights.
Dropout as a form of model averaging
• We sample from 2^H models. So only a few of
the models ever get trained, and they only get
one training example.
– This is as extreme as bagging can get.
• The sharing of the weights means that every
model is very strongly regularized.
– It’s a much better regularizer than L2 or L1
penalties that pull the weights towards zero.
But what do we do at test time?
• We could sample many different architectures
and take the geometric mean of their output
distributions.
• It is better to use all of the hidden units, but to
halve their outgoing weights.
– This exactly computes the geometric mean of
the predictions of all 2^H models.
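A sketch of why this works for a softmax output layer on top of one dropped-out hidden layer (my notation): each hidden unit is present in exactly half of the 2^H masks, so the average of the models’ logits equals the halved-weight logit, and the renormalized geometric mean of the softmax outputs is the softmax of that average.

```latex
z^{(m)} = \sum_i w_i h_i \, \delta_i^{(m)}, \quad \delta_i^{(m)} \in \{0, 1\},
\qquad
\frac{1}{2^{H}} \sum_{m=1}^{2^{H}} z^{(m)} \;=\; \frac{1}{2} \sum_i w_i h_i
```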
What if we have more hidden layers?
• Use dropout of 0.5 in every layer.
• At test time, use the “mean net” that has all the outgoing
weights halved.
– This is not exactly the same as averaging all the
separate dropped-out models, but it’s a pretty good
approximation, and it’s fast.
• Alternatively, run the stochastic model several times on
the same input.
– This gives us an idea of the uncertainty in the answer.
What about the input layer?
• It helps to use dropout there too, but with a
higher probability of keeping an input unit.
– This trick is already used by the “denoising
autoencoders” developed by Pascal Vincent,
Hugo Larochelle and Yoshua Bengio.
How well does dropout work?
• The record breaking object recognition net developed by
Alex Krizhevsky uses dropout and it helps a lot.
• If your deep neural net is significantly overfitting, dropout
will usually reduce the number of errors by a lot.
– Any net that uses “early stopping” can do better by
using dropout (at the cost of taking quite a lot longer
to train).
• If your deep neural net is not overfitting you should be
using a bigger one!
Another way to think about dropout
• If a hidden unit knows which other hidden units
are present, it can co-adapt to them on the training data.
– But complex co-adaptations are likely to go wrong on new
test data.
– Big, complex conspiracies are not robust.
• If a hidden unit has to work well with
combinatorially many sets of co-workers, it is more
likely to do something that is individually useful.
– But it will also tend to do something that is
marginally useful given what its co-workers achieve.
Convolutional Neural Networks
END