TECHIN513 – Managing
Signal and Data Processing
Week 8
Today’s Agenda
• CNN
• YOLO
• ICTE
• FPDAWT
Today’s Agenda
• Convolutional Neural Network
• You Only Look Once
• In Class Team Exercise
• Final Project Discussion And Work Time
Announcement
• Purchasing supplies for final project
• Budget of $40 per team
• Requests must be made by Monday, February 26 at 9:59am
Link to Request Form:
TECHIN513 Final Project Supply Request Form - Google Sheets
What is a convolutional neural network?
• A network architecture for deep learning
• CNNs can have tens or hundreds of hidden layers
• Includes a typical artificial neural network architecture
• Useful for finding patterns in images to recognize objects
Stages of a CNN
• Input image
• Convolution
• Activation
• Pooling
• Flattening
• Fully Connected ANN
• Activation
• Output
image source
Convolutional Operations | Medium
Greyscale Image Data
pixel values range from 0 to 255
24x16 matrix
How Do Machines Read and Store Images? | Analytics Vidhya
Color Image Data
one image has three matrices or “channels”
pixel values range from 0 to 255
How Do Machines Read and Store Images? | Analytics Vidhya
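A minimal sketch of how the two image types above could be stored, using plain Python lists so it runs without extra libraries. The 24 rows by 16 columns layout, the red/green/blue channel order, and the `shape` helper are assumptions for illustration.

```python
# Assumed layout: 24 rows by 16 columns, as in the 24x16 greyscale matrix.
HEIGHT, WIDTH = 24, 16

# Greyscale: one matrix, each pixel an integer from 0 to 255.
grey = [[0 for _ in range(WIDTH)] for _ in range(HEIGHT)]
grey[0][0] = 255  # brightest possible value in the top-left pixel

# Color: three matrices ("channels"), assumed here to be red, green, blue.
color = [[[0 for _ in range(WIDTH)] for _ in range(HEIGHT)] for _ in range(3)]
color[0][0][0] = 255  # top-left pixel has full intensity in the first channel


def shape(nested):
    """Return the dimensions of a rectangular nested list (hypothetical helper)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


print(shape(grey))   # (24, 16)
print(shape(color))  # (3, 24, 16)
```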
CNN Overview
Feature Extraction
Feature Extraction with CNNs | Towards Data Science
Typical Artificial Neural Network
• Each neuron in the input layer is connected to every neuron in the hidden layer
• Each connection has a weight
value
• Each neuron has a bias value
• The model learns these values
during the training process
• Values are updated with each
new training example
Introduction to Deep Learning - MATLAB
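The weights and biases described above can be sketched as a single fully connected layer. The function name `dense_layer` and all numbers below are made up for illustration; a trained model would learn these values.

```python
def dense_layer(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    return [
        sum(x * w for x, w in zip(inputs, row)) + b
        for row, b in zip(weights, biases)
    ]


inputs = [1.0, 2.0, 3.0]           # three input neurons
weights = [[0.5, -1.0, 0.0],       # one row of weights per hidden neuron
           [1.0, 1.0, 1.0]]
biases = [0.5, -1.0]               # one bias per hidden neuron

hidden = dense_layer(inputs, weights, biases)
print(hidden)  # [-1.0, 5.0]
```

Training adjusts `weights` and `biases` so that the outputs move closer to the desired values; that update step is not shown here.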
Convolutional Neural Network
• The weights and bias values are shared by all neurons in a given hidden layer
• All neurons in that layer detect the same feature (e.g. an edge) in different regions of an image
• The network is better equipped
to detect the feature regardless
of its location in an image
Introduction to Deep Learning - MATLAB
Convolutional Operation
An operation on two functions
which produces a third
combined function
Convolution Integral | Statistics How To
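The definition above is stated for continuous functions. As a rough discrete analogue, two short sequences can be combined into a third. The function name `convolve` and the sample values are illustrative assumptions.

```python
def convolve(f, g):
    """Full discrete convolution: out[n] = sum over k of f[k] * g[n - k]."""
    out = [0] * (len(f) + len(g) - 1)
    for i, fv in enumerate(f):
        for j, gv in enumerate(g):
            out[i + j] += fv * gv
    return out


print(convolve([1, 2, 3], [0, 1, 0.5]))  # [0, 1, 2.5, 4.0, 1.5]
```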
Convolutional Operation
kernel types
• A convolutional kernel is a small 2D matrix
• The kernel slides over the input image; at each position the overlapping values are multiplied element-wise and summed
• The output is a smaller matrix
Sliding window where stride = 1
Lower-dimension matrix (feature map)
Convolutional Operations | Medium
Convoluting to Create Feature Maps
CNNs | simplilearn
45*0
+ 12*(-1)
+ 5*0
+ 22*(-1)
+ 10*5
+ 35*(-1)
+ 88*0
+ 26*(-1)
+ 51*0
= - 45
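The arithmetic above can be reproduced with a small sliding-window function. This is a sketch assuming a square kernel and no padding; the function name `conv2d` is hypothetical, and the patch and kernel values are read off the sum above. Like most CNN libraries, it does not flip the kernel.

```python
def conv2d(image, kernel, stride=1):
    """Slide a square kernel over the image; multiply element-wise and sum at each position."""
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


patch = [[45, 12, 5],
         [22, 10, 35],
         [88, 26, 51]]
kernel = [[0, -1, 0],
          [-1, 5, -1],
          [0, -1, 0]]

print(conv2d(patch, kernel))  # [[-45]]
```

With a 3x3 kernel and stride 1, an input of height H and width W gives a feature map of size (H - 2) x (W - 2), which is why the output is smaller than the input.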
Activation Step: Rectified Linear Unit
• The activation function takes the output of a neuron and passes positive values through unchanged
• If output is negative, the
function maps it to zero
• ReLU is a commonly used
activation function in deep
learning
Introduction to Deep Learning - MATLAB
ReLU activation retains only positive values
CNNs | simplilearn
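A minimal sketch of ReLU applied to a feature map. The sample values are made up, apart from -45, which is the convolution result computed earlier.

```python
def relu(x):
    """Rectified Linear Unit: negative values become zero, positive values are unchanged."""
    return max(0, x)


feature_map = [[-45, 12],
               [7, -3]]
activated = [[relu(v) for v in row] for row in feature_map]
print(activated)  # [[0, 12], [7, 0]]
```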
CNN Overview
Pooling Step
• Pooling reduces the dimensionality of the feature map by applying different filters, producing a new feature map
• Condenses regions of neurons
into a single output
• Simplifies model by reducing
the number of parameters the
model needs to learn
• Pooling retains the most
important information but
lowers resolution
Introduction to Deep Learning - MATLAB
Pooling Applies Various Filters
CNNs | simplilearn
Pooling Enhances Edges
Three iterations of max pooling using a (2, 2) kernel
Features (edges) are enhanced, but resolution is reduced
Pooling In Convolutional Neural Networks | paperspace
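Max pooling with a (2, 2) kernel can be sketched as follows. The function name `max_pool`, the non-overlapping windows (stride equal to the kernel size), and the sample feature map are assumptions for illustration.

```python
def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 0],
               [1, 8, 3, 4]]
print(max_pool(feature_map))  # [[6, 4], [8, 9]]
```

Each pass halves the height and width, so three iterations reduce each side by a factor of eight.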
CNN Overview
Flattening
• The flatten layer lies between the CNN and the ANN
• Converts the feature map from the pooling layer into an input that the ANN can understand
• The ANN requires a one-dimensional array as input
Figure labels: Artificial Neural Network, Softmax
Feature Maps | educative.io , Dense layers | Pysource
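Flattening can be sketched in a few lines. The two 2x2 pooled feature maps below are made-up values, and the function name `flatten` is illustrative.

```python
def flatten(feature_maps):
    """Turn a stack of 2D feature maps into one one-dimensional list."""
    return [v for fmap in feature_maps for row in fmap for v in row]


pooled = [[[6, 4], [8, 9]],   # feature map from one kernel
          [[0, 1], [2, 3]]]   # feature map from another kernel
print(flatten(pooled))  # [6, 4, 8, 9, 0, 1, 2, 3]
```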
Softmax Activation Step
• Often used as the last activation function to normalize the output of a network to a probability distribution over predicted output classes
• The output of a Softmax is a vector with probabilities of each possible outcome
Figure labels: Mathematical representation, Last fully connected layer
Softmax Activation Function | Towards Data Science
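A small sketch of Softmax: each raw output z_i is mapped to exp(z_i) divided by the sum of exp(z_j) over all classes. The input scores are made up, and subtracting the maximum is a common numerical-stability step that does not change the result.

```python
import math


def softmax(scores):
    """Normalize raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - shift) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs)       # largest score gets the largest probability
print(sum(probs))  # approximately 1.0
```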
CNN Output Layer
The final layer of the CNN architecture provides the final
classification output
A vector of length K
equal to the
number of classes
Introduction to Deep Learning - MATLAB
Classification, Detection, & Segmentation
or object localization
Object Segmentation vs. Object Detection | LinkedIn
You Only Look Once
• "You Only Look Once" (YOLO)
• YOLOv1 paper first posted to arXiv June 2015 (final revision May 2016)
• Uses CNN as its backbone
network architecture
• YOLO predicts bounding boxes
and class probabilities for these
boxes simultaneously
• Improvement on previous model:
R-CNN
https://arxiv.org/abs/1506.02640
YOLO
https://pjreddie.com/darknet/yolo/
https://arxiv.org/abs/1506.02640
Previous Model for Image Detection: R-CNN
• Regions with CNN features
• Published Oct 2014
• link to article
• Proposes about 2000 candidate regions (bounding boxes) per image, then classifies each region
• Drawbacks:
• Slow to train: must classify 2000 regions per image
• Detection is not real-time: about 47 seconds per test image
• Bounding box inaccuracies
R-CNN | Towards Data Science
How does YOLO work?
• Resizes the input image to 448x448
• 24 convolutional layers
• 1x1 convolutions are interleaved to reduce the number of channels
• 4 max pooling layers
• The activation function is leaky ReLU (linear in the final layer)
• Two fully connected layers
Figure: YOLO Architecture
https://arxiv.org/abs/1506.02640
What is Object
Detection?
First let’s talk about
object localization
What is object localization?
Object localization is finding what and where a (single) object exists in a single image
Figure labels: box center (bx, by), width (bw), height (bh)
How is object localization described
numerically in YOLO?
• The coordinates of a bounding box are described as a vector
x_train: the image
y_train: the label vector
  Pc  1    (probability of class)
  Bx  0.5
  By  0.6
  Bw  0.4
  Bh  0.3
  C1  1
  C2  0
C1 = car class
C2 = motorcycle class
How is object localization described
numerically in YOLO?
• The coordinates of a bounding box are described as a vector
x_train: the image, with corners labeled (0,0) and (1,1); box center (bx, by) = (0.5, 0.6), bw = 0.4, bh = 0.3
y_train: the label vector
  Pc  1    (probability of class)
  Bx  0.5
  By  0.6
  Bw  0.4
  Bh  0.3
  C1  1
  C2  0
C1 = car class
C2 = motorcycle class
How is object localization described
numerically in YOLO?
• The coordinates of a bounding box are described as a vector
Image corners labeled (0,0) and (1,1); box center (bx, by) = (0.5, 0.6), bw = 0.4, bh = 0.3
Output of Neural Network:
  Pc  1    (probability of class)
  Bx  0.5
  By  0.6
  Bw  0.4
  Bh  0.3
  C1  0.97
  C2  0.03
C1 = car class
C2 = motorcycle class
How is object localization described
numerically in YOLO?
• The coordinates of a bounding box are described as a vector
x_train: the image
y_train: the label vector
  Pc  0    (probability of class)
  Bx  -
  By  -
  Bw  -
  Bh  -
  C1  -
  C2  -
C1 = car class
C2 = motorcycle class
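The label vectors above can be assembled with a short helper. This is a sketch: the function name `make_label` is hypothetical, and `None` is an assumed stand-in for the "-" (don't care) entries used when Pc is 0.

```python
def make_label(box=None, class_index=None, num_classes=2):
    """Build [Pc, Bx, By, Bw, Bh, C1, ..., Cn] for one image.

    box is (bx, by, bw, bh) relative to an image spanning (0,0) to (1,1).
    With no box, Pc is 0 and the remaining entries are None (don't care).
    """
    if box is None:
        return [0] + [None] * (4 + num_classes)
    bx, by, bw, bh = box
    one_hot = [1 if i == class_index else 0 for i in range(num_classes)]
    return [1, bx, by, bw, bh] + one_hot


# Class 0 = car, class 1 = motorcycle
car_label = make_label(box=(0.5, 0.6, 0.4, 0.3), class_index=0)
empty_label = make_label()

print(car_label)    # [1, 0.5, 0.6, 0.4, 0.3, 1, 0]
print(empty_label)  # [0, None, None, None, None, None, None]
```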
What about multiple objects?
YOLO algorithm | YouTube
What about multiple objects?
Pc 0
Bx -
By -
Bw -
Bh -
C1 -
C2 -
C1 = dog class
C2 = person class
YOLO algorithm | YouTube
What about multiple objects?
Person’s
object
belongs to
this cell
Pc 1
Bx 0.05
By 0.3
Bw 2
Bh 1.3
C1 1
C2 0
C1 = dog class
C2 = person class
YOLO algorithm | YouTube
What about multiple objects?
Pc 1
Bx 0.32
By 0.02
Bw 2.2
Bh 1.7
C1 0
C2 1
C1 = dog class
C2 = person class
YOLO algorithm | YouTube
What about multiple objects?
All other cells
4x4x7 matrix
Pc 0
Bx -
By -
Bw -
Bh -
C1 -
C2 -
C1 = dog class
C2 = person class
YOLO algorithm | YouTube
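The 4x4x7 label for an image with two objects can be sketched as a grid of per-cell vectors. The two object vectors are the ones shown above; the grid positions chosen for them are hypothetical, and zeros stand in for the "-" (don't care) entries of empty cells.

```python
GRID = 4    # image divided into 4x4 cells
DEPTH = 7   # Pc, Bx, By, Bw, Bh, C1, C2

# Start with every cell empty: Pc = 0, other entries as placeholder zeros.
target = [[[0] * DEPTH for _ in range(GRID)] for _ in range(GRID)]

# Each object is assigned to one cell. The cell positions below are
# assumptions for illustration, not taken from the slides.
target[2][0] = [1, 0.05, 0.3, 2, 1.3, 1, 0]    # C1 = 1
target[1][2] = [1, 0.32, 0.02, 2.2, 1.7, 0, 1]  # C2 = 1

objects = sum(cell[0] for row in target for cell in row)
print(len(target), len(target[0]), len(target[0][0]))  # 4 4 7
print(objects)                                          # 2
```

Bw and Bh can exceed 1 because they are measured relative to the cell size, and an object can span several cells.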
Training the YOLO Model
YOLO algorithm | YouTube
YOLO Prediction
YOLO algorithm | YouTube
Evaluating Image Detection Models
• Common Objects in Context
(COCO) dataset
• Published by Microsoft
• Used to evaluate algorithms’
performance of real-time
object detection
• 330,000 images
• 200,000 are labeled
• 1.5 million object instances
• 5 captions per image
y_train example: Pc = 1, Bx = 0.5, By = 0.6, Bw = 0.4, Bh = 0.3, C1 = 1, C2 = 0
COCO Dataset | viso.ai
Evaluating Image Detection Models
• Mean Average Precision (mAP)
• Benchmark metric used to evaluate the robustness of object detection models
• Incorporates mathematics from:
• Error matrix (confusion matrix)
• Intersection over union (IoU) ratio for bounding boxes
Figures: Error Matrix (image source), IoU (image source)
Understanding Confusion Matrix | Towards Data Science
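The two ingredients of mAP listed above can be sketched as follows. The function names, the corner-based box format (x_min, y_min, x_max, y_max), and the sample numbers are assumptions for illustration; this is not a full mAP implementation.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """Precision and recall from error-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))      # 1/7: overlap area 1, union area 7
print(precision_recall(tp=8, fp=2, fn=2))   # (0.8, 0.8)
```

A predicted box is usually counted as a true positive only when its IoU with a ground-truth box exceeds a chosen threshold (0.5 is a common choice).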
Best Object Detection Models
Object Detection | viso.ai
YOLOv8
YOLOv8 Tutorial - Colaboratory (google.com)
YOLOv8
Ultralytics YOLOv8 | GitHub
ICTE