
An Introduction to Neural Networks

Instituto Tecgraf PUC-Rio

Name: Fernanda Duarte
Advisor: Marcelo Gattass
What is Machine Learning?
A machine learning algorithm is an algorithm that is able to learn from data.

A computer program is said to learn from experience E with respect to some class of tasks T
and performance measure P, if its performance at tasks in T, as measured by P, improves
with experience E. (Mitchell, 1997)
Applications
- Digit Recognition
- Face Recognition
- Recommendation Engines
- Virtual assistants (Cortana, Siri, etc.)
- Self-driving vehicles
- Surveillance systems
Tasks...

Formal tasks: Playing board games, solving puzzles, mathematical and logic
problems → Easier to code!

Expert tasks: Medical diagnosis, engineering, scheduling.

Mundane tasks: Everyday speech, written language, perception, walking, object
recognition and manipulation.
Artificial Neural Networks
Neuron: biological inspiration for computation

(Figure: biological neuron vs. artificial neuron/unit)


Perceptron (Frank Rosenblatt, 1957)
- Algorithm for learning a binary classifier.
- Only capable of learning linearly separable patterns.

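A minimal sketch of the perceptron learning rule (not from the slides; the data, learning rate, and epoch count are illustrative assumptions), trained here on the linearly separable AND function:

```python
import numpy as np

def perceptron_train(X, y, lr=0.1, epochs=20):
    """Perceptron learning rule for a binary classifier with 0/1 labels."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0   # step activation
            w += lr * (yi - pred) * xi           # updates only on mistakes
            b += lr * (yi - pred)
    return w, b

# AND is linearly separable, so the perceptron can learn it.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = perceptron_train(X, y)
print([1 if xi @ w + b > 0 else 0 for xi in X])  # expected: [0, 0, 0, 1]
```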
Feedforward Neural Networks
(or Deep Feedforward Networks or Multilayer
Perceptrons (MLP)) (see “Deep Learning” book, Ian Goodfellow et al.)

- Multilayer structure → “sophisticated decision making” (a unit in the second
layer can make a decision at a more complex and more abstract level than a unit
in the first layer) → Learn features directly from the data.

- Nonlinearity (extends the kinds of functions that we can represent with our
neural network, e.g. XOR function (“exclusive or” ))
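To make the XOR remark concrete, here is a hand-wired two-layer network (the weights are hand-picked for illustration, and a step activation is used instead of the sigmoid shown later); no single linear unit can represent this function:

```python
import numpy as np

step = lambda z: (z > 0).astype(float)

def xor(x):
    W1 = np.array([[1., 1.],
                   [1., 1.]])
    b1 = np.array([-0.5, -1.5])
    h = step(x @ W1 + b1)        # h[0] = OR(x1, x2), h[1] = AND(x1, x2)
    w2 = np.array([1., -1.])
    return step(h @ w2 - 0.5)    # OR and not AND, i.e. XOR

for x in [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]:
    print(x, xor(np.array(x)))   # 0, 1, 1, 0
```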
Feedforward Neural Networks
Goal: Approximate some function f*

Example → A classifier y = f*(x)

input x, category (label) y

A feedforward network defines a mapping y = f(x; θ) and

learns the values of the parameters θ that result in the best
function approximation.

Feedforward: information flows in one direction (input layer → output layer)


Why “network”? Mathematical intuition:
- Typically represented by composing together many different functions.

Example: functions f^(1), f^(2), f^(3) connected in a chain, to form

f(x) = f^(3)(f^(2)(f^(1)(x)))   (a toy code sketch of this chain follows below)

In this case, f^(1) is called the first hidden layer of the network, f^(2) is
called the second hidden layer, and the final layer f^(3) is called the output
layer.
- Why “hidden” layer? Its behavior is not directly specified → the learning algorithm must
decide how to use those layers to form the f(x) that best approximates f*.

- Length of the chain gives the depth of the model → deep learning (you can
“stack” multiple layers)
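As a toy illustration of the chain structure (the functions below are arbitrary stand-ins, not real layers):

```python
f1 = lambda x: 2 * x         # stand-in for the first hidden layer
f2 = lambda x: x + 1         # stand-in for the second hidden layer
f3 = lambda x: x ** 2        # stand-in for the output layer

f = lambda x: f3(f2(f1(x)))  # f(x) = f^(3)(f^(2)(f^(1)(x))), depth 3
print(f(3))                  # f1(3) = 6, f2(6) = 7, f3(7) = 49
```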
Graph representation of the network
- The feedforward network model is associated with a directed acyclic graph
(DAG) describing how the functions are composed together.
Artificial neuron
Fully-connected layers
- Neurons between two adjacent layers are fully pairwise connected, but neurons
within a single layer share no connections.
Feedforward computation

- The abstraction of a layer has the nice property that it allows us to use efficient
vectorized code (e.g. matrix multiplies).

- Think of each hidden layer as a vector, where each value represents a


neuron/unit.

- Repeated matrix multiplications interwoven with an activation function →
nonlinear!
Example of activation function: Sigmoid
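For reference (a standard definition, not visible in the extracted slides), the logistic sigmoid squashes any real input into (0, 1), and its derivative has a convenient form used later by backpropagation:

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
\sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)
```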


Obs.: The output layer neurons most commonly have a different activation
function → e.g. softmax for class scores (classification), linear functions for
real-valued targets (regression), etc.

The weight matrices W and bias vectors b are the learnable parameters
of the network!
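A minimal vectorized forward pass for a network with two sigmoid hidden layers and a softmax output (layer sizes and the random input are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))   # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
sizes = [4, 5, 3, 2]                        # input, hidden 1, hidden 2, output
Ws = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(n) for n in sizes[1:]]

def forward(X):
    a = X
    for W, b in zip(Ws[:-1], bs[:-1]):
        a = sigmoid(a @ W + b)              # hidden layer: matrix multiply + nonlinearity
    return softmax(a @ Ws[-1] + bs[-1])     # output layer: class probabilities

X = rng.normal(size=(3, 4))                 # batch of 3 examples
print(forward(X).sum(axis=1))               # each row of probabilities sums to 1
```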
What about learning?
Optimization: find the parameters that minimize the cost function (or loss function).

A loss function C is a measure of how wrong the model is in terms of its ability to
estimate the relationship between x and y, i.e., y = f*(x), with the chosen
parameters. (e.g. Mean Squared Error (MSE))
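For instance, the mean squared error over n training examples can be written as (the 1/2 factor is a common convention that simplifies the derivative):

```latex
C(\theta) = \frac{1}{2n} \sum_{i=1}^{n}
            \bigl\| \, y^{(i)} - f\!\bigl(x^{(i)};\, \theta\bigr) \bigr\|^{2}
```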

Gradient Descent is a very common optimization algorithm.

Obs.: Training → Training + Validation sets


Inference → Test set
Gradient Descent
The gradient of a function gives the direction of steepest ascent.

(Figure: weight w_ij connecting neuron i in layer l-1 to neuron j in layer l)

The direction of steepest descent is the negative gradient.

Parameters are updated during the learning process;
η is the learning rate
(a hyperparameter).
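In symbols, the standard gradient-descent update for each weight and bias (presumably what the slide figure shows; η is the learning rate and C the cost):

```latex
w_{ij}^{(l)} \leftarrow w_{ij}^{(l)} - \eta\,\frac{\partial C}{\partial w_{ij}^{(l)}},
\qquad
b_{j}^{(l)} \leftarrow b_{j}^{(l)} - \eta\,\frac{\partial C}{\partial b_{j}^{(l)}}
```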
Backpropagation
How to compute the gradient of the cost function with respect to the weights w and biases b
(efficiently)?

The backprop algorithm gives us detailed insights into how changing the weights and biases
changes the overall behaviour of the network.

Propagate the error backwards.

Main idea: use chain rule to compute the gradients.


Example: (worked through in the slide figures; C_i denotes the cost function of
learning example i)
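In general form, the chain-rule computation for a single weight reads as follows (writing z for pre-activations, a = σ(z) for activations, and C_i for the cost of example i; the slides' worked example applies this step by step):

```latex
z_j^{(l)} = \sum_i w_{ij}^{(l)}\, a_i^{(l-1)} + b_j^{(l)}, \qquad
\frac{\partial C_i}{\partial w_{ij}^{(l)}}
  = \frac{\partial C_i}{\partial a_j^{(l)}}
    \cdot \frac{\partial a_j^{(l)}}{\partial z_j^{(l)}}
    \cdot \frac{\partial z_j^{(l)}}{\partial w_{ij}^{(l)}}
  = \frac{\partial C_i}{\partial a_j^{(l)}}\; \sigma'\!\bigl(z_j^{(l)}\bigr)\; a_i^{(l-1)}
```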

See: http://neuralnetworksanddeeplearning.com/chap2.html#the_backpropagation_algorithm
Learning process (summary)
For each learning example i in the training set:

1 - Feedforward computation;

2 - Backpropagation;

3 - Weight update.
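A self-contained sketch of this three-step loop on a tiny 2-4-1 sigmoid network trained on XOR with an MSE cost (architecture, learning rate, and epoch count are illustrative assumptions; convergence depends on the random initialization):

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[0.], [1.], [1.], [0.]])

W1, b1 = rng.normal(0.0, 1.0, (2, 4)), np.zeros(4)    # hidden layer
W2, b2 = rng.normal(0.0, 1.0, (4, 1)), np.zeros(1)    # output layer
eta = 0.5                                              # learning rate

for epoch in range(10000):
    for x, y in zip(X, Y):
        # 1. Feedforward computation
        h = sigmoid(x @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # 2. Backpropagation (chain rule for the cost C = 0.5 * (out - y)^2)
        delta_out = (out - y) * out * (1 - out)        # dC/dz at the output
        delta_h = (delta_out @ W2.T) * h * (1 - h)     # dC/dz at the hidden layer
        # 3. Weight update (gradient descent)
        W2 -= eta * np.outer(h, delta_out); b2 -= eta * delta_out
        W1 -= eta * np.outer(x, delta_h);   b1 -= eta * delta_h

preds = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
print(np.round(preds, 2))   # should approach [[0], [1], [1], [0]]
```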
- Learning rate (η)

- Regularization (prevent overfitting)

- Epoch

- Activation function alternatives (ReLU, tanh, etc)

- Stochastic Gradient Descent (SGD) → Minibatch


Convolutional Neural Networks (CNN)

(Figure: the AlexNet architecture)
Convolutional Neural Networks (CNN or ConvNets)
- Useful when the proximity between two data points indicates how related they are (e.g. pixels in
images!) → CNNs preserve spatial structure.

- The neurons in a layer will only be connected to a small region of the layer before it, instead of all
of the neurons in a fully-connected manner → fewer parameters!

- Convolutional Neural Networks take advantage of the fact that the input consists of images and
they constrain the architecture in a more sensible way. In particular, unlike a regular Neural
Network, the layers of a ConvNet have neurons arranged in 3 dimensions: width, height, depth.
Convolutional Neural Networks (CNN or ConvNets)
Every layer of a ConvNet transforms one volume of activations to another through a differentiable
function.

We use three main types of layers to build a basic CNN architecture: Convolutional Layer, Pooling Layer,
and Fully-Connected Layer.

(Figure: Convolution → Pooling → Convolution → Pooling → Fully Connected → Fully Connected → Output predictions)
Convolutional Layer
The CONV layer’s parameters consist of a set of learnable filters → Every filter is small spatially (along
width and height), but extends through the full depth of the input volume (e.g. 5x5x3 for images with 3
color channels).
(Figure: depth column through the input volume)

Convolutional Layer

Forward pass: We slide (or convolve) each filter across the input volume and compute dot products
between the entries of the filter and the input, element by element, followed by a nonlinear activation
function applied elementwise.

Every filter produces a 2-dimensional activation map. (e.g. if we use 12 filters of dimensions 5x5x3, the
conv layer may have an output volume of 32x32x12, i.e., 12 activation maps with dimensions 32x32)

Intuitively, the network will learn filters that activate when they see some type of visual feature, such as
an edge of some orientation, for example.
(Figure: zero-padding of the input borders)
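The spatial output size follows from the filter size F, zero-padding P, and stride S; the 32x32x12 example above works out if we assume a 32x32 input with stride 1 and a zero-padding of 2:

```latex
W_{\text{out}} = \frac{W_{\text{in}} - F + 2P}{S} + 1
               = \frac{32 - 5 + 2 \cdot 2}{1} + 1 = 32
```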
Pooling Layer
Its function is to progressively reduce the spatial size of the representation, to reduce the number of
parameters and the amount of computation in the network, and hence also to control overfitting.
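A minimal NumPy sketch of 2x2 max pooling with stride 2 (a common choice; the slides do not fix the pooling parameters), which halves the width and height while leaving the depth unchanged:

```python
import numpy as np

x = np.array([[1., 3., 2., 0.],
              [4., 2., 1., 5.],
              [6., 0., 7., 1.],
              [2., 8., 3., 4.]])          # one 4x4 activation map

# Group the map into non-overlapping 2x2 blocks and keep the max of each block.
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)                             # [[4. 5.]
                                          #  [8. 7.]]
```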
Fully-Connected Layer

Same as before.

Used to output the final classification scores.


https://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html
CNN for Real-Time Object Detection using
YOLO (“You Only Look Once”)
Assignment
Build a feedforward neural network with (at least) two fully-connected hidden
layers to perform automatic recognition of handwritten digits, using the MNIST
database.

http://yann.lecun.com/exdb/mnist/

(The assignment description will be sent by e-mail.)
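A possible starting point for the parameters (the hidden-layer widths below are an assumption, since the official description arrives by e-mail); the forward pass and the feedforward/backpropagation/update loop sketched earlier apply unchanged:

```python
import numpy as np

# MNIST: 28x28 grayscale images -> 784 inputs, 10 digit classes.
# Two fully-connected hidden layers; the widths below are arbitrary choices.
sizes = [784, 128, 64, 10]
rng = np.random.default_rng(0)
# Small random weights (scaled by 1/sqrt(fan-in)) and zero biases per layer.
Ws = [rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(n) for n in sizes[1:]]
# Hidden layers: sigmoid (or ReLU); output layer: softmax over the 10 classes.
```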


References
https://www.kdnuggets.com/2018/02/8-neural-network-architectures-machine-learning-researchers-need-learn.html

https://www.youtube.com/watch?v=d14TUNcbn1k

http://neuralnetworksanddeeplearning.com/chap1.html

http://www.deeplearningbook.org/

https://www.youtube.com/watch?v=1L0TKZQcUtA

https://towardsdatascience.com/machine-learning-fundamentals-via-linear-regression-41a5d11f5220
References
Backpropagation:

https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/

https://ayearofai.com/rohan-lenny-1-neural-networks-the-backpropagation-algorithm-explained-abf4609d4f9d

http://neuralnetworksanddeeplearning.com/chap2.html

https://www.youtube.com/watch?v=tIeHLnjs5U8
References
CNNs:

https://hashrocket.com/blog/posts/a-friendly-introduction-to-convolutional-neural-networks

https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/

http://web.stanford.edu/class/cs231a/lectures/intro_cnn.pdf

http://cs231n.stanford.edu/
