Notes On Introduction To Deep Learning
Recently, deep learning has gained mass popularity in scientific computing. When we fire up Alexa or Siri, we sometimes wonder how the machine is able to make decisions on its own and choose correctly.
Deep learning and AI enable machines to perform these functions, making our lives easier and simpler. Deep learning also tends to elevate the customer experience, giving products an unmatched feeling of premium quality.
Deep learning is a subset of AI that imitates the functioning of the human brain, borrowing its way of processing data in order to detect objects, recognize speech, translate languages, and make decisions. It is a type of machine learning that works based on the structure and functioning of the human brain.
Deep learning has evolved hand-in-hand with the digital era, which has brought
about a revolution in terms of data extraction in all forms and from every region of
the world.
This data, known as big data, is drawn from sources such as social media, internet search engines, e-commerce platforms, and online cinemas, among others.
Slide: 8
Neural networks can adapt to changing input, so the network generates the best possible result without needing to redesign the output criteria. The concept of neural networks, which has its roots in artificial intelligence, is swiftly gaining popularity in the development of trading systems.
Neural networks are broadly used, with applications for financial operations,
enterprise planning, trading, business analytics, and product maintenance. Neural
networks have also gained widespread adoption in business applications such as
forecasting and marketing research solutions, fraud detection, and risk assessment.
A neural network evaluates price data and unearths opportunities for making trade
decisions based on the data analysis. The networks can distinguish subtle nonlinear
interdependencies and patterns other methods of technical analysis cannot.
Slide: 10
Feedforward Propagation –
is the movement of information from the input layer (left) to the output layer (right) of the neural network. The flow of information occurs in the forward direction: the input is used to calculate some intermediate function in the hidden layer, which is then used to calculate the output.
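As an illustration, here is a minimal NumPy sketch of forward propagation through one hidden layer; the layer sizes, weights, and the choice of sigmoid activation are assumptions made up for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy network: 3 inputs -> 4 hidden units -> 1 output (sizes chosen arbitrarily)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # input -> hidden weights and biases
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # hidden -> output weights and biases

x = np.array([0.5, -1.2, 3.0])                  # one input example

h = sigmoid(W1 @ x + b1)                        # hidden layer: intermediate function of the input
y = sigmoid(W2 @ h + b2)                        # output layer: computed from the hidden activations
print(y)
```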
What is Backpropagation?
Slide: 11
How Backpropagation Algorithm Works
5. Travel back from the output layer to the hidden layer to adjust the
weights such that the error is decreased.
Slide: 12
Explanation :-
We know that a neural network has neurons that work together through weights, biases and their respective activation functions. In a neural network, we update the weights and biases of the neurons on the basis of the error at the output. This process is known as back-propagation. Activation functions make back-propagation possible, since the gradients are supplied along with the error to update the weights and biases.
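As a hedged sketch of that idea, consider a single sigmoid neuron trained with squared error: the error at the output is propagated back through the activation function's derivative and used to nudge the weights and bias. The data, target, and learning rate below are made up for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # input
t = 1.0                          # target output
w = np.zeros(3)                  # weights
b = 0.0                          # bias
lr = 0.1                         # learning rate (arbitrary)

for _ in range(100):
    y = sigmoid(w @ x + b)            # forward pass
    error = y - t                     # error at the output
    grad = error * y * (1.0 - y)      # backprop through the sigmoid (its derivative is y*(1-y))
    w -= lr * grad * x                # adjust weights so the error decreases
    b -= lr * grad
```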
3). Tanh Function :- An activation that almost always works better than the sigmoid function is the Tanh function, also known as the Hyperbolic Tangent function. It is actually a mathematically shifted version of the sigmoid function. Both are similar and can be derived from each other.
Equation :-
f(x) = tanh(x) = 2/(1 + e^(-2x)) - 1
OR
tanh(x) = 2 * sigmoid(2x) - 1
Value Range :- -1 to +1
Nature :- non-linear
Uses :- Usually used in the hidden layers of a neural network, as its values lie between -1 and +1; the mean of the hidden-layer activations therefore comes out as 0 or very close to it, which helps in centering the data by bringing the mean close to 0. This makes learning for the next layer much easier.
4). RELU :- Stands for Rectified linear unit. It is the most widely used
activation function. Chiefly implemented in hidden layers of Neural network.
Equation :- A(x) = max(0,x). It gives an output x if x is positive and 0
otherwise.
Value Range :- [0, inf)
Nature :- non-linear, which means we can easily backpropagate the
errors and have multiple layers of neurons being activated by the ReLU
function.
Uses :- ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations. At any time only a few neurons are activated, making the network sparse and therefore efficient and easy to compute.
In simple words, ReLU learns much faster than the sigmoid and Tanh functions.
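The three activation functions above can be written in a few lines; this is a simple NumPy sketch of the formulas given in these notes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # tanh(x) = 2 * sigmoid(2x) - 1, value range (-1, +1)
    return 2.0 * sigmoid(2.0 * x) - 1.0

def relu(x):
    # A(x) = max(0, x), value range [0, inf)
    return np.maximum(0.0, x)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(z), tanh(z), relu(z))
```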
Slide: 15
Overfitting is a modeling error in statistics that occurs when a function is too closely aligned to a limited set of data points. Attempting to make the model conform too closely to slightly inaccurate data can infect the model with substantial errors and reduce its predictive power.
Overfitting Example
For example, a university that is seeing a college dropout rate that is higher than what
it would like decides it wants to create a model to predict the likelihood that an
applicant will make it all the way through to graduation.
To do this, the university trains a model from a dataset of 5,000 applicants and their
outcomes. It then runs the model on the original dataset—the group of 5,000
applicants—and the model predicts the outcome with 98% accuracy. But to test its
accuracy, they also run the model on a second dataset—5,000 more applicants.
However, this time, the model is only 50% accurate, as the model was too closely fit
to a narrow data subset, in this case, the first 5,000 applications.
L2 & L1 regularization
L1 and L2 are the most common types of regularization. These update the
general cost function by adding another term known as the regularization
term.
Due to the addition of this regularization term, the values of weight matrices
decrease because it assumes that a neural network with smaller weight
matrices leads to simpler models. Therefore, it will also reduce overfitting to
quite an extent.
In L2, we have:
Cost function = Loss + (λ / 2m) * Σ ||w||²
Here, lambda (λ) is the regularization parameter, whose value is tuned for better results.
In L1, we have:
Cost function = Loss + (λ / 2m) * Σ ||w||
In this, we penalize the absolute value of the weights. Unlike L2, the weights
may be reduced to zero here. Hence, it is very useful when we are trying
to compress our model. Otherwise, we usually prefer L2 over it.
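As a rough sketch of how the regularization term changes the cost, here is a small function assuming a mean-squared-error base loss and a single weight vector; the lambda value and the function name are assumptions for the example.

```python
import numpy as np

def regularized_cost(y_true, y_pred, w, lam=0.01, kind="l2"):
    """Base loss plus an L1 or L2 penalty on the weights."""
    m = len(y_true)
    loss = np.mean((y_true - y_pred) ** 2)             # base cost (MSE here)
    if kind == "l2":
        penalty = (lam / (2 * m)) * np.sum(w ** 2)     # L2: sum of squared weights
    else:
        penalty = (lam / (2 * m)) * np.sum(np.abs(w))  # L1: sum of absolute weights
    return loss + penalty
```

Because the penalty grows with the size of the weights, minimizing this cost pushes the weight values down, which is exactly the effect described above.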
Dropout
So what does dropout do? At every iteration, it randomly selects some nodes and removes them, along with all of their incoming and outgoing connections.
So each iteration has a different set of nodes and this results in a different
set of outputs. It can also be thought of as an ensemble technique in
machine learning.
Ensemble models usually perform better than a single model as they capture
more randomness. Similarly, dropout also performs better than a normal
neural network model.
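A minimal sketch of (inverted) dropout applied to one layer's activations during training; the keep probability and the toy activations are assumptions for the example.

```python
import numpy as np

def dropout(activations, keep_prob=0.8, training=True):
    """Randomly zero out nodes; scale the rest so the expected output is unchanged."""
    if not training:
        return activations                              # at test time all nodes are kept
    rng = np.random.default_rng()
    mask = rng.random(activations.shape) < keep_prob    # pick which nodes survive this iteration
    return activations * mask / keep_prob

h = np.array([0.2, 1.5, -0.3, 0.9, 2.1])
print(dropout(h))   # a different subset of nodes is dropped on every call
```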
Early stopping
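Early stopping halts training once the error on held-out validation data stops improving, so the model does not keep fitting noise in the training set. Below is a minimal sketch of that loop; train_one_epoch, validation_loss, and model.get_weights are hypothetical helpers, not a real API.

```python
def early_stopping_train(model, patience=5, max_epochs=200):
    """Stop when the validation loss has not improved for `patience` epochs."""
    best_loss, best_weights, epochs_without_improvement = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_one_epoch(model)               # hypothetical: one pass over the training data
        loss = validation_loss(model)        # hypothetical: loss on held-out validation data
        if loss < best_loss:
            best_loss, best_weights = loss, model.get_weights()   # hypothetical accessor
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                        # validation loss stopped improving: stop early
    return best_weights
```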
Slide: 17
Deep learning models are full of hyper-parameters and finding the best
configuration for these parameters in such a high dimensional space is not a
trivial challenge.
The cost or loss function has an important job in that it must faithfully distill
all aspects of the model down into a single number in such a way that
improvements in that number are a sign of a better model.
Yann LeCun developed the first CNN, called LeNet, in the late 1980s. It was primarily used for recognizing characters like ZIP codes and digits.
CNNs, also known as ConvNets, consist of multiple layers and are mostly used for image processing and object detection.
A CNN has a convolution layer with several filters that perform the convolution operation.
CNNs also have a Rectified Linear Unit (ReLU) layer that performs element-wise operations and produces a rectified feature map as output.
The rectified feature map is next fed into a pooling layer, which down-samples it; a flattening step then converts the resulting two-dimensional arrays from the pooled feature map into a single, continuous, linear vector.
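To make the layer sequence concrete, here is a rough NumPy sketch of one convolution filter followed by ReLU, 2x2 max pooling, and flattening; the toy image and filter values are invented for the example.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid convolution of a 2-D image with a single filter."""
    kh, kw = kernel.shape
    h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Down-sample the feature map by taking the max of each size x size block."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
kernel = np.array([[1., 0.], [0., -1.]])           # toy 2x2 filter

feature_map = conv2d(image, kernel)                # convolution layer
rectified = np.maximum(0.0, feature_map)           # ReLU layer -> rectified feature map
pooled = max_pool(rectified)                       # pooling layer down-samples
flat = pooled.flatten()                            # flattening -> single linear vector
```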
2. Long Short Term Memory Networks
LSTMs are a subset of Recurrent Neural Networks (RNN) that are specialised
in learning and memorizing long-term information. By default, LSTMs are
supposed to recall past information for long periods of time.
LSTMs have a chain-like structure where four unique layers are stacked.
LSTMs are typically used for time-series predictions, speech synthesis,
language modeling and translation, music composition, and pharmaceutical
development.
They are programmed to forget irrelevant parts of the data and selectively
update the cell-state values.
Because of this recurrent behaviour, the output from the previous step is fed back as an input to the current step, allowing the network to memorize previous inputs through its efficient internal memory.
RNNs are mostly used for image captioning, time-series analysis, natural-language processing, handwriting recognition, and machine translation.
RNNs can process inputs of varied lengths. The longer the computation runs, the more information can be gathered, and the model size does not increase with the input size.
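As a hedged sketch of that recurrence, here is a plain (Elman-style) RNN cell whose previous hidden state is fed back in at every step; the sizes, weights, and toy sequence are arbitrary assumptions, and the same weights handle a sequence of any length.

```python
import numpy as np

rng = np.random.default_rng(1)
input_size, hidden_size = 3, 5
W_x = rng.normal(size=(hidden_size, input_size))   # input -> hidden weights
W_h = rng.normal(size=(hidden_size, hidden_size))  # hidden -> hidden (the recurrent feedback)
b = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One time step: the previous state h_prev acts as the network's memory."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

sequence = rng.normal(size=(4, input_size))        # a variable-length input sequence
h = np.zeros(hidden_size)
for x_t in sequence:                               # model size is fixed regardless of length
    h = rnn_step(x_t, h)
```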
GANs are generative deep learning algorithms that produce new data instances resembling the training data provided.
GANs have two main components: a generator, which learns to generate fake data, and a discriminator, which learns to tell that fake data apart from real data.
GANs are used to generate realistic images and cartoon characters, create photographs of human faces, and render 3D objects.
Video game developers use GANs to upgrade low-resolution, 2D textures in vintage video games by recreating them in 4K or higher resolutions via image training.
They are also used to improve astronomical images and simulate gravitational
lensing for dark-matter research.
During the training period, the Discriminator learns to distinguish between real and fake data and penalizes the Generator whenever it produces implausible fake data.
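At a high level, this adversarial training alternates between updating the two components. The sketch below shows only that structure; generator, discriminator, and their sample/update/score helpers are hypothetical placeholders rather than a real library API.

```python
def train_gan(generator, discriminator, real_batches, steps=1000):
    """Alternate discriminator and generator updates (structure only, hypothetical objects)."""
    for step in range(steps):
        real = next(real_batches)               # a batch of real training data
        fake = generator.sample(len(real))      # hypothetical: generator produces fake data
        # 1) The discriminator learns to label real data as real and the fakes as fake.
        discriminator.update(real, label=1)     # hypothetical update helpers
        discriminator.update(fake, label=0)
        # 2) The generator is updated so the discriminator scores its fakes as more real.
        generator.update(discriminator.score(generator.sample(len(real))))
```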
RBFNs are an example of artificial neural networks, mainly used for function
approximation problems.
Radial basis function networks are considered to have an edge over other neural networks because of their universal approximation capability and faster learning speed.
An RBF network is a special type of feed forward neural network. It consists
of three different layers, namely the input layer, the hidden layer and the output
layer.
For example, an RBF network with 10 nodes in its hidden layer might be chosen. The training of the RBF model is terminated once the calculated error reaches the desired value (e.g. 0.01) or the maximum number of training iterations (e.g. 500) has been completed.
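A minimal sketch of the RBF forward pass: each of the 10 hidden nodes applies a Gaussian radial basis function to the distance between the input and its centre, and a linear output layer combines the activations. The centres, width, and output weights here are untrained placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
n_hidden, input_dim = 10, 2                        # 10 nodes in the hidden layer, as in the example
centres = rng.uniform(size=(n_hidden, input_dim))  # placeholder RBF centres
sigma = 0.5                                        # placeholder width of each basis function
w_out = rng.normal(size=n_hidden)                  # placeholder output-layer weights

def rbf_forward(x):
    """Input layer -> Gaussian hidden layer -> linear output layer."""
    dists = np.linalg.norm(centres - x, axis=1)        # distance of x to each centre
    hidden = np.exp(-(dists ** 2) / (2 * sigma ** 2))  # Gaussian radial basis activations
    return w_out @ hidden                              # linear combination at the output

print(rbf_forward(np.array([0.3, 0.7])))
```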
6. Multilayer Perceptrons
MLPs consist of an input layer and an output layer that are fully connected
with the hidden layers in between.
Besides the input and output layers, they may have multiple hidden layers, which act as the true computation engine of MLPs.
They are used to build speech-recognition and financial-prediction systems and to carry out data compression.
The data is fed to the input layer of the network, and the signal then passes through the layers of neurons in one direction.
MLPs combine the input with the weights that exist between the input layer and the hidden layers.
Activation functions like ReLUs, sigmoid functions, and Tanhs allow MLPs to
determine which nodes to use.
MLPs assist the model in understanding correlations and learning the dependencies between the independent and the target variables from a particular training data set.
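As a usage example, scikit-learn's MLPClassifier builds this kind of fully connected network; the hidden-layer sizes and the toy data below are arbitrary assumptions for the sketch.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4))                # toy independent variables
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # toy target variable

# One input layer, two hidden layers of 16 ReLU units each, one output layer.
clf = MLPClassifier(hidden_layer_sizes=(16, 16), activation="relu",
                    max_iter=1000, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))                       # accuracy on the training data
```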
SOMs are created to help users access and understand high-dimensional information.
SOMs do not use activation functions in their neurons; instead they initialize weights for each node and choose a vector at random from the training data.
SOMs then examine every node to find which node's weights are closest to the input vector; the most suitable node is called the Best Matching Unit (BMU).
SOMs next identify the nodes in the BMU's neighbourhood, whose radius tends to shrink over time. The closer a node is to the BMU, the more its weights change, with the winning (BMU) weights moved closest to the sample vector.
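A hedged sketch of one SOM training step on a small grid: find the Best Matching Unit by distance, then pull the BMU and its neighbours toward the sample, with the pull shrinking with grid distance. The grid size, learning rate, and neighbourhood radius are made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
grid_h, grid_w, dim = 5, 5, 3
weights = rng.random((grid_h, grid_w, dim))        # initialize weights for each node

def som_step(sample, weights, lr=0.1, radius=1.5):
    # Best Matching Unit: the node whose weights are closest to the sample vector
    dists = np.linalg.norm(weights - sample, axis=2)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)
    # Move the BMU's neighbourhood toward the sample; closer nodes change more
    for i in range(grid_h):
        for j in range(grid_w):
            grid_dist = np.hypot(i - bmu[0], j - bmu[1])
            influence = np.exp(-(grid_dist ** 2) / (2 * radius ** 2))
            weights[i, j] += lr * influence * (sample - weights[i, j])
    return weights

sample = rng.random(dim)                           # a vector chosen at random from the data
weights = som_step(sample, weights)
```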
DBNs are generative graphical models, a class of deep neural networks that consist of multiple layers of stochastic latent variables. The latent variables have binary values and are often called hidden units; there are connections between the layers but not between the units within a single layer.
DBNs draw samples from the visible units using a single pass of ancestral
sampling throughout the model.
DBNs learn that the values of the latent variables in every layer can be inferred by a single, bottom-up pass.
RBMs accept the input and encode it as numbers in the forward pass. RBMs combine each input with its own weight and one overall bias unit.
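A minimal sketch of that RBM forward pass: each visible input is combined with its weights and a bias, and the binary hidden units are then activated stochastically. The sizes and weights are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n_visible, n_hidden = 6, 3
W = rng.normal(scale=0.1, size=(n_hidden, n_visible))  # weight between each visible/hidden pair
b_hidden = np.zeros(n_hidden)                          # hidden-unit bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

v = rng.integers(0, 2, size=n_visible).astype(float)   # binary visible units (the input)

# Forward pass: combine the input with the weights and bias, then sample binary hidden units
p_hidden = sigmoid(W @ v + b_hidden)
h = (rng.random(n_hidden) < p_hidden).astype(float)
```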
10. Autoencoders
An autoencoder first encodes its input into a compressed representation, then learns how to reconstruct the data from that encoding into a representation that is as close as possible to the original input.
Autoencoders are supposed to first encode the image, then reduce the size of
the input into a smaller entity. Finally, the autoencoder decodes the image in
order to generate the reconstructed image.
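A rough sketch of the encode-then-decode idea as a tiny linear autoencoder forward pass in NumPy; the sizes and untrained weights are placeholders, and in practice the weights would be learned by minimizing the reconstruction error shown at the end.

```python
import numpy as np

rng = np.random.default_rng(6)
input_dim, code_dim = 8, 2                      # compress 8 values down to a 2-value code

W_enc = rng.normal(scale=0.3, size=(code_dim, input_dim))   # encoder weights (placeholder)
W_dec = rng.normal(scale=0.3, size=(input_dim, code_dim))   # decoder weights (placeholder)

x = rng.random(input_dim)                       # original input

code = W_enc @ x                                # encode: reduce the input to a smaller entity
reconstruction = W_dec @ code                   # decode: rebuild something close to the input

reconstruction_error = np.mean((x - reconstruction) ** 2)   # what training would minimize
```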
Conclusion
These deep learning algorithms show us why they are preferred over other techniques. They have become the norm in recent years, and they save us time and effort while remaining easy to use.
Deep learning has made computers genuinely smarter and able to work according to our needs.
With ever-growing data, it can be concluded that these algorithms will only become more efficient with time and will come ever closer to replicating the workings of the human brain.