MACHINE LEARNING COURSE CODE-A8703 MODULE-05
Unit 5
5. 2000s: Resurgence and Deep Learning
Larger datasets, innovative architectures, and greater processing power spurred a
comeback. Deep learning, which stacks many layers of neurons, proved remarkably
effective across a range of disciplines.
6. 2010s-Present: Deep Learning Dominance
Deep learning architectures such as convolutional neural networks (CNNs) and recurrent
neural networks (RNNs) came to dominate machine learning. Breakthroughs in game playing,
image recognition, and natural language processing demonstrated their power.
How does a Neural Network work?
Let’s understand how a neural network works with an example:
1. Consider a neural network for email classification. The input layer takes features such as the
email content, sender information, and subject.
2. These inputs, multiplied by learned weights, pass through the hidden layers. Through training,
the network learns to recognize patterns that indicate whether an email is spam.
The output layer, with a binary activation function, predicts whether the email is spam (1)
or not (0). As the network iteratively refines its weights through backpropagation, it
becomes adept at distinguishing spam from legitimate email, showcasing the
practicality of neural networks in real-world applications such as email filtering; a minimal
code sketch follows.
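To make this concrete, here is a minimal NumPy sketch of the forward pass of such a spam classifier. The feature encoding, weights, and network size are hypothetical illustrations, not values from this module:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical features from one email:
# [suspicious-word score, sender reputation, normalized subject length]
x = np.array([0.8, 0.2, 0.5])

# Illustrative weights a trained network might have learned
W1 = np.array([[ 0.9, -0.4,  0.1],    # 2 hidden neurons, 3 inputs
               [-0.3,  0.7,  0.6]])
b1 = np.array([0.1, -0.2])
W2 = np.array([[1.2, -0.8]])           # 1 output neuron, 2 hidden inputs
b2 = np.array([-0.1])

h = sigmoid(W1 @ x + b1)               # hidden layer activations
p_spam = sigmoid(W2 @ h + b2)[0]       # output: probability of spam

print("spam" if p_spam >= 0.5 else "not spam", round(p_spam, 3))
```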
Working of a Neural Network
1. Neural networks are complex systems that mimic some features of how the human brain
functions.
2. A neural network is composed of an input layer, one or more hidden layers, and an output
layer, each made up of interconnected artificial neurons.
3. The basic process has two stages: forward propagation and backpropagation.
Figure 1: Neural Network architecture with various components
Forward Propagation
1. Input Layer: Each feature of the input data is represented by a node in this layer, which
receives the input.
2. Weights and Connections: The weight of each connection indicates how strong the
connection is. These weights are adjusted throughout training.
3. Hidden Layers: Each hidden-layer neuron processes its inputs by multiplying them by
weights, summing them, and passing the result through an activation function. This
introduces non-linearity, enabling the network to recognize intricate patterns.
4. Output: The process repeats layer by layer until the output layer is reached, producing
the final result (sketched in code below).
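As a sketch of steps 1–4, the forward pass can be written as a loop over layers; NumPy, the ReLU activation, and the layer sizes below are illustrative assumptions:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, layers):
    """Propagate input x through a list of (weights, bias) pairs."""
    a = x
    for W, b in layers:
        a = relu(W @ a + b)   # weighted sum, then non-linear activation
    return a

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 2]          # input, two hidden layers, output (illustrative)
layers = [(rng.normal(size=(m, n)) * 0.1, np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

output = forward(rng.normal(size=4), layers)
print(output)
```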
Backpropagation
1. Loss Calculation: The network’s output is compared against the true target values, and a loss
function computes the difference. For a regression problem, the Mean Squared
Error (MSE) is commonly used as the cost function.
Loss Function:
$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$,
where $y_i$ is the true target value, $\hat{y}_i$ is the network’s prediction, and $n$ is the number of samples.
2. Gradient Descent: The network then uses gradient descent to reduce the loss. To lower the
error, each weight is updated based on the derivative of the loss with respect to that weight:
$w \leftarrow w - \eta\,\frac{\partial L}{\partial w}$, where $\eta$ is the learning rate.
3. Adjusting Weights: This iterative process, backpropagation, applies these updates
backward across the network, connection by connection.
4. Training: During training with different data samples, the entire process of forward
propagation, loss calculation, and backpropagation is done iteratively, enabling the network
to adapt and learn patterns from the data.
5. Activation Functions: Activation functions such as the rectified linear unit (ReLU) or
sigmoid introduce non-linearity into the model. Whether a neuron “fires” is determined by
its total weighted input. A worked training loop combining these steps follows.
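Combining the steps above, here is a minimal, self-contained training loop for a single sigmoid neuron with the MSE loss, using plain gradient descent; the dataset (logical AND) and learning rate are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny illustrative dataset: 2 features per sample, binary targets (logical AND)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 0.0, 1.0])

rng = np.random.default_rng(1)
w = rng.normal(size=2)
b = 0.0
lr = 0.5                              # learning rate (eta)

for epoch in range(2000):
    # Forward propagation
    y_hat = sigmoid(X @ w + b)
    loss = np.mean((y - y_hat) ** 2)  # MSE loss from the formula above

    # Backpropagation: chain rule through MSE and the sigmoid
    dL_dyhat = 2.0 * (y_hat - y) / len(y)
    dyhat_dz = y_hat * (1.0 - y_hat)  # sigmoid derivative
    delta = dL_dyhat * dyhat_dz
    grad_w = X.T @ delta
    grad_b = delta.sum()

    # Gradient descent: w <- w - lr * dL/dw
    w -= lr * grad_w
    b -= lr * grad_b

print(np.round(sigmoid(X @ w + b), 2))  # approaches [0, 0, 0, 1]
```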
The structure of a neural network is different from that of conventional microprocessors,
so it has to be emulated in software.
Large neural networks therefore required long processing times.
Applications of Artificial Neural Network
1. Text Classification and Categorization
This is a vital part of many applications, such as web search, information filtering, language
identification, readability assessment, and sentiment analysis. Artificial neural networks are
widely used for these tasks.
2. Named Entity Recognition (NER)
Named entity recognition focuses on categorizing named entities into predefined classes such as
persons, organizations, locations, dates, and times. The most effective and powerful named entity
recognition systems make use of artificial neural networks.
3. Part-of-Speech Tagging
Part-of-speech tagging is used for parsing, text-to-speech conversion, information extraction, and
many other applications. The process involves tagging words as adjectives, verbs, nouns, adverbs,
pronouns, and so on.
4. Machine Translation
Machine translation is widely used around the world; however, it still has limitations, and in
certain domains the quality of the translations is substandard. To improve translation quality,
researchers are applying neural networks.
5. Semantic Parsing and Question Answering
Such systems automate the answering of various types of questions (definition questions,
biographical questions, multilingual questions, and many others) posed to the system in
natural language.
Using artificial neural networks, it is possible to create high-performance question answering
systems.
6. Paraphrase Detection
This essentially involves figuring out whether two sentences mean the same thing. This is particularly
important in question answering systems because there are several ways in which your users could
ask the very same question.
7. Speech Recognition
Artificial neural networks are used rather extensively in speech recognition. It involves making use
of natural language processing to convert voice data into a machine-readable format.
8. Language Generation & Multi-document Summarization
Natural language generation (NLG) can be used for a variety of writing tasks, including generating
reports and other texts from data the system has analyzed, drafting summaries of
electronic medical records, and producing textual weather forecasts from weather data.
9. Character Recognition
Character recognition is applied to receipts, invoices, cheques, legal documents, and more. Using
artificial neural networks, character recognition can even be performed on handwritten characters
with an accuracy of around 85%.
10. Spell Checking
This is widely used in text editors to inform users if their text contains spelling errors. Several
spell-checking tools now make use of artificial neural networks.
Neural Network Limitations
1. Data Requirements
Large Data Needs: ANNs require vast amounts of labelled data to train effectively, which can
be difficult and expensive to obtain.
Data Quality: The quality of the data is crucial. Poor-quality data, including noisy, incomplete,
or biased datasets, can lead to inaccurate and unreliable models.
2. Computational Complexity
High Resource Consumption: Training large ANNs, especially deep networks, requires
significant computational power and time. This often necessitates access to specialized
hardware like GPUs or TPUs.
Training Time: The process of training can be very time-consuming, particularly for deep
networks with large datasets.
3. Overfitting
Susceptibility to Overfitting: ANNs can easily overfit to the training data, especially when
the model is complex and the dataset is not sufficiently large or diverse.
Complex Regularization Needs: Preventing overfitting requires careful application of
regularization techniques and validation strategies, which can be complex and
time-consuming.
4. Interpretability and Transparency
Black Box Nature: ANNs are often considered "black boxes" due to their complex internal
workings, making it difficult to understand how they make decisions.
Lack of Explainability: Despite advancements in explainable AI, fully interpreting the
decision-making process of ANNs remains challenging.
5. Hyperparameter Sensitivity
Difficult Hyperparameter Tuning: ANNs have many hyperparameters that need careful
tuning (e.g., learning rate, number of layers, neurons per layer). Finding the optimal
configuration often requires extensive experimentation.
Initialization Issues: The initial weights of an ANN can significantly affect the training
process and final performance, making proper initialization crucial.
6. Scalability Issues
Memory and Computational Limits: Scaling ANNs to handle very large datasets or model
sizes can be challenging due to memory and computational constraints.
Deployment Complexity: Large models can be difficult to deploy efficiently in real-time
applications, requiring significant resources and optimization.
7. Expertise Required
Need for Specialized Knowledge: Designing, training, and tuning ANNs often requires deep
expertise in machine learning and neural network architecture, which can limit accessibility
for non-experts.
8. Adversarial Vulnerability
Susceptibility to Adversarial Attacks: ANNs can be vulnerable to adversarial attacks, where
small, carefully crafted changes to input data can cause the network to make significant errors.
9. Hardware Dependence
Specialized Hardware Needs: Effective training and inference of ANNs often require access
to specialized hardware, such as GPUs or TPUs, which may not be available to all users.
10. Ethical and Bias Concerns
Propagation of Biases: ANNs can learn and amplify biases present in the training data,
leading to unfair or discriminatory outcomes.
Ethical Considerations: The use of ANNs in sensitive applications raises ethical questions
about privacy, fairness, and accountability.
Understanding the Biological Neuron
Biological neurons are the fundamental units of the brain and nervous system. They are specialized
cells that transmit information through electrical and chemical signals. Understanding the structure
and function of biological neurons helps to appreciate how artificial neural networks (ANNs) are
inspired by biological processes.
o When the action potential reaches the axon terminals, it triggers the release of
neurotransmitters into the synapse. These chemicals cross the synaptic gap and bind
to receptors on the dendrites of the adjacent neuron, continuing the transmission of
the signal.
Comparison with Artificial Neural Networks
Artificial Neural Networks (ANNs) are computational models inspired by the structure and function
of biological neurons. Here’s how they compare:
1. Neurons and Perceptrons:
o In ANNs, artificial neurons (or perceptrons) are simplified models that take multiple
inputs, apply weights, sum them, and pass the result through an activation function to
produce an output, analogous to the signal processing in biological neurons (a minimal
sketch follows this list).
2. Synapses and Weights:
o The connections between artificial neurons have weights that are adjusted during
training, similar to how synapses strengthen or weaken in biological neurons based on
activity.
3. Layers and Networks:
o ANNs are composed of layers of neurons: input layer (receives data), hidden layers
(processes data), and output layer (produces results), mimicking the complex
networks of interconnected neurons in the brain.
4. Learning and Flexibility:
o The process of training an ANN, involving adjustments to weights based on error
minimization (e.g., backpropagation), is analogous to synaptic plasticity, where the
strength of synapses changes based on experience and learning.
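As a minimal sketch of point 1, here is a single perceptron in Python; the step activation and the example weights (which implement a logical AND) are illustrative assumptions:

```python
import numpy as np

def perceptron(x, w, b):
    """Weighted sum of inputs, then a step activation: 'fire' (1) or not (0)."""
    return 1 if np.dot(w, x) + b > 0 else 0

# Illustrative weights: this neuron fires only if both inputs are active
w = np.array([1.0, 1.0])
b = -1.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, "->", perceptron(np.array(x, dtype=float), w, b))
```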
2. Tanh (Hyperbolic Tangent): Similar to the sigmoid, but its outputs range from -1 to 1; often
used in hidden layers.
3. ReLU (Rectified Linear Unit): The simplest and most popular activation function; it outputs
the input directly if it is positive, otherwise it outputs 0.
Advantages:
Computationally efficient, simple implementation.
Reduces the likelihood of the vanishing gradient problem.
Speeds up the convergence of stochastic gradient descent.
Disadvantages:
Dying ReLU problem: neurons can become inactive and only output zero.
Limitations:
Sensitive to high learning rates, which can cause neurons to die.
Real-Time Applications:
Image classification.
Convolutional neural networks (CNNs).
Utilization:
Widely used in hidden layers of deep neural networks due to efficiency and effectiveness.
Leaky ReLU (summarized here; discussed in detail below):
Advantages:
Addresses the dying ReLU problem by allowing a small gradient when x ≤ 0.
Disadvantages:
Introduces a slight complexity in the computation.
Limitations:
The choice of the slope parameter α can significantly affect performance.
Real-Time Applications:
Object detection.
Various deep learning models needing robust activation functions.
Utilization:
Used in hidden layers where avoiding zero gradients is crucial.
PReLU (Parametric ReLU):
Limitations:
More computationally intensive due to additional parameters.
Real-Time Applications:
Image and video recognition tasks.
Deep CNNs.
Utilization:
Applied in deeper networks where learning the activation parameters dynamically can be beneficial.
6. Leaky ReLU: A variant of ReLU that addresses the "dying ReLU" problem, in which neurons
become inactive and output only zero for any input. Leaky ReLU introduces a small slope for
negative input values, allowing a small, non-zero gradient when the input is less than zero. A
short code sketch appears after the advantages below.
Advantages of Leaky ReLU
1. Mitigates Dying ReLU Problem:
o Unlike ReLU, which outputs zero for all negative inputs, Leaky ReLU allows a small
gradient to pass through, ensuring that neurons can continue to learn even when
their pre-activation values are negative.
o Although less common than in vision tasks, Leaky ReLU can be used in NLP models,
especially in feedforward layers of sequence-to-sequence models and transformers,
to enhance gradient flow and mitigate the dying ReLU problem.
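A minimal NumPy sketch of Leaky ReLU and its gradient; the slope α = 0.01 is a common default but an assumption here:

```python
import numpy as np

ALPHA = 0.01  # slope for negative inputs (illustrative default)

def leaky_relu(x, alpha=ALPHA):
    # Returns x for positive inputs, alpha * x otherwise
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=ALPHA):
    # Gradient is 1 for positive inputs, alpha otherwise: never exactly zero,
    # so neurons keep receiving a learning signal (no "dying ReLU")
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(leaky_relu(x))       # [-0.02 -0.005 0. 0.5 2.]
print(leaky_relu_grad(x))  # [0.01 0.01 0.01 1. 1.]
```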
7. Softmax: Used in the output layer for multi-class classification problems. Outputs a probability
distribution for multiple categories.
Advantages:
Converts logits into probabilities.
Suitable for multi-class classification.
Disadvantages:
Not suitable for hidden layers due to computational complexity.
Limitations:
Only useful in the output layer.
Real-Time Applications:
Multi-class classification problems.
Neural networks for language models.
Utilization:
Typically used in the output layer of neural networks for classification tasks.
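A minimal, numerically stable softmax sketch in NumPy; the logits below are illustrative (subtracting the maximum is a standard stability trick that does not change the result):

```python
import numpy as np

def softmax(logits):
    # Subtracting the max avoids overflow in exp() for large logits
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # raw scores for 3 classes
probs = softmax(logits)
print(probs, probs.sum())            # a probability distribution summing to 1
```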
Choosing the right activation function: the best choice depends on the specific problem
you're trying to solve. Here are some general guidelines:
a. Classification (binary): Sigmoid or tanh
b. Classification (multi-class): Softmax
c. Regression: Linear or ReLU
Real-Time Application Examples
1. Image Recognition: ReLU and its variants (Leaky ReLU, PReLU) are widely used in CNNs
for tasks like object detection and facial recognition.
2. Speech Recognition: Tanh and ReLU are commonly used in RNNs and LSTMs for
processing sequential data like audio signals.
3. Natural Language Processing (NLP): Softmax is used in the output layer for language
models and text classification tasks.
o Significance: Demonstrated the potential for multi-layer networks and introduced
new training techniques.
5. Hopfield Network (1982)
Creator: John Hopfield.
Description: A recurrent neural network where each neuron is connected to every other
neuron. It is used to store and recall patterns.
Structure: Symmetric weight connections and a binary threshold activation function.
Significance: Showed how neural networks could be used for associative memory and
optimization problems.
6. Multilayer Perceptron (MLP) and Backpropagation (1986)
Pioneers: David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams.
Description: An MLP is a neural network with one or more hidden layers and non-linear
activation functions (e.g., sigmoid, tanh).
Training: Uses the backpropagation algorithm to adjust weights by minimizing the error
through gradient descent.
Significance: Enabled the training of deep neural networks and solved complex problems, such
as XOR, that single-layer perceptrons could not (see the example below).
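As a sketch of this point, a one-hidden-layer MLP can learn XOR, which no single-layer perceptron can represent; scikit-learn and the specific hyperparameters here are illustrative choices, not the historical implementation:

```python
from sklearn.neural_network import MLPClassifier

# XOR is not linearly separable, so a single-layer perceptron fails on it
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# One hidden layer with a tanh non-linearity is enough to separate XOR
mlp = MLPClassifier(hidden_layer_sizes=(4,), activation="tanh",
                    solver="lbfgs", random_state=0, max_iter=1000)
mlp.fit(X, y)
print(mlp.predict(X))  # expected: [0 1 1 0] (a different seed may be needed)
```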
Advantages and Disadvantages of Early Implementations
Advantages
1. Foundational Work: Established the basic principles and mechanisms of neural networks.
2. Demonstrated Learning: Showed that neural networks could learn from data and perform
various tasks.
3. Practical Applications: Early models were used in simple practical applications, such as
pattern recognition and control systems.
Disadvantages
1. Limited Complexity: Early models, especially single-layer networks, struggled with complex
tasks and non-linearly separable problems.
2. Training Challenges: Efficient training methods for deep networks were not available until
the development of backpropagation.
3. Computational Constraints: Early implementations were limited by the computational
power of the time, restricting their scalability and practicality.
There are several different architectures for ANNs, each with their own strengths and weaknesses.
Some of the most common architectures include:
1. Feedforward Neural Networks: This is the simplest type of ANN architecture, where the
information flows in one direction from input to output. The layers are fully connected,
meaning each neuron in a layer is connected to all the neurons in the next layer.
2. Recurrent Neural Networks (RNNs): These networks have a “memory” component, where
information can flow in cycles through the network. This allows the network to process
sequences of data, such as time series or speech.
3. Convolutional Neural Networks (CNNs): These networks are designed to process data
with a grid-like topology, such as images. The layers consist of convolutional layers, which
learn to detect specific features in the data, and pooling layers, which reduce the spatial
dimensions of the data.
4. Autoencoders: These are neural networks used for unsupervised learning. They
consist of an encoder that maps the input data to a lower-dimensional representation and a
decoder that maps the representation back to the original data (a code sketch follows this list).
5. Generative Adversarial Networks (GANs): These are neural networks that are used for
generative modelling. They consist of two parts: a generator that learns to generate new
data samples, and a discriminator that learns to distinguish between real and generated
data.
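As a sketch of architectures 1 and 4, here are minimal PyTorch definitions; the layer sizes are illustrative assumptions, not prescriptions from this module:

```python
import torch.nn as nn

# 1. Feedforward network: information flows input -> hidden -> output
feedforward = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),   # fully connected hidden layer
    nn.Linear(128, 10),               # output layer (e.g., 10 classes)
)

# 4. Autoencoder: the encoder compresses, the decoder reconstructs
autoencoder = nn.Sequential(
    nn.Linear(784, 32), nn.ReLU(),    # encoder: 784 -> 32 dimensions
    nn.Linear(32, 784), nn.Sigmoid(), # decoder: 32 -> 784 reconstruction
)
print(feedforward)
```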
Important Questions
2 Marks Questions
1. Define a neural network.
2. What is a feedforward neural network?
3. What are weights and bias in an artificial neuron?
4. What is the purpose of an activation function in an artificial neural network?
5. What is a single-layer perceptron?
6. List any four activation functions.
7. What is a feed-forward network?
8. What are the main components of a biological neuron?
9. What is the ReLU (Rectified Linear Unit) activation function?
10. What is the advantage of using ReLU over Sigmoid?
11. What is a multi-layer perceptron (MLP)?
5 Marks Questions
1. Explain the basic concept of artificial neural networks and their importance in machine learning.
2. What are activation functions? Explain in detail.
3. Define an Artificial Neural Network. Explain its working procedure with an example.
4. Define a neural network and explain its significance in the field of artificial intelligence.
5. Explain the concept of the input function and output function in an artificial neuron.
6. What are the main differences between biological and artificial neural networks?
7. Describe the structure of an artificial neuron, including its components such as input weights, bias, and
activation function.
8. Compare sigmoid, ReLU, and Tanh activation functions, highlighting their differences, advantages, and
disadvantages.
9. Compare and contrast biological neurons with artificial neurons in terms of structure and function.