MACHINE LEARNING COURSE CODE-A8703 MODULE-05

SYLLABUS: Basics of Neural Network: Introduction, Understanding the Biological Neuron, Exploring the Artificial Neuron, Types of Activation Functions, Early Implementations of ANN, Architectures of Neural Network.

Introduction to Neural Network:


 Neural networks extract identifying features from data without pre-programmed
understanding. Network components include neurons, connections, weights, biases,
propagation functions, and a learning rule. Neurons receive inputs governed by thresholds
and activation functions. Connections carry weights and biases that regulate information
transfer. Learning, the adjustment of weights and biases, occurs in three stages: input
computation, output generation, and iterative refinement that enhances the network's
proficiency in diverse tasks.
 Neural Networks are computational models that mimic the complex functions of the human
brain. The neural networks consist of interconnected nodes or neurons that process and
learn from data, enabling tasks such as pattern recognition and decision making in machine
learning. This module explores neural networks, their working, their architectures, and more.
Evolution of Neural Networks
Since the 1940s, there have been a number of noteworthy advancements in the field of neural
networks:
1. 1940s-1950s: Early Concepts
Neural networks began with the introduction of the first mathematical model of artificial
neurons by McCulloch and Pitts. But computational constraints made progress difficult.
2. 1960s-1970s: Perceptrons
This era is defined by Rosenblatt's work on perceptrons. Perceptrons are single-layer
networks whose applicability was limited to problems that are linearly separable.
3. 1980s: Backpropagation and Connectionism
Multi-layer network training was made possible by Rumelhart, Hinton, and Williams’
invention of the backpropagation method. With its emphasis on learning through
interconnected nodes, connectionism gained appeal.
4. 1990s: Boom and Winter
With applications in image identification, finance, and other fields, neural networks saw a
boom. However, neural network research then experienced a "winter" due to exorbitant
computational costs and inflated expectations.

5. 2000s: Resurgence and Deep Learning
Larger datasets, innovative structures, and enhanced processing capability spurred a
comeback. Deep learning has shown amazing effectiveness in a number of disciplines by
utilizing numerous layers.
6. 2010s-Present: Deep Learning Dominance
Convolutional neural networks (CNNs) and recurrent neural networks (RNNs), two deep
learning architectures, dominated machine learning. Their power was demonstrated by
innovations in gaming, picture recognition, and natural language processing.
How do neural networks work?
Let’s understand with an example of how a neural network works:
1. Consider a neural network for email classification. The input layer takes features like email
content, sender information, and subject.
2. These inputs, multiplied by adjusted weights, pass through hidden layers. The network,
through training, learns to recognize patterns indicating whether an email is spam or not.
The output layer, with a binary activation function, predicts whether the email is spam (1)
or not (0). As the network iteratively refines its weights through backpropagation, it
becomes adept at distinguishing between spam and legitimate emails, showcasing the
practicality of neural networks in real-world applications like email filtering.
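To make this concrete, here is a minimal sketch of such an email classifier using scikit-learn's MLPClassifier. The features (a spam-word count, a sender-reputation score, and a subject length) and the tiny dataset are invented purely for illustration, not taken from this module:

import numpy as np
from sklearn.neural_network import MLPClassifier

# Toy features per email: [spam-word count, sender reputation, subject length]
X = np.array([[8, 0.1, 60], [1, 0.9, 20], [7, 0.2, 55], [0, 0.8, 15]])
y = np.array([1, 0, 1, 0])  # 1 = spam, 0 = not spam

# A small network with one hidden layer; hyperparameters are arbitrary choices
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(X, y)

print(clf.predict([[6, 0.3, 50]]))  # expected to classify as spam (1)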
Working of a Neural Network
1. Neural networks are complex systems that mimic some features of the functioning of the
human brain.
2. A neural network is composed of an input layer, one or more hidden layers, and an output
layer, each made up of interconnected artificial neurons.
3. The two stages of the basic process are forward propagation and backpropagation.

Figure 1: Neural Network architecture with various components
Forward Propagation
1. Input Layer: Each feature in the input layer is represented by a node on the network, which
receives input data.
2. Weights and Connections: The weight of each neuronal connection indicates how strong
the connection is. Throughout training, these weights are changed.
3. Hidden Layers: Each hidden layer neuron processes inputs by multiplying them by weights,
adding them up, and then passing them through an activation function. By doing this, non-
linearity is introduced, enabling the network to recognize intricate patterns.
4. Output: The final result is produced by repeating the process until the output layer is
reached. A minimal code sketch of this forward pass follows.
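The following NumPy sketch illustrates the forward pass described above. It is an illustration only; the input values, layer sizes, random weights, and the choice of ReLU and sigmoid are assumptions made for demonstration:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])      # input layer: one node per feature
W1 = np.random.randn(4, 3) * 0.1    # weights into a 4-neuron hidden layer
b1 = np.zeros(4)
W2 = np.random.randn(1, 4) * 0.1    # weights into the output layer
b2 = np.zeros(1)

h = np.maximum(0.0, W1 @ x + b1)    # hidden layer: weighted sum + ReLU
y_hat = sigmoid(W2 @ h + b2)        # output layer: weighted sum + sigmoid
print(y_hat)                        # final output in (0, 1)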

Backpropagation
1. Loss Calculation: The network’s output is evaluated against the real goal values, and a loss
function is used to compute the difference. For a regression problem, the Mean Squared
Error (MSE) is commonly used as the cost function.
Loss Function (MSE): $\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$, where $y_i$ is the true value and $\hat{y}_i$ the network's prediction.
2. Gradient Descent: The network then uses gradient descent to reduce the loss. To lower the
error, each weight is adjusted based on the derivative of the loss with respect to that
weight.
3. Adjusting weights: The weights are adjusted at each connection by applying this iterative
process, or backpropagation, backward across the network.
4. Training: During training with different data samples, the entire process of forward
propagation, loss calculation, and backpropagation is done iteratively, enabling the network
to adapt and learn patterns from the data.
5. Activation Functions: Activation functions like the rectified linear unit (ReLU) or
sigmoid introduce non-linearity into the model. Whether a neuron "fires" is decided from its
total weighted input. A minimal sketch of this training loop appears below.
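As a minimal, hedged sketch of this loop (forward pass, MSE loss, chain-rule gradients, gradient-descent update), the following trains a single sigmoid neuron on a toy AND problem; a real multi-layer network applies the same idea layer by layer, and the data, learning rate, and epoch count here are illustrative assumptions:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)         # logical AND targets

w, b, lr = np.zeros(2), 0.0, 0.5                # weights, bias, learning rate

for epoch in range(2000):
    y_hat = sigmoid(X @ w + b)                  # forward propagation
    loss = np.mean((y_hat - y) ** 2)            # MSE loss calculation
    dz = 2 * (y_hat - y) * y_hat * (1 - y_hat)  # chain rule to pre-activation
    w -= lr * (X.T @ dz) / len(y)               # gradient-descent weight update
    b -= lr * dz.mean()                         # bias update

print(np.round(sigmoid(X @ w + b)))             # approaches [0, 0, 0, 1]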


Types of Neural Networks


Several types of neural networks are in common use; five important ones are described below.
1. Feedforward Networks: A feedforward neural network is a simple artificial neural
network architecture in which data moves from input to output in a single direction. It has
input, hidden, and output layers; feedback loops are absent. Its straightforward architecture
makes it appropriate for a number of applications, such as regression and pattern
recognition.
2. Multilayer Perceptron (MLP): MLP is a type of feedforward neural network with three or
more layers, including an input layer, one or more hidden layers, and an output layer. It uses
nonlinear activation functions.
3. Convolutional Neural Network (CNN): A Convolutional Neural Network (CNN) is a
specialized artificial neural network designed for image processing. It employs
convolutional layers to automatically learn hierarchical features from input images,
enabling effective image recognition and classification. CNNs have revolutionized computer
vision and are pivotal in tasks like object detection and image analysis.
4. Recurrent Neural Network (RNN): An artificial neural network type intended for
sequential data processing is called a Recurrent Neural Network (RNN). It is appropriate for
applications where contextual dependencies are critical, such as time series prediction and
natural language processing, since it makes use of feedback loops, which enable information
to survive within the network.
5. Long Short-Term Memory (LSTM): LSTM is a type of RNN that is designed to overcome
the vanishing gradient problem in training RNNs. It uses memory cells and gates to
selectively read, write, and erase information.
Advantages of Artificial Neural Network
The advantages of neural networks are as follows −
 A neural network can implement tasks that a linear program cannot.
 Because of its parallel nature, a neural network can keep functioning even when some of its
elements fail.
 A neural network learns from data and does not need to be reprogrammed.
 It can be applied in a wide range of applications.
Disadvantages of Artificial Neural Network
The disadvantages of neural networks are as follows −
 A neural network requires training before it can operate.

 The structure of a neural network is disparate from the structure of microprocessors and
therefore needs to be emulated on conventional hardware.
 Big neural networks need long processing times.
Applications of Artificial Neural Network
1. Text Classification and Categorization
Text classification is a vital part of several applications such as web search, information filtering,
language identification, readability assessment, and sentiment analysis. Artificial neural networks
are widely used for these tasks.
2. Named Entity Recognition (NER)
Named entity recognition focuses on categorizing named entities into predefined classes such as
persons, organizations, locations, dates, and times. The most effective and powerful named entity
recognition systems make use of artificial neural networks.
3. Part-of-Speech Tagging
Part-of-speech tagging is used for parsing, text-to-speech conversion, information extraction and
many other applications. The process is about tagging words as adjectives, verbs, nouns, adverbs,
pronouns, etc.
4. Machine Translation
Machine translation is widely used around the world; however, it still has certain limitations, and
there are certain domains in which the quality of the translations is rather substandard. To improve
the quality of machine translations, researchers are attempting to use neural networks.
5. Semantic Parsing and Question Answering
Such systems automate the answering of various types of questions (this includes definition
questions, biographical questions, multilingual questions, and many other kinds of questions) that
are asked to the system in natural language.
Using artificial neural networks, it is possible to create high-performance question answering
systems.
6. Paraphrase Detection
This essentially involves figuring out whether two sentences mean the same thing. This is particularly
important in question answering systems because there are several ways in which your users could
ask the very same question.
7. Speech Recognition
Artificial neural networks are used rather extensively in speech recognition. It involves making use
of natural language processing to convert voice data into a machine-readable format.
8. Language Generation & Multi-document Summarization
Natural language generation (NLG) can be used for various reports. Some of them include writing
reports, generating texts based on the data that the system analyzed, drafting summaries of
electronic medical records and generating textual weather forecasts based on weather data.
9. Character Recognition
Character recognition is applied to receipts, invoices, cheques, legal documents, etc. By using artificial
neural networks, character recognition can even be performed on hand-written characters with an
accuracy of around 85%.
10. Spell Checking
This is widely used in text editors to inform users if their text contains spelling errors. Several spell-
checking tools now make use of artificial neural networks.
Neural Network Limitations
1. Data Requirements
Large Data Needs: ANNs require vast amounts of labelled data to train effectively, which can
be difficult and expensive to obtain.
Data Quality: The quality of the data is crucial. Poor-quality data, including noisy, incomplete,
or biased datasets, can lead to inaccurate and unreliable models.
2. Computational Complexity
High Resource Consumption: Training large ANNs, especially deep networks, requires
significant computational power and time. This often necessitates access to specialized
hardware like GPUs or TPUs.
Training Time: The process of training can be very time-consuming, particularly for deep
networks with large datasets.
3. Overfitting
Susceptibility to Overfitting: ANNs can easily overfit to the training data, especially when
the model is complex and the dataset is not sufficiently large or diverse.
Complex Regularization Needs: Preventing overfitting requires careful application of
regularization techniques and validation strategies, which can be complex and time-
consuming.
4. Interpretability and Transparency
Black Box Nature: ANNs are often considered "black boxes" due to their complex internal
workings, making it difficult to understand how they make decisions.
Lack of Explainability: Despite advancements in explainable AI, fully interpreting the
decision-making process of ANNs remains challenging.

5. Hyperparameter Sensitivity
Difficult Hyperparameter Tuning: ANNs have many hyperparameters that need careful
tuning (e.g., learning rate, number of layers, neurons per layer). Finding the optimal
configuration often requires extensive experimentation.
Initialization Issues: The initial weights of an ANN can significantly affect the training
process and final performance, making proper initialization crucial.
6. Scalability Issues
Memory and Computational Limits: Scaling ANNs to handle very large datasets or model
sizes can be challenging due to memory and computational constraints.
Deployment Complexity: Large models can be difficult to deploy efficiently in real-time
applications, requiring significant resources and optimization.
7. Expertise Required
Need for Specialized Knowledge: Designing, training, and tuning ANNs often requires deep
expertise in machine learning and neural network architecture, which can limit accessibility
for non-experts.
8. Adversarial Vulnerability
Susceptibility to Adversarial Attacks: ANNs can be vulnerable to adversarial attacks, where
small, carefully crafted changes to input data can cause the network to make significant errors.
9. Hardware Dependence
Specialized Hardware Needs: Effective training and inference of ANNs often require access
to specialized hardware, such as GPUs or TPUs, which may not be available to all users.
10. Ethical and Bias Concerns
Propagation of Biases: ANNs can learn and amplify biases present in the training data,
leading to unfair or discriminatory outcomes.
Ethical Considerations: The use of ANNs in sensitive applications raises ethical questions
about privacy, fairness, and accountability.
Understanding the Biological Neuron
Biological neurons are the fundamental units of the brain and nervous system. They are specialized
cells that transmit information through electrical and chemical signals. Understanding the structure
and function of biological neurons helps to appreciate how artificial neural networks (ANNs) are
inspired by biological processes.


Structure of a Biological Neuron


1. Cell Body (Soma):
o The cell body contains the nucleus and is responsible for maintaining the cell's health.
It integrates incoming signals and generates outgoing signals to the axon.
2. Dendrites:
o Dendrites are tree-like structures that receive messages from other neurons. They act
as the input regions of the neuron, collecting electrical signals from the synapses.
3. Axon:
o The axon is a long, thin fiber that transmits electrical impulses away from the cell body
to other neurons, muscles, or glands. It can vary greatly in length, extending from a few
millimeters to over a meter.
4. Myelin Sheath:
o Many axons are covered with a myelin sheath, a fatty layer that insulates the axon and
speeds up the transmission of electrical signals.
5. Nodes of Ranvier:
o These are gaps in the myelin sheath along the axon where the action potential is
regenerated, allowing for faster signal transmission.
6. Axon Terminals (Synaptic Boutons):
o The axon terminals are the endpoints of the axon that release neurotransmitters into
the synapse (the gap between neurons) to transmit signals to other neurons.
Function of a Biological Neuron
1. Signal Reception:
o Neurons receive signals from other neurons through the dendrites. These signals are
typically in the form of neurotransmitters, which bind to receptors on the dendritic
membrane.
2. Signal Integration:
o The cell body integrates incoming signals. If the combined signal strength reaches a
certain threshold, the neuron generates an action potential.
3. Signal Transmission:
o The action potential travels down the axon, jumping from one Node of Ranvier to the
next, in a process called saltatory conduction, which speeds up signal transmission.
4. Signal Propagation:

o When the action potential reaches the axon terminals, it triggers the release of
neurotransmitters into the synapse. These chemicals cross the synaptic gap and bind
to receptors on the dendrites of the adjacent neuron, continuing the transmission of
the signal.
Comparison with Artificial Neural Networks
Artificial Neural Networks (ANNs) are computational models inspired by the structure and function
of biological neurons. Here’s how they compare:
1. Neurons and Perceptrons:
o In ANNs, artificial neurons (or perceptrons) are simplified models that take multiple
inputs, apply weights, sum them, and pass the result through an activation function to
produce an output, analogous to the signal processing in biological neurons.
2. Synapses and Weights:
o The connections between artificial neurons have weights that are adjusted during
training, similar to how synapses strengthen or weaken in biological neurons based on
activity.
3. Layers and Networks:
o ANNs are composed of layers of neurons: input layer (receives data), hidden layers
(processes data), and output layer (produces results), mimicking the complex
networks of interconnected neurons in the brain.
4. Learning and Flexibility:
o The process of training an ANN, involving adjustments to weights based on error
minimization (e.g., backpropagation), is analogous to synaptic plasticity, where the
strength of synapses changes based on experience and learning.


Exploring the Artificial Neuron:


Artificial neurons, also known as perceptrons, are the fundamental units of artificial neural networks
(ANNs). They are inspired by the structure and function of biological neurons but are much simpler.
Understanding the artificial neuron helps to grasp how ANNs function as a whole.
Components of an Artificial Neuron:
1. Inputs (X): These are the features or signals fed into the neuron. In a dataset, these could
represent various attributes of the data points.
2. Weights (W): These are learnable parameters that the neuron adjusts during the training
process. The goal is to find the optimal weights that minimize the error in predictions.
3. Bias (b): This term allows the model to fit the data better by shifting the activation function.
It helps in improving the model's flexibility.
4. Weighted Sum (Z): This is the intermediate value calculated by combining the inputs and
weights. It serves as the input to the activation function.
5. Activation Function (f): This function determines the output of the neuron based on the
weighted sum. It adds non-linearity to the model, enabling it to learn complex patterns.
6. Output (Y): This is the final value produced by the neuron, which can be an input to another
neuron in the next layer or the final prediction in a single-layer network.
7. Input Collection: The neuron collects inputs from either the initial data or the outputs of
neurons from previous layers.
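Putting these components together, a single artificial neuron can be sketched in a few lines of NumPy. The toy inputs, weights, bias, and the choice of ReLU here are assumptions made for illustration only:

import numpy as np

def artificial_neuron(x, w, b, f):
    z = np.dot(w, x) + b                   # weighted sum Z = W·X + b
    return f(z)                            # output Y = f(Z)

x = np.array([0.2, 0.7, -1.0])             # inputs X
w = np.array([0.5, -0.3, 0.8])             # weights W
b = 0.1                                    # bias b
relu = lambda z: np.maximum(0.0, z)        # activation function f
print(artificial_neuron(x, w, b, relu))    # output Y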
Role of Artificial Neurons in Neural Networks
1. Input Layer:
o The input layer consists of neurons that receive the initial data inputs.
2. Hidden Layers:
o Hidden layers contain neurons that process inputs from the previous layer. The
multiple layers allow the network to learn hierarchical representations of the data.
3. Output Layer:
o The output layer produces the final prediction or classification. The number of neurons
in the output layer corresponds to the number of classes or regression targets.


Types of Activation functions


Activation functions
Activation functions are a crucial part of artificial neurons in neural networks. They play a key role
in introducing non-linearity, which allows the network to learn and model complex patterns that
wouldn't be possible with linear relationships alone.
Why are they important?
1. Non-linearity: Linear models can only learn basic relationships between inputs and outputs.
Activation functions introduce non-linearity, allowing the network to model more complex
relationships like curves, thresholds, and decision boundaries.
2. Learning complex patterns: With non-linearity, neural networks can learn intricate patterns
in data that wouldn't be possible with just linear activation.
3. Gradient descent: Activation functions with smooth gradients are essential for the
backpropagation algorithm used in training neural networks. Backpropagation relies on
calculating gradients to adjust the weights in the network and optimize its performance.

Different Types of Activation Functions


1. Sigmoid: Outputs a value between 0 and 1, often used for binary classification problems.
(Think of it squashing the input between 0 and 1)
Advantages:
 Smooth gradient, preventing abrupt changes in predictions.
 Output range (0, 1) makes it suitable for probabilistic interpretations.
Disadvantages:
 Vanishing gradient problem for very high/low values of x, making training deep
networks difficult.
 Outputs are not zero-centered, causing gradients to be consistently positive or negative.
Limitations:
 Not suitable for layers after the first hidden layer due to gradient saturation.
Real-Time Applications:
 Binary classification problems.
 Logistic regression.
Utilization: Sigmoid functions are typically used in the output layer of binary classification neural
networks.


2. Tanh (Hyperbolic Tangent): Similar to sigmoid but outputs range from -1 to 1, often used in
hidden layers.
Advantages:
 Zero-centered outputs, which make gradient updates better behaved than with sigmoid.
 Stronger gradients than sigmoid around zero, which can speed up convergence.
Disadvantages:
 Still suffers from the vanishing gradient problem for large positive or negative inputs.
Limitations:
 More computationally expensive than ReLU.
Real-Time Applications:
 Hidden layers of feedforward networks.
 Recurrent neural networks (RNNs) and LSTMs for sequential data.
Utilization:
 Used in hidden layers when zero-centered activations are desired.

3. ReLU (Rectified Linear Unit): Simplest and most popular, outputs the input directly if it's
positive, otherwise outputs 0.
Advantages:
 Computationally efficient, simple implementation.
 Reduces the likelihood of the vanishing gradient problem for positive inputs.
 Speeds up the convergence of stochastic gradient descent.
Disadvantages:
 Dying ReLU problem: neurons can become inactive and only output zero.
Limitations:
 Sensitive to high learning rates, which can cause neurons to die.
 Outputs are not zero-centered.
Real-Time Applications:
 Image classification and object detection.
 Convolutional neural networks (CNNs).
Utilization:
 Widely used in hidden layers of deep neural networks due to efficiency and effectiveness.

4. Parametric ReLU (PReLU)


Advantages:
 Learns the parameters for negative values, potentially improving performance.
Disadvantages:
 Increases the number of parameters to be learned.


Limitations:
 More computationally intensive due to additional parameters.
Real-Time Applications:
 Image and video recognition tasks.
 Deep CNNs.
Utilization:
 Applied in deeper networks where learning parameters dynamically can be beneficial.

5. Exponential Linear Unit (ELU)


Advantages:
 Helps to make the mean of activations closer to zero, reducing bias shifts.
 Improves learning characteristics.
Disadvantages:
 More computationally expensive compared to ReLU.
Limitations:
 Needs careful initialization of α.
Real-Time Applications:
 Image processing.
 Advanced neural network architectures.
Utilization:
 Useful in layers where maintaining zero-centered activations is important.

6. Leaky ReLU: A variant of ReLU designed to address the "dying ReLU" problem, where
neurons become inactive and only output zero for any input. Leaky ReLU introduces a small
slope for negative input values, allowing a small, non-zero gradient when the input is less
than zero.
Advantages of Leaky ReLU
1. Mitigates Dying ReLU Problem:
o Unlike ReLU, which outputs zero for all negative inputs, Leaky ReLU allows a small
gradient to pass through, ensuring that neurons can continue to learn even when
their weights produce negative values.


o Computationally Efficient: Similar to ReLU, Leaky ReLU is computationally simple
and efficient to compute. It involves simple thresholding, which is very fast to
compute during training and inference.
2. Improved Gradient Flow:
o By allowing a small gradient for negative inputs, Leaky ReLU improves gradient flow
through the network, which can result in faster convergence and better
performance during training.
Disadvantages of Leaky ReLU
1. Fixed Negative Slope:
o The fixed negative slope α might not be optimal for all tasks. While it allows
for gradient flow, the fixed value could limit performance in some scenarios.
2. Not Zero-Centered:
o Similar to ReLU, the outputs of Leaky ReLU are not zero-centered, which can lead to
inefficient updates of the model parameters during training.
Limitations of Leaky ReLU
1. Choice of α:
o The choice of the parameter α is crucial. If α is too large, it might result in high
negative activations, which can slow down training. Conversely, if α is too small,
it might not effectively address the dying ReLU problem.
2. Not Universally Optimal:
o Leaky ReLU may not be the best choice for all types of neural networks or tasks.
Depending on the specific problem and network architecture, other activation
functions might perform better.
Real-Time Applications of Leaky ReLU
1. Computer Vision:
o Leaky ReLU is extensively used in convolutional neural networks (CNNs) for image
classification, object detection, and segmentation tasks. It helps in maintaining a
healthy gradient flow, thereby improving training efficiency and performance.
2. Speech Recognition:
o In recurrent neural networks (RNNs) and their variants like LSTMs and GRUs used
for speech and audio processing, Leaky ReLU can help in addressing vanishing
gradient issues, leading to better learning of temporal dependencies.
3. Natural Language Processing (NLP):


o Although less common than in vision tasks, Leaky ReLU can be used in NLP models,
especially in feedforward layers of sequence-to-sequence models and transformers,
to enhance gradient flow and mitigate the dying ReLU problem.

Detailed Utilization of Leaky ReLU


 Convolutional Neural Networks (CNNs):
o Layers: Leaky ReLU is typically used in the hidden layers of CNNs.
o Benefits: Helps in maintaining active neurons during training, which is critical for
learning rich feature representations in tasks like image recognition.
 Recurrent Neural Networks (RNNs):
o Layers: Used in hidden layers of RNNs, LSTMs, and GRUs.
o Benefits: Mitigates vanishing gradient issues, thus preserving long-term
dependencies in sequences, crucial for tasks such as language modeling and
machine translation.
 Fully Connected Neural Networks:
o Layers: Applied in hidden layers of deep feedforward networks.
o Benefits: Ensures that neurons do not become inactive, thus maintaining a steady
gradient flow and contributing to efficient learning.
 GANs (Generative Adversarial Networks):
o Discriminator Network: Leaky ReLU is often used in the discriminator network of
GANs to ensure that gradients flow smoothly and the model trains effectively
without dead neurons.
o Benefits: Helps in stabilizing the training process and improves the generation
quality of adversarial examples.

7. Softmax: Used in the output layer for multi-class classification problems. Outputs a probability
distribution for multiple categories.
Advantages:
 Converts logits into probabilities.
 Suitable for multi-class classification.
Disadvantages:
 Not suitable for hidden layers due to computational complexity.
Limitations:
 Only useful in the output layer.


Real-Time Applications:
 Multi-class classification problems.
 Neural networks for language models.

Utilization:
 Typically used in the output layer of neural networks for classification tasks.
Choosing the right activation function: the best activation function for your neural network
depends on the specific problem you're trying to solve. Here are some general guidelines:
a. Classification (binary): Sigmoid or tanh
b. Classification (multi-class): Softmax
c. Regression: Linear or ReLU
Real-Time Application Examples
1. Image Recognition: ReLU and its variants (Leaky ReLU, PReLU) are widely used in CNNs
for tasks like object detection and facial recognition.
2. Speech Recognition: Tanh and ReLU are commonly used in RNNs and LSTMs for
processing sequential data like audio signals.
3. Natural Language Processing (NLP): Softmax is used in the output layer for language
models and text classification tasks.

EQUATIONS FOR ACTIVATION FUNCTIONS

Sigmoid: $\sigma(x) = \frac{1}{1 + e^{-x}}$
Tanh: $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
ReLU: $f(x) = \max(0, x)$
Leaky ReLU: $f(x) = x$ if $x > 0$, else $\alpha x$ (small fixed $\alpha$, e.g. 0.01)
PReLU: same form as Leaky ReLU, but with $\alpha$ learned during training
ELU: $f(x) = x$ if $x > 0$, else $\alpha(e^{x} - 1)$
Softmax: $\mathrm{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$
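For reference, these functions can be written directly in NumPy. This is a minimal sketch; the α defaults shown are common conventions, not prescriptions:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)       # small slope for negative inputs

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

def softmax(x):
    e = np.exp(x - np.max(x))                  # shift by max for numerical stability
    return e / e.sum()

x = np.array([-2.0, 0.0, 3.0])
print(sigmoid(x), np.tanh(x), relu(x), softmax(x))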


Early Implementations of ANN


Artificial Neural Networks (ANNs) have a rich history that traces back to foundational theoretical
work and early practical implementations. These early implementations laid the groundwork for the
advanced neural networks we use today. Here are some significant early milestones:
1. McCulloch-Pitts Neuron (1943)
 Description: The McCulloch-Pitts neuron is a simplified model of a biological neuron.
 Structure: It sums its input signals and outputs a binary value (0 or 1) based on whether the
sum exceeds a certain threshold.
 Significance: Demonstrated that neural networks could, in theory, compute any computable
function by combining neurons in specific ways.
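A McCulloch-Pitts neuron is simple enough to sketch in a couple of lines. With binary inputs and a threshold of 2 (our illustrative choice), it implements logical AND:

def mcp_neuron(inputs, threshold):
    # Sum the binary inputs and fire (output 1) only if the threshold is met
    return 1 if sum(inputs) >= threshold else 0

print(mcp_neuron([1, 1], threshold=2))  # 1: both inputs active, neuron fires
print(mcp_neuron([1, 0], threshold=2))  # 0: threshold not reached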
2. Hebbian Learning (1949)
 Description: A learning rule based on the idea that the connection between two neurons is
strengthened when both neurons are activated simultaneously.
 Significance: Provided the first learning mechanism for neural networks, emphasizing the
role of synaptic plasticity in learning.
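A minimal sketch of the Hebbian update, assuming the common formulation Δw = η·x·y with an illustrative learning rate η:

import numpy as np

eta = 0.1                          # learning rate (assumed value)
w = np.zeros(3)                    # connection weights
x = np.array([1.0, 0.0, 1.0])      # presynaptic activity
y = 1.0                            # postsynaptic activity

w += eta * x * y                   # strengthen only co-active connections
print(w)                           # [0.1, 0.0, 0.1]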
3. Perceptron (1958)
 Description: The Perceptron is a type of single-layer neural network that can classify input
into one of two categories.
 Structure: Consists of input units, weights, a summation processor, and an activation function
(usually a step function).
 Training: Uses a simple learning algorithm that adjusts the weights based on the classification
error.
 Significance: One of the first practical implementations of a neural network, highlighting both
the potential and limitations of neural network models.
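A minimal sketch of the perceptron learning rule on a toy, linearly separable problem (logical OR); the learning rate and number of passes are illustrative assumptions:

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1])                    # logical OR targets
w, b, lr = np.zeros(2), 0.0, 0.1

for _ in range(20):                           # a few passes over the data
    for xi, target in zip(X, y):
        pred = 1 if xi @ w + b >= 0 else 0    # step activation function
        w += lr * (target - pred) * xi        # adjust weights by the error
        b += lr * (target - pred)

print([1 if xi @ w + b >= 0 else 0 for xi in X])  # [0, 1, 1, 1]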
4. Adaline and Madaline (1960)
 Adaline (Adaptive Linear Neuron):
o Description: A single-layer neural network that uses linear activation functions.
o Training: Uses the least mean squares (LMS) algorithm to minimize the error between
the predicted and actual outputs.
 Madaline (Multiple Adaline):
o Description: An extension of Adaline, consisting of multiple Adaline units organized
in layers.

o Significance: Demonstrated the potential for multi-layer networks and introduced
new training techniques.
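A minimal sketch of Adaline's LMS rule: unlike the perceptron rule, the error is measured on the linear output before any thresholding. The ±1 targets, learning rate, and epoch count are illustrative choices:

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1], dtype=float)    # logical AND with +/-1 targets
w, b, lr = np.zeros(2), 0.0, 0.1

for _ in range(100):
    for xi, target in zip(X, y):
        out = xi @ w + b                      # linear activation (no step)
        err = target - out
        w += lr * err * xi                    # least-mean-squares update
        b += lr * err

print(np.sign(X @ w + b))                     # approx [-1, -1, -1, 1]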
5. Hopfield Network (1982)
 Creator: John Hopfield.
 Description: A recurrent neural network where each neuron is connected to every other
neuron. It is used to store and recall patterns.
 Structure: Symmetric weight connections and a binary threshold activation function.
 Significance: Showed how neural networks could be used for associative memory and
optimization problems.
6. Multilayer Perceptron (MLP) and Backpropagation (1986)
 Pioneers: David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams.
 Description: An MLP is a neural network with one or more hidden layers and non-linear
activation functions (e.g., sigmoid, tanh).
 Training: Uses the backpropagation algorithm to adjust weights by minimizing the error
through gradient descent.
 Significance: Enabled the training of deep neural networks and solved complex problems like
XOR that single-layer perceptrons could not.
Advantages and Disadvantages of Early Implementations
Advantages
1. Foundational Work: Established the basic principles and mechanisms of neural networks.
2. Demonstrated Learning: Showed that neural networks could learn from data and perform
various tasks.
3. Practical Applications: Early models were used in simple practical applications, such as
pattern recognition and control systems.
Disadvantages
1. Limited Complexity: Early models, especially single-layer networks, struggled with complex
tasks and non-linear separable problems.
2. Training Challenges: Efficient training methods for deep networks were not available until
the development of backpropagation.
3. Computational Constraints: Early implementations were limited by the computational
power of the time, restricting their scalability and practicality.


Real-Time Applications of Early ANN Models


1. Pattern Recognition:
o Perceptron: Used in early optical character recognition systems.
2. Signal Processing:
o Adaline: Applied in adaptive filtering, such as noise cancellation and echo suppression
in telecommunications.
3. Robotics:
o Madaline: Implemented in early robotic control systems to learn and adapt to
changing environments.
Utilization in Modern Contexts
1. Educational Tools:
o Early models are used to teach fundamental concepts of neural networks and machine
learning in academic settings.
2. Historical Research:
o Provides insights into the evolution of neural networks and the challenges overcome
to reach modern deep learning architectures.
3. Foundation for Advanced Models:
o Techniques and principles from early implementations continue to influence the
development of advanced neural network models and training algorithms.


Architectures of Neural Network:

There are several different architectures for ANNs, each with their own strengths and weaknesses.
Some of the most common architectures include:
1. Feedforward Neural Networks: This is the simplest type of ANN architecture, where the
information flows in one direction from input to output. The layers are fully connected,
meaning each neuron in a layer is connected to all the neurons in the next layer.
2. Recurrent Neural Networks (RNNs): These networks have a “memory” component, where
information can flow in cycles through the network. This allows the network to process
sequences of data, such as time series or speech.
3. Convolutional Neural Networks (CNNs): These networks are designed to process data
with a grid-like topology, such as images. The layers consist of convolutional layers, which
learn to detect specific features in the data, and pooling layers, which reduce the spatial
dimensions of the data.
4. Autoencoders: These are neural networks that are used for unsupervised learning. They
consist of an encoder that maps the input data to a lower-dimensional representation and a
decoder that maps the representation back to the original data.
5. Generative Adversarial Networks (GANs): These are neural networks that are used for
generative modelling. They consist of two parts: a generator that learns to generate new
data samples, and a discriminator that learns to distinguish between real and generated
data.


What are neural networks used for?


Although we have been studying and implementing neural networks since at least the 1940s,
advancements in deep learning have guided us to work with the algorithms in new and advanced ways.
Today, researchers and scientists can use neural networks for real-world applications in various fields,
including the automotive industry, finance, national defense, insurance, health care, and utilities.
1. Automotive: Self-driving cars use neural networks to make decisions based on the data they
receive from their surroundings. Neural networks can also optimize vehicle parts and functions
or estimate how many vehicles you need to make to meet demand.
2. Finance: Neural networks have many uses in the finance industry, from predicting the
performance of the stock market or exchange rates between monetary denominations to
determining credit scores and default risks.
3. National defense: The Department of Defense uses neural networks to simulate situational
training, such as combat readiness. Other neural network applications in national defense
include the development of unmanned aircraft.
4. Insurance: Insurance providers can use neural networks to model how often customers file
insurance claims and the size of those claims.
5. Health care: In a health care setting, doctors, health care administrators, and researchers use
neural networks to make informed decisions about patient care, organizational decisions, and
developing new medications.
6. Utilities: Utility companies can use neural networks to forecast energy demand. Other uses
include stabilizing electrical voltage or modelling oil recovery from residential areas.
Interconnections in Network:
Interconnection can be defined as the way processing elements (Neuron) in ANN are connected to
each other. Hence, the arrangements of these processing elements and geometry of
interconnections are very essential in ANN.
There exist five basic types of neuron connection architecture:
1. Single-layer feed-forward network
2. Multilayer feed-forward network
3. Single node with its own feedback
4. Single-layer recurrent network
5. Multilayer recurrent network


1. Single-layer feed-forward network:
In this type of network, there are only two layers, the input layer and the output layer, but the
input layer does not count because no computation is performed in it. The output layer is formed
when different weights are applied to the input nodes and the cumulative effect per node is taken.
After this, the neurons collectively produce the output signals.

2. Multilayer Feed-Forward Network:
This network also has a hidden layer that is internal to the network and has no direct contact with
the external layer. The existence of one or more hidden layers makes the network computationally
stronger. It is a feed-forward network because information flows through the input function and
the intermediate computations used to determine the output Z; there are no feedback connections
in which outputs of the model are fed back into itself.


3. Single Node with Its Own Feedback:
When outputs can be directed back as inputs to the same layer or preceding-layer nodes, the result
is a feedback network. Recurrent networks are feedback networks with closed loops. The figure
shows a single recurrent network having a single neuron with feedback to itself.

4. Single-Layer Recurrent Network:
This is a single-layer network with a feedback connection in which the processing element's output
can be directed back to itself, to another processing element, or to both. A recurrent neural network
is a class of artificial neural networks where connections between nodes form a directed graph
along a sequence. This allows it to exhibit dynamic temporal behaviour for a time sequence. Unlike
feedforward neural networks, RNNs can use their internal state (memory) to process sequences of
inputs, as the sketch below illustrates.
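Here is a minimal sketch of that feedback idea: one recurrent step reuses the previous hidden state at every time step. The shapes, random weights, and the toy input sequence are arbitrary illustrative choices:

import numpy as np

rng = np.random.default_rng(0)
W_x = rng.normal(scale=0.1, size=(4, 3))   # input-to-hidden weights
W_h = rng.normal(scale=0.1, size=(4, 4))   # hidden-to-hidden (feedback) weights
b = np.zeros(4)

h = np.zeros(4)                            # internal state (memory)
for x_t in rng.normal(size=(5, 3)):        # a toy sequence of 5 input vectors
    h = np.tanh(W_x @ x_t + W_h @ h + b)   # new state depends on the old state

print(h)                                   # final hidden state after the sequence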
5. Multilayer Recurrent Network:
In this type of network, the processing element's output can be directed to processing elements in
the same layer and in the preceding layer, forming a multilayer recurrent network. Such networks
perform the same task for every element of a sequence, with the output depending on the previous
computations. Inputs are not needed at each time step. The main feature of a recurrent neural
network is its hidden state, which captures some information about a sequence.


Important Questions
2 Marks Questions
1. Define a neural network
2. What is a feedforward neural network?
3. What are weights and bias in an artificial neuron?
4. What is the purpose of an activation function in an artificial neural network?
5. What is single layer perceptron?
6. List any four activation functions
7. What is a feed-forward network?
8. What are the main components of a biological neuron?
9. What is the ReLU (Rectified Linear Unit) activation function?
10. What is the advantage of using ReLU over Sigmoid?
11. What is a multi-layer perceptron (MLP)?

5 Marks Questions
1. Explain the basic concept of artificial neural networks and their importance in machine learning.
2. What are activation functions? Explain in detail.
3. Define an Artificial Neural Network. Explain its working procedure with an example.
4. Define a neural network and explain its significance in the field of artificial intelligence.
5. Explain the concept of the input function and output function in an artificial neuron.
6. What are the main differences between biological and artificial neural networks?
7. Describe the structure of an artificial neuron, including its components such as input weights, bias, and
activation function.
8. Compare sigmoid, ReLU, and Tanh activation functions, highlighting their differences, advantages, and
disadvantages.
9. Compare and contrast biological neurons with artificial neurons in terms of structure and function.
