Deep Learning
Data Input
Data flows into the network.
Hidden Layers
Each hidden layer transforms the data through weighted sums and activation functions.
Output
The output layer produces the final prediction.
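To make this flow concrete, here is a minimal NumPy sketch of a single forward pass; the layer sizes, random weights, and the use of ReLU are illustrative assumptions rather than a prescribed architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 4 input features, 8 hidden units, 3 output classes.
x = rng.normal(size=(4,))                      # data input
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)  # hidden layer parameters
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)  # output layer parameters

hidden = np.maximum(0.0, W1 @ x + b1)   # hidden layer: linear transform + ReLU activation
logits = W2 @ hidden + b2               # output layer: raw scores
prediction = np.argmax(logits)          # prediction: index of the highest score
print(prediction)
```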
What are Activation Functions?
Activation functions decide whether a neuron should be activated.
Data Transformation
They map a neuron's input to its output.
Why Activation Functions Matter
Without activation functions, a neural network is just a stack of linear transformations, which collapses into a single linear model. This severely limits its ability to learn and model complex data patterns.
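A quick way to see this collapse is to compose two linear layers with no activation in between and check that the result is identical to a single linear layer; the matrix shapes in the sketch below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(5,))
W1 = rng.normal(size=(7, 5))
W2 = rng.normal(size=(3, 7))

# Two stacked linear layers with no activation in between...
two_layers = W2 @ (W1 @ x)

# ...are equivalent to one linear layer with weights W2 @ W1.
one_layer = (W2 @ W1) @ x

print(np.allclose(two_layers, one_layer))  # True: no extra expressive power gained
```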
Activation functions enable networks to approximate any continuous function, allowing them to learn intricate patterns
and relationships within data that linear models simply cannot capture. They introduce the necessary non-linearity for
deep learning.
Activation functions play a critical role in extracting key data features. They allow neurons to selectively respond to specific input patterns, enabling the network to identify and learn complex features.
Activation functions enable neural networks to solve non-linear problems. Many real-world problems are inherently non-linear, and without activation functions, neural networks would be unable to model these complexities effectively.
Types of Activation Functions
Various activation functions exist, each with unique characteristics. These functions play a critical role in determining the output of a neuron and
influencing the network's ability to learn complex patterns.
Common types: Sigmoid, Tanh, ReLU, and Softmax. Each of these activation functions has its own strengths and weaknesses, making them
suitable for different types of neural network architectures and tasks.
ReLU
Outputs the input directly if it is positive; otherwise, it outputs zero. It helps to alleviate the vanishing gradient problem and is computationally efficient.
Softmax
Converts a vector of numbers into a vector of probabilities, where each probability is proportional to the exponential of the corresponding value. It is commonly used in the final layer of a classification neural network.
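The four functions named above can be written directly from their standard definitions; the NumPy sketch below is a minimal illustration, not a library implementation (Tanh is taken from np.tanh).

```python
import numpy as np

def sigmoid(x):
    # Squashes inputs into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Passes positive inputs through, zeroes out the rest.
    return np.maximum(0.0, x)

def softmax(x):
    # Subtracting the max is a standard trick for numerical stability.
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([-2.0, 0.0, 3.0])
print(sigmoid(x))   # values in (0, 1)
print(np.tanh(x))   # values in (-1, 1)
print(relu(x))      # [0. 0. 3.]
print(softmax(x))   # probabilities summing to 1
```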
Sigmoid Activation Function
The Sigmoid activation function is a classic choice in neural networks, outputting values between 0 and 1. This makes it
particularly useful in scenarios where probabilities are required, such as in binary classification problems.
However, the Sigmoid function is prone to vanishing gradients, especially when dealing with very high or very low input
values. This limitation restricts its effectiveness in deep networks, where gradients need to propagate through multiple
layers.
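This saturation is easy to observe numerically: the derivative of the Sigmoid, s(x) * (1 - s(x)), peaks at 0.25 and shrinks toward zero as |x| grows. The sketch below simply evaluates that derivative at a few sample points.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [0.0, 2.0, 5.0, 10.0]:
    # The gradient is at most 0.25 and decays rapidly for large |x|,
    # which starves deep networks of useful gradient signal.
    print(x, sigmoid_grad(x))
```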
Tanh Activation Function
While Tanh addresses some of the vanishing gradient issues present in Sigmoid, it is not entirely immune, particularly in very deep networks or with extreme input values.
Formula
tanh(x) = (e^x − e^−x) / (e^x + e^−x)
Output values are bounded between -1 and 1, providing a clear range for activation.
The zero-centered output can lead to faster convergence during training compared to Sigmoid.
Like Sigmoid, Tanh can still suffer from vanishing gradients, especially in deep networks.
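A short sketch of these properties, comparing Tanh and Sigmoid outputs over a symmetric range of sample inputs:

```python
import numpy as np

x = np.linspace(-4, 4, 9)
t = np.tanh(x)
s = 1.0 / (1.0 + np.exp(-x))

print(t.min(), t.max())   # bounded in (-1, 1)
print(t.mean(), s.mean()) # Tanh output is centred at 0; Sigmoid's is centred at 0.5
```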
ReLU and Leaky ReLU
ReLU outputs the input directly if positive, otherwise zero.
1 ReLU
Simple and fast to compute.
2 Dying ReLU
Neurons that only receive negative inputs output zero and can stop learning.
3 Leaky ReLU
Uses a small slope for negative inputs to avoid the dying problem (see the sketch below).
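The sketch below contrasts plain ReLU with Leaky ReLU on a few negative inputs; the 0.01 leak slope is a common but arbitrary choice.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # A small slope for negative inputs keeps a non-zero gradient flowing.
    return np.where(x > 0, x, alpha * x)

x = np.array([-3.0, -0.5, 0.0, 2.0])
print(relu(x))        # [ 0.     0.     0.     2.   ]  negative inputs are zeroed out
print(leaky_relu(x))  # [-0.03  -0.005  0.     2.   ]  negative inputs keep a small signal
```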
Softmax Activation Function
Softmax converts raw scores into a probability distribution.
1 Probability Distribution
2 Multi-Class
3 Output Layer
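A minimal sketch of this conversion, using three arbitrary class scores as input:

```python
import numpy as np

def softmax(logits):
    # Shift by the max for numerical stability before exponentiating.
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # raw scores from the output layer
probs = softmax(logits)
print(probs)        # approximately [0.659 0.242 0.099]
print(probs.sum())  # 1.0: a valid multi-class probability distribution
```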
Comparison and Conclusion
Each activation function has its strengths and weaknesses: Sigmoid and Softmax suit probabilistic outputs, Tanh offers bounded, zero-centered activations, and ReLU-style functions are efficient defaults for hidden layers in deep networks. The right choice depends on the layer's role and the task.