Deep Learning
Data Input
Data flows into the network.
Hidden Layers
Each hidden layer transforms the data through weighted sums and activation functions.
Output
The output layer produces the final prediction.
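To make this flow concrete, here is a minimal NumPy sketch of a single forward pass; the layer sizes, random weights, and the use of ReLU are illustrative assumptions rather than a prescribed architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 4 input features, 8 hidden units, 3 output classes.
x = rng.normal(size=(4,))                      # data input
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)  # hidden layer parameters
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)  # output layer parameters

hidden = np.maximum(0.0, W1 @ x + b1)   # hidden layer: linear transform + ReLU activation
logits = W2 @ hidden + b2               # output layer: raw scores
prediction = np.argmax(logits)          # prediction: index of the highest score
print(prediction)
```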
What are Activation Functions?
Activation functions decide whether a neuron should be activated.
Data Transformation
They map a neuron's input to its output.
Why Activation Functions Matter
Without activation functions, a neural network is just a stack of linear transformations, which collapses into a single linear model. This severely limits its ability to learn and model complex data patterns.
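A quick way to see this collapse is to compose two linear layers with no activation in between and check that the result is identical to a single linear layer; the matrix shapes in the sketch below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(5,))
W1 = rng.normal(size=(7, 5))
W2 = rng.normal(size=(3, 7))

# Two stacked linear layers with no activation in between...
two_layers = W2 @ (W1 @ x)

# ...are equivalent to one linear layer with weights W2 @ W1.
one_layer = (W2 @ W1) @ x

print(np.allclose(two_layers, one_layer))  # True: no extra expressive power gained
```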
Activation functions enable networks to approximate any continuous function, allowing them to learn intricate patterns
and relationships within data that linear models simply cannot capture. They introduce the necessary non-linearity for
deep learning.
Activation functions play a critical role in extracting key data features. They allow neurons to selectively respond to specific input patterns, enabling the network to identify and learn complex features.
Activation functions enable neural networks to solve non-linear problems. Many real-world problems are inherently non-linear, and without activation functions, neural networks would be unable to model these complexities effectively.
Types of Activation Functions
Various activation functions exist, each with unique characteristics. These functions play a critical role in determining the output of a neuron and
influencing the network's ability to learn complex patterns.
Common types: Sigmoid, Tanh, ReLU, and Softmax. Each of these activation functions has its own strengths and weaknesses, making them
suitable for different types of neural network architectures and tasks.
ReLU
Outputs the input directly if it is positive; otherwise, it outputs zero. It helps to alleviate the vanishing gradient problem and is computationally efficient.
Softmax
Converts a vector of numbers into a vector of probabilities, where each probability is proportional to the exponential of the corresponding value. It is commonly used in the final layer of a classification neural network.
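The four functions named above can be written directly from their standard definitions; the NumPy sketch below is a minimal illustration, not a library implementation (Tanh is taken from np.tanh).

```python
import numpy as np

def sigmoid(x):
    # Squashes inputs into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Passes positive inputs through, zeroes out the rest.
    return np.maximum(0.0, x)

def softmax(x):
    # Subtracting the max is a standard trick for numerical stability.
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([-2.0, 0.0, 3.0])
print(sigmoid(x))   # values in (0, 1)
print(np.tanh(x))   # values in (-1, 1)
print(relu(x))      # [0. 0. 3.]
print(softmax(x))   # probabilities summing to 1
```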
Sigmoid Activation Function
The Sigmoid activation function is a classic choice in neural networks, outputting values between 0 and 1. This makes it
particularly useful in scenarios where probabilities are required, such as in binary classification problems.
However, the Sigmoid function is prone to vanishing gradients, especially when dealing with very high or very low input
values. This limitation restricts its effectiveness in deep networks, where gradients need to propagate through multiple
layers.
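This saturation is easy to observe numerically: the derivative of the Sigmoid, s(x) * (1 - s(x)), peaks at 0.25 and shrinks toward zero as |x| grows. The sketch below simply evaluates that derivative at a few sample points.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [0.0, 2.0, 5.0, 10.0]:
    # The gradient is at most 0.25 and decays rapidly for large |x|,
    # which starves deep networks of useful gradient signal.
    print(x, sigmoid_grad(x))
```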
Tanh Activation Function
While Tanh addresses some of the vanishing gradient issues present in Sigmoid, it is not entirely immune, particularly in very deep networks or with extreme input values.
Formula
tanh(x) = (e^x − e^−x) / (e^x + e^−x)
Output values are bounded between -1 and 1, providing a clear range for activation.
The zero-centered output can lead to faster convergence during training compared to Sigmoid.
Like Sigmoid, Tanh can still suffer from vanishing gradients, especially in deep networks.
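A short sketch of these properties, comparing Tanh and Sigmoid outputs over a symmetric range of sample inputs:

```python
import numpy as np

x = np.linspace(-4, 4, 9)
t = np.tanh(x)
s = 1.0 / (1.0 + np.exp(-x))

print(t.min(), t.max())   # bounded in (-1, 1)
print(t.mean(), s.mean()) # Tanh output is centred at 0; Sigmoid's is centred at 0.5
```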
ReLU and Leaky ReLU
ReLU outputs the input directly if positive, otherwise zero.
1 ReLU
Simple and fast to compute.
2 Dying ReLU
Neurons that only receive negative inputs output zero and can stop learning.
3 Leaky ReLU
Uses a small slope for negative inputs to avoid the dying problem (see the sketch below).
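The sketch below contrasts plain ReLU with Leaky ReLU on a few negative inputs; the 0.01 leak slope is a common but arbitrary choice.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # A small slope for negative inputs keeps a non-zero gradient flowing.
    return np.where(x > 0, x, alpha * x)

x = np.array([-3.0, -0.5, 0.0, 2.0])
print(relu(x))        # [ 0.     0.     0.     2.   ]  negative inputs are zeroed out
print(leaky_relu(x))  # [-0.03  -0.005  0.     2.   ]  negative inputs keep a small signal
```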
Softmax Activation Function
Softmax converts raw scores into a probability distribution.
1 Probability Distribution
2 Multi-Class
3 Output Layer
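A minimal sketch of this conversion, using three arbitrary class scores as input:

```python
import numpy as np

def softmax(logits):
    # Shift by the max for numerical stability before exponentiating.
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # raw scores from the output layer
probs = softmax(logits)
print(probs)        # approximately [0.659 0.242 0.099]
print(probs.sum())  # 1.0: a valid multi-class probability distribution
```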
Comparison and Conclusion
Each activation function has its strengths and weaknesses: Sigmoid and Softmax suit probabilistic outputs, Tanh offers bounded, zero-centered activations, and ReLU-style functions are efficient defaults for hidden layers in deep networks. The right choice depends on the layer's role and the task.