CS490 Advanced Topics in Computing (Deep Learning)
Lecture 7: Activation Functions & Data Preprocessing
03/03/2021
Activation functions
[Figure slides: plots of common activation functions, several annotated "Two major drawbacks:"; one figure compares curves for k = 2 and k = 4]
"What neuron type should I use?"
Setting up the data and the model
Data Preprocessing
Mean subtraction
▪ The most common form of preprocessing
▪ It involves subtracting the mean across every individual feature in the data
Normalization
▪ Refers to normalizing the data dimensions so that they are of approximately the same scale
▪ Typically, it is done by dividing each dimension by its standard deviation, once it has been zero-centered (see the sketch after this list)
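A minimal NumPy sketch of both steps on a toy data matrix (the array name X and the shapes are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(loc=3.0, scale=2.0, size=(1000, 20))  # toy data: N=1000 examples, D=20 features

    X -= np.mean(X, axis=0)   # mean subtraction: zero-center every feature
    X /= np.std(X, axis=0)    # normalization: unit standard deviation per dimension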
PCA and Whitening
▪ Data is first centered (zero mean)
▪ Decorrelate the data by projecting the original (but zero-centered) data onto the eigenbasis of its covariance matrix
▪ PCA is applied to reduce the dimensionality, keeping only the top eigenvectors
▪ Whiten the data by taking the data in the eigenbasis and dividing every dimension by the square root of its eigenvalue to normalize the scale (sketch below)
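A minimal NumPy sketch of the pipeline on toy correlated data; the small constant under the square root is a common guard against near-zero eigenvalues (names and sizes are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 50)) @ rng.standard_normal((50, 50))  # correlated toy data [N, D]

    X -= np.mean(X, axis=0)              # 1. center the data
    cov = (X.T @ X) / X.shape[0]         # covariance matrix, [D, D]
    U, S, _ = np.linalg.svd(cov)         # columns of U: eigenbasis; S: eigenvalues

    Xrot = X @ U                         # 2. decorrelate: project onto the eigenbasis
    Xrot_reduced = X @ U[:, :10]         # 3. PCA: keep only the top 10 components
    Xwhite = Xrot / np.sqrt(S + 1e-5)    # 4. whiten: divide by sqrt(eigenvalue)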
Common pitfall
▪ Do not compute preprocessing statistics (e.g., the mean) over the entire dataset before splitting it
▪ Instead, the mean must be computed only over the training data and then subtracted equally from all splits (train/val/test), as sketched below
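A sketch of the correct order of operations, with illustrative split names:

    import numpy as np

    rng = np.random.default_rng(0)
    X_train = rng.normal(3.0, 2.0, size=(800, 20))   # toy splits
    X_val   = rng.normal(3.0, 2.0, size=(100, 20))
    X_test  = rng.normal(3.0, 2.0, size=(100, 20))

    mean = X_train.mean(axis=0)   # statistics computed on the training split ONLY
    X_train -= mean               # ...then applied identically to every split
    X_val   -= mean
    X_test  -= mean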
Different Strategies in Practice for Images
Model Initialization
Weight Initialization
Do NOT perform all-zero initialization
▪ Every neuron in the network will compute the same output, so they will all also compute the same gradients during backpropagation and undergo exactly the same parameter updates (demonstrated in the sketch below)
▪ In other words, there is no source of asymmetry between neurons if their weights are initialized to be the same
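A small NumPy sketch of the symmetry problem on a toy two-layer network; it uses a small constant instead of exact zeros so the forward pass is not identically zero, but the symmetry argument is the same:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 4))      # 8 samples, 4 features (toy data)
    y = rng.standard_normal((8, 1))      # toy regression targets

    W1 = np.full((4, 5), 0.1)            # every hidden neuron starts identical
    W2 = np.full((5, 1), 0.1)

    h = np.maximum(0, x @ W1)            # forward pass (ReLU hidden layer)
    out = h @ W2
    dout = (out - y) / len(x)            # gradient of (half) mean squared error

    dh = dout @ W2.T                     # backward pass
    dh[h <= 0] = 0
    dW1 = x.T @ dh

    # Every column of dW1 (one per hidden neuron) is identical, so the
    # neurons receive the same update and never differentiate:
    print(np.allclose(dW1, dW1[:, :1]))  # True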
Use small random numbers
▪ Instead, it is common to initialize the weights of the neurons to small random numbers and refer to doing so as symmetry breaking
▪ Randomly initialize neurons so that they are all unique in the beginning (typically sampled from a zero-mean, unit-standard-deviation gaussian), so that they compute distinct updates and integrate themselves as diverse parts of the full network (sketch below)
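As a sketch, for a layer with D inputs and H outputs (sizes illustrative):

    import numpy as np

    D, H = 1024, 512                  # fan-in and fan-out (illustrative sizes)
    W = 0.01 * np.random.randn(D, H)  # small numbers from a zero-mean unit gaussian
    b = np.zeros(H)                   # biases are commonly initialized to zero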
Calibrating the variance with 1/√n
▪ For a neuron with n inputs, the pre-activation is s = Σᵢ wᵢxᵢ
▪ Assuming zero-mean weights and inputs, Var(s) = n · Var(w) · Var(x)
▪ Since Var(a·w) = a² · Var(w), to achieve a variance of w equal to 1/n, sample from a unit gaussian and scale by 1/√n: w = randn(n) / sqrt(n)
▪ This keeps each neuron's output variance approximately equal to its input variance, independent of the fan-in n (empirical check below)
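A quick empirical check of this calibration, assuming unit-variance inputs:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 500                                        # fan-in
    x = rng.standard_normal((10000, n))            # unit-variance inputs

    w_naive = rng.standard_normal(n)               # Var(w) = 1
    w_calib = rng.standard_normal(n) / np.sqrt(n)  # Var(w) = 1/n

    print(np.var(x @ w_naive))   # ~n (here ~500): variance grows with fan-in
    print(np.var(x @ w_calib))   # ~1: output variance matches input variance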
Weight Initialization: Statistics
[Figure slides: per-layer histograms of activations in a deep network initialized with small random weights; the activations shrink toward zero in the deeper layers]
▪ Q: What do the gradients on the weights look like? A: Local gradients are all-zero, so there is no learning
Weight Initialization: “Xavier” Initialization
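A sketch of one common form of Xavier initialization, which scales a unit gaussian by 1/√(fan-in); Glorot and Bengio's paper also gives a variant that uses both fan-in and fan-out:

    import numpy as np

    rng = np.random.default_rng(0)

    def xavier_init(fan_in, fan_out):
        # Scale a unit gaussian by 1/sqrt(fan_in) so that each layer's output
        # variance roughly matches its input variance (suited to tanh-like units).
        return rng.standard_normal((fan_in, fan_out)) / np.sqrt(fan_in)

    W = xavier_init(1024, 512)   # illustrative layer sizes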
Weight Initialization: What about ReLU?
▪ With ReLU units, activations collapse to zero again in the deeper layers, and there is no learning
Weight Initialization: Kaiming / MSRA Initialization
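A sketch of Kaiming/MSRA initialization: since ReLU zeroes out roughly half of its inputs and thereby halves the variance, a factor of 2 is added under the square root to compensate (layer sizes illustrative):

    import numpy as np

    rng = np.random.default_rng(0)

    def kaiming_init(fan_in, fan_out):
        # The extra factor of 2 compensates for ReLU discarding (on average)
        # half of the input distribution, keeping activation variance stable.
        return rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)

    W = kaiming_init(1024, 512)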
Proper initialization
Sparse Weight Initialization
Bias Initialization
Acknowledgements