
Deep Learning cheatsheet


By Afshine Amidi and Shervine Amidi

Neural Networks
Neural networks are a class of models that are built with layers. Commonly used types
of neural networks include convolutional and recurrent neural networks.

Architecture ― The vocabulary around neural network architectures is described in the figure below:

[Figure: a network with an input layer, hidden layers and an output layer]

By noting $i$ the $i^{th}$ layer of the network and $j$ the $j^{th}$ hidden unit of the layer, we have:

$z_j^{[i]} = {w_j^{[i]}}^T x + b_j^{[i]}$

where we note $w$, $b$, $z$ the weight, bias and output respectively.
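
As a rough illustration of this notation, here is a minimal NumPy sketch of the forward pass of one fully connected layer; the layer sizes and the use of NumPy are assumptions made for the example:

import numpy as np

# Hypothetical sizes: 4 input features, 3 hidden units in layer [i].
x = np.random.randn(4)          # input vector
W = np.random.randn(3, 4)       # one weight vector w_j^[i] per row
b = np.random.randn(3)          # biases b_j^[i]

# z_j^[i] = w_j^[i]^T x + b_j^[i], computed for all units j at once
z = W @ x + b
print(z.shape)                  # (3,)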

Activation function ― Activation functions are used at the end of a hidden unit to
introduce non-linear complexities to the model. Here are the most common ones:

- Sigmoid: $g(z) = \dfrac{1}{1 + e^{-z}}$
- Tanh: $g(z) = \dfrac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$
- ReLU: $g(z) = \max(0, z)$
- Leaky ReLU: $g(z) = \max(\epsilon z, z)$, with $\epsilon \ll 1$
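
These activations translate directly into code. Below is a minimal NumPy sketch of the four functions; the value 0.01 used for $\epsilon$ in leaky ReLU is an illustrative choice:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)            # same as (e^z - e^-z) / (e^z + e^-z)

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, eps=0.01):     # eps << 1, value chosen for illustration
    return np.maximum(eps * z, z)
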
Cross-entropy loss ― In the context of neural networks, the cross-entropy loss $L(z,y)$ is commonly used and is defined as follows:

$L(z,y) = -\big[y\log(z) + (1-y)\log(1-z)\big]$
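
For completeness, a minimal NumPy sketch of this loss; the small constant added inside the logarithms is an assumption made to avoid log(0):

import numpy as np

def cross_entropy(z, y, tiny=1e-12):
    # z: predicted probability in (0, 1), y: true label in {0, 1}
    return -(y * np.log(z + tiny) + (1 - y) * np.log(1 - z + tiny))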

Learning rate ― The learning rate, often noted $\alpha$ or sometimes $\eta$, indicates at which pace the weights get updated. It can be fixed or adaptively changed. The most popular method at the moment is Adam, which adapts the learning rate.

Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight $w$ is computed using the chain rule and is of the following form:

$\dfrac{\partial L(z,y)}{\partial w} = \dfrac{\partial L(z,y)}{\partial a} \times \dfrac{\partial a}{\partial z} \times \dfrac{\partial z}{\partial w}$

As a result, the weight is updated as follows:

$w \longleftarrow w - \alpha\dfrac{\partial L(z,y)}{\partial w}$

Updating weights ― In a neural network, weights are updated as follows:


- Step 1: Take a batch of training data.
- Step 2: Perform forward propagation to obtain the corresponding loss.
- Step 3: Backpropagate the loss to get the gradients.
- Step 4: Use the gradients to update the weights of the network.
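
A minimal sketch of these four steps for a single sigmoid unit trained with the cross-entropy loss above; the data, labels, learning rate and number of iterations are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))               # a batch of 8 examples, 4 features
y = (X[:, 0] > 0).astype(float)           # illustrative binary labels
w, b, alpha = np.zeros(4), 0.0, 0.1

for _ in range(100):
    # Steps 1-2: forward propagation on the batch and its loss
    z = X @ w + b
    a = 1.0 / (1.0 + np.exp(-z))          # sigmoid activation
    loss = -np.mean(y * np.log(a + 1e-12) + (1 - y) * np.log(1 - a + 1e-12))
    # Step 3: backpropagation (chain rule gives dL/dz = a - y for this setup)
    dz = (a - y) / len(y)
    dw, db = X.T @ dz, dz.sum()
    # Step 4: gradient step  w <- w - alpha * dL/dw
    w -= alpha * dw
    b -= alpha * db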

Dropout ― Dropout is a technique meant to prevent overfitting the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability $p$ or kept with probability $1-p$.
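
A minimal sketch of applying dropout to a layer's activations at training time; the inverted-dropout rescaling by $1/(1-p)$ is a common convention assumed here, not something stated above:

import numpy as np

def dropout(a, p, rng=np.random.default_rng()):
    # Each unit is dropped with probability p, kept with probability 1 - p.
    mask = rng.random(a.shape) >= p
    # Inverted dropout: rescale kept units so the expected activation is unchanged.
    return a * mask / (1.0 - p)
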
Convolutional Neural Networks
Convolutional layer requirement ― By noting $W$ the input volume size, $F$ the size of the convolutional layer neurons, $P$ the amount of zero padding and $S$ the stride, the number of neurons $N$ that fit in a given volume is such that:

$N = \dfrac{W - F + 2P}{S} + 1$
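
For instance, with $W=32$, $F=5$, $P=2$ and $S=1$ this gives $N = (32 - 5 + 2\cdot 2)/1 + 1 = 32$. A one-line sketch of the computation, with the same illustrative values:

def conv_output_size(W, F, P, S):
    return (W - F + 2 * P) // S + 1

print(conv_output_size(W=32, F=5, P=2, S=1))   # 32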

Batch normalization ― It is a step of hyperparameters $\gamma, \beta$ that normalizes the batch $\{x_i\}$. By noting $\mu_B$, $\sigma_B^2$ the mean and variance of the batch that we want to correct, it is done as follows:

$x_i \longleftarrow \gamma\dfrac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta$

It is usually done after a fully connected/convolutional layer and before a non-linearity layer, and aims at allowing higher learning rates and reducing the strong dependence on initialization.
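
A minimal NumPy sketch of this normalization over a batch; the value of $\epsilon$ and the default $\gamma$, $\beta$ are illustrative:

import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # x: batch of activations, one row per example
    mu = x.mean(axis=0)                   # batch mean  mu_B
    var = x.var(axis=0)                   # batch variance  sigma_B^2
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta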

Recurrent Neural Networks


Types of gates ― Here are the different types of gates that we encounter in a typical
recurrent neural network:

- Input gate: write to cell or not?
- Forget gate: erase a cell or not?
- Gate: how much to write to cell?
- Output gate: how much to reveal cell?

LSTM ― A long short-term memory (LSTM) network is a type of RNN model that avoids
the vanishing gradient problem by adding 'forget' gates.
Reinforcement Learning and Control
The goal of reinforcement learning is for an agent to learn how to evolve in an
environment.

Definitions
Markov decision processes ― A Markov decision process (MDP) is a 5-tuple $(S, A, \{P_{sa}\}, \gamma, R)$ where:
- $S$ is the set of states
- $A$ is the set of actions
- $\{P_{sa}\}$ are the state transition probabilities for $s \in S$ and $a \in A$
- $\gamma \in [0,1[$ is the discount factor
- $R : S \times A \longrightarrow \mathbb{R}$ or $R : S \longrightarrow \mathbb{R}$ is the reward function that the algorithm wants to maximize

Policy ― A policy $\pi$ is a function $\pi : S \longrightarrow A$ that maps states to actions.

Remark: we say that we execute a given policy $\pi$ if given a state $s$ we take the action $a = \pi(s)$.

Value function ― For a given policy $\pi$ and a given state $s$, we define the value function $V^{\pi}$ as follows:

$V^{\pi}(s) = E\big[R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \ldots \mid s_0 = s, \pi\big]$

Bellman equation ― The optimal Bellman equations characterize the value function $V^{\pi^*}$ of the optimal policy $\pi^*$:

$V^{\pi^*}(s) = R(s) + \max_{a \in A} \gamma \sum_{s' \in S} P_{sa}(s') V^{\pi^*}(s')$

Remark: we note that the optimal policy $\pi^*$ for a given state $s$ is such that:

$\pi^*(s) = \underset{a \in A}{\operatorname{argmax}} \sum_{s' \in S} P_{sa}(s') V^{*}(s')$

Value iteration algorithm ― The value iteration algorithm is in two steps:

1) We initialize the value:

$V_0(s) = 0$

2) We iterate the value based on the values before:

$V_{i+1}(s) = R(s) + \max_{a \in A}\Big[\sum_{s' \in S} \gamma P_{sa}(s') V_i(s')\Big]$
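
A minimal sketch of value iteration on a small, made-up MDP; the transition probabilities, rewards and discount factor are illustrative assumptions:

import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.normal(size=n_states)                                     # R(s)

V = np.zeros(n_states)                     # step 1: V_0(s) = 0
for _ in range(200):                       # step 2: iterate until roughly converged
    # V_{i+1}(s) = R(s) + max_a sum_s' gamma * P_sa(s') * V_i(s')
    V = R + np.max(gamma * P @ V, axis=1)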

Maximum likelihood estimate ― The maximum likelihood estimates for the state transition probabilities are as follows:

$P_{sa}(s') = \dfrac{\#\,\text{times took action } a \text{ in state } s \text{ and got to } s'}{\#\,\text{times took action } a \text{ in state } s}$

Q-learning ― $Q$-learning is a model-free estimation of $Q$, which is done as follows:

$Q(s,a) \leftarrow Q(s,a) + \alpha\big[R(s,a,s') + \gamma \max_{a'} Q(s',a') - Q(s,a)\big]$
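
A minimal sketch of this tabular Q-learning update; the toy environment step function and the purely random exploration are illustrative assumptions, not part of the formula above:

import numpy as np

n_states, n_actions = 5, 2
alpha, gamma = 0.1, 0.9
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def env_step(s, a):
    # Hypothetical environment: returns a next state and a reward.
    s_next = (s + 1) % n_states if a == 1 else rng.integers(n_states)
    reward = 1.0 if s_next == 0 else 0.0
    return s_next, reward

s = 0
for _ in range(1000):
    a = rng.integers(n_actions)            # random exploration for simplicity
    s_next, r = env_step(s, a)
    # Q(s,a) <- Q(s,a) + alpha * [R(s,a,s') + gamma * max_a' Q(s',a') - Q(s,a)]
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next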

For a more detailed overview of the concepts above, check out the Deep Learning
cheatsheets!
