Unit 5e - Autoencoders
Chapter 14 of the Deep Learning book (Goodfellow et al.)
Autoencoders
Autoencoders are artificial neural networks
capable of learning efficient representations of
the input data, called codings (or latent
representation), without any supervision.
These codings typically have a much lower
dimensionality than the input data, making
autoencoders useful for dimensionality reduction.
Some autoencoders are generative models: they
are capable of randomly generating new data
that looks very similar to the training data.
However, the generated images are usually fuzzy and
not entirely realistic.
Autoencoders
Which of the following number sequences
do you find the easiest to memorize?
40, 27, 25, 36, 81, 57, 10, 73, 19, 68
50, 25, 76, 38, 19, 58, 29, 88, 44, 22, 11, 34,
17, 52, 26, 13, 40, 20
Autoencoders
At first glance, it would seem that the first
sequence should be easier, since it is much
shorter.
However, if you look carefully at the second
sequence, you may notice that it follows two
simple rules:
Even numbers are followed by their half.
Odd numbers are followed by their triple plus one.
This is a famous sequence known as the hailstone sequence.
Autoencoders
Once you notice this pattern, the second sequence becomes much easier to memorize than the first, because you only need to memorize the two rules, the first number, and the length of the sequence.
Autoencoders formalized
An autoencoder consists of two parts: an encoder and a decoder.
The encoder transforms the input x into a set of “factors” (the code), i.e. h = f(x); the decoder maps the code back to a reconstruction r = g(h).
Autoencoders formalized
[Figure: input x → encoder f → code h → decoder g → reconstruction r]
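To connect the notation to code, here is a minimal sketch of an undercomplete autoencoder in Keras; the 28×28 input shape, layer sizes, and 30-dimensional code are illustrative assumptions, not something specified on these slides.

# Minimal undercomplete autoencoder sketch (assumed 28x28 grayscale inputs).
import tensorflow as tf
from tensorflow import keras

encoder = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),   # x: 784-dimensional input
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(30),                       # h = f(x): 30-dimensional code
])
decoder = keras.Sequential([
    keras.layers.Dense(100, activation="relu", input_shape=(30,)),
    keras.layers.Dense(28 * 28, activation="sigmoid"),
    keras.layers.Reshape((28, 28)),               # r = g(h): reconstruction
])
autoencoder = keras.Sequential([encoder, decoder])

# The target is the input itself: minimize L(x, g(f(x))).
autoencoder.compile(loss="mse", optimizer="adam")
# autoencoder.fit(X_train, X_train, epochs=10)    # X_train is both input and target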
Autoencoders
A question that comes to the mind of every beginner with autoencoders is: “isn’t it just copying the data?”
In practice, there are constraints on the encoder to make sure that it will NOT lead to a solution that simply copies the data.
For example, the code h may be required to have a smaller dimension than the input x (an undercomplete autoencoder).
Autoencoders
[Figure: an undercomplete autoencoder (code smaller than the input) vs. an overcomplete autoencoder (code at least as large as the input)]
Autoencoders
Autoencoders may be thought of as being a special
case of feedforward networks and may be trained
with all the same techniques, typically minibatch SGD.
Unlike general feedforward networks, autoencoders
may also be trained using recirculation (Hinton and
McClelland, 1988), a learning algorithm based on
comparing the activations of the network on the
original input to the activations on the reconstructed
input.
Recirculation is regarded as more biologically
plausible than back-propagation but is rarely used for
machine learning applications.
Undercomplete Autoencoders
Learning an undercomplete representation forces
the autoencoder to capture the most salient
features of the training data.
When the decoder is linear and L is the mean
squared error, an undercomplete autoencoder
learns to span the same subspace as PCA.
Autoencoders with nonlinear encoder function f and
nonlinear decoder function g can thus learn a more
powerful nonlinear generalization of PCA.
But, …
Undercomplete Autoencoders
Unfortunately, if the encoder and decoder are
allowed too much capacity, the autoencoder can
learn to perform the copying task without
extracting useful information about the
distribution of the data.
E.g., an autoencoder with a one-dimensional code and a very powerful nonlinear encoder can learn to map each training example x(i) to the code i.
The decoder can then learn to map these integer indices back to the values of specific training examples.
Regularized Autoencoders
Ideally, choose code size (dimension of h) small
and capacity of encoder f and decoder g based
on complexity of distribution modeled.
Regularized autoencoders: rather than limiting model capacity by keeping the encoder and decoder shallow and the code size small, we can use a loss function that encourages the model to have properties other than simply copying its input to its output, such as:
Sparsity of the representation
Smoothness of the derivatives
Robustness to noise and errors in the data
Sparse Autoencoders
A sparse autoencoder has a training criterion that adds a sparsity penalty Ω(h) on the code h to the reconstruction loss:
L(x, g(f(x))) + Ω(h)
A typical choice is Ω(h) = λ Σ_i |h_i|, which pushes many code units toward zero.
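A sketch of how such a sparsity penalty might be added in Keras, using an L1 activity regularizer on the code layer; the overcomplete 300-unit code and the value of λ are illustrative assumptions.

# Sparse autoencoder sketch: L1 activity regularization on the code layer.
import tensorflow as tf
from tensorflow import keras

sparse_encoder = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(100, activation="relu"),
    # The activity regularizer adds Omega(h) = lambda * sum_i |h_i| to the loss.
    keras.layers.Dense(300, activation="sigmoid",
                       activity_regularizer=keras.regularizers.l1(1e-4)),
])
sparse_decoder = keras.Sequential([
    keras.layers.Dense(100, activation="relu", input_shape=(300,)),
    keras.layers.Dense(28 * 28, activation="sigmoid"),
    keras.layers.Reshape((28, 28)),
])
sparse_ae = keras.Sequential([sparse_encoder, sparse_decoder])
sparse_ae.compile(loss="mse", optimizer="adam")   # total loss = MSE + sparsity penalty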
Denoising Autoencoders (DAEs)
In addition to adding penalty terms, there are other tricks to keep autoencoders from simply copying the data.
One trick is to add some noise to the input data; x̃ is used to denote a noisy (corrupted) version of the input x.
The denoising autoencoder (DAE) seeks to minimize L(x, g(f(x̃))).
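A minimal Keras sketch of this noise trick, using a GaussianNoise layer as the corruption; the architecture and the noise level 0.2 are illustrative assumptions.

# Denoising autoencoder sketch: corrupt the input, reconstruct the clean target.
import tensorflow as tf
from tensorflow import keras

denoising_ae = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.GaussianNoise(0.2),       # corruption: x_tilde = x + noise (training only)
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(30),                # code h = f(x_tilde)
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(28 * 28, activation="sigmoid"),
    keras.layers.Reshape((28, 28)),        # reconstruction g(f(x_tilde))
])
# The target is the CLEAN input x, so the loss is L(x, g(f(x_tilde))).
denoising_ae.compile(loss="mse", optimizer="adam")
# denoising_ae.fit(X_train, X_train, epochs=10)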
Stochastic Encoders and Decoders
The general strategy for designing the output units and loss function of a feedforward network is to define an output distribution p(y | x) and minimize the negative log-likelihood −log p(y | x), where y is a vector of targets, such as class labels.
In an autoencoder, x is the target as well as the input.
Yet we can apply the same machinery as before.
Stochastic Encoders and Decoders
[Figure: input x → p_encoder(h | x) → code h → p_decoder(x | h) → reconstruction]
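To make this concrete (a standard maximum-likelihood fact rather than something stated on these slides): the choice of p_decoder(x | h) fixes the reconstruction loss, because training minimizes −log p_decoder(x | h). For a Gaussian decoder with fixed variance the loss reduces to squared error, and for a Bernoulli decoder over binary pixels it reduces to cross-entropy:

-\log p_{\mathrm{decoder}}(x \mid h) = \tfrac{1}{2\sigma^{2}} \lVert x - g(h) \rVert^{2} + \mathrm{const},
\qquad p_{\mathrm{decoder}}(x \mid h) = \mathcal{N}\bigl(x;\, g(h),\, \sigma^{2} I\bigr)

-\log p_{\mathrm{decoder}}(x \mid h) = -\textstyle\sum_{i} \bigl[ x_{i} \log \hat{x}_{i} + (1 - x_{i}) \log(1 - \hat{x}_{i}) \bigr],
\qquad \hat{x} = g(h)

This is why squared error (or binary cross-entropy for binary pixels) is the usual reconstruction loss in practice.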
Denoising Autoencoders (DAEs)
A DAE is defined as an autoencoder that receives a corrupted data point x̃ as input and is trained to predict the original, uncorrupted data point x as its output.
Traditional autoencoders minimize L(x, g(f(x))).
The DAE instead seeks to minimize L(x, g(f(x̃))).
• The autoencoder must undo this corruption rather than simply copying its input.
[Figure: noisy input → encoder → latent-space representation → decoder → denoised output]
Denoising Autoencoders (DAEs)
The DAE is trained to reconstruct the clean data point x from its corrupted version x̃ by minimizing the loss
L = −log p_decoder(x | h = f(x̃))
The autoencoder learns a reconstruction distribution p_reconstruct(x | x̃) estimated from training pairs (x, x̃) as follows:
1. Sample a training example x from the training data.
2. Sample a corrupted version x̃ from the corruption process C(x̃ | x).
3. Use (x, x̃) as a training pair for estimating the autoencoder reconstruction distribution p_reconstruct(x | x̃) = p_decoder(x | h), with h the output of the encoder f(x̃) and p_decoder typically defined by a decoder g(h).
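The same procedure written out as a plain NumPy sampling loop; the Gaussian choice of C(x̃ | x) and the helper names (corrupt, dae_training_pairs, train_step) are assumptions for illustration.

import numpy as np

def corrupt(x, sigma=0.2):
    """Corruption process C(x_tilde | x): add isotropic Gaussian noise (assumed choice)."""
    return x + sigma * np.random.randn(*x.shape)

def dae_training_pairs(X_train, batch_size=32):
    """Yield (corrupted, clean) pairs following steps 1-3 above."""
    while True:
        idx = np.random.randint(0, len(X_train), size=batch_size)
        x = X_train[idx]            # step 1: sample clean x from the training data
        x_tilde = corrupt(x)        # step 2: sample x_tilde from C(x_tilde | x)
        yield x_tilde, x            # step 3: train to reconstruct x from x_tilde

# for x_tilde, x in dae_training_pairs(X_train):
#     train_step(x_tilde, x)       # minimize -log p_decoder(x | h = f(x_tilde))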
Denoising Autoencoders (DAEs)
Score matching is often employed to train
DAEs.
Score Matching encourages the model to
have the same score as the data
distribution at every training point x.
The score is a particular gradient field: ∇_x log p(x).
The DAE learns to estimate this score as g(f(x)) − x.
See the picture on the next slide.
Denoising Autoencoders (DAEs)
[Figure] Training examples x are shown as red crosses.
The gray circles represent sets of equiprobable corruptions.
The vector field g(f(x)) − x, indicated by green arrows, estimates the score ∇_x log p(x), which is the slope (gradient) of the data density.
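The precise form of this estimate (a known result of Alain and Bengio, 2014, discussed in Chapter 14 of the book, with the σ² scaling that the slides leave implicit): for a DAE trained with squared error and small Gaussian corruption noise of variance σ², the optimal reconstruction satisfies

g(f(x)) - x \;\approx\; \sigma^{2} \, \nabla_{x} \log p(x) \qquad \text{as } \sigma \to 0

so the arrows in the figure point toward regions of higher data density.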
Contractive Autoencoder (CAE)
The contractive autoencoder has an explicit regularizer on h = f(x), encouraging the derivatives of f to be as small as possible, by minimizing
L(x, g(f(x))) + Ω(h), with Ω(h) = λ ‖∂f(x)/∂x‖_F².
The penalty Ω(h) is the squared Frobenius norm (sum of squared elements) of the Jacobian matrix of partial derivatives associated with the encoder function.
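A sketch of how this penalty could be computed with TensorFlow automatic differentiation, where tf.GradientTape.batch_jacobian yields the per-example Jacobian ∂f(x)/∂x; the architecture and the value of λ are illustrative assumptions.

# Contractive autoencoder loss sketch: reconstruction error + Jacobian penalty.
import tensorflow as tf
from tensorflow import keras

cae_encoder = keras.Sequential([
    keras.layers.Dense(100, activation="relu", input_shape=(784,)),
    keras.layers.Dense(30, activation="sigmoid"),     # code h = f(x)
])
cae_decoder = keras.Sequential([
    keras.layers.Dense(100, activation="relu", input_shape=(30,)),
    keras.layers.Dense(784, activation="sigmoid"),
])

def contractive_loss(x, lam=1e-3):
    # x: tensor of shape (batch, 784)
    with tf.GradientTape() as tape:
        tape.watch(x)
        h = cae_encoder(x)                             # h = f(x)
    jacobian = tape.batch_jacobian(h, x)               # shape (batch, 30, 784): df/dx per example
    penalty = tf.reduce_sum(tf.square(jacobian))       # squared Frobenius norm of the Jacobian
    reconstruction = cae_decoder(h)
    mse = tf.reduce_sum(tf.square(x - reconstruction))
    return mse + lam * penalty                         # L(x, g(f(x))) + lambda * ||df/dx||_F^2

In a full training step this loss would itself be computed under an outer GradientTape so that gradients with respect to the encoder and decoder weights can be taken.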
DAEs vs. CAEs
The DAE makes the reconstruction function resist small, finite-sized perturbations of the input.
The CAE makes the feature-encoding function resist small, infinitesimal perturbations of the input.
Both denoising AEs and contractive AEs perform well!
Both are overcomplete.
DAEs vs. CAEs
Advantage of DAE: simpler to implement.
Requires adding only one or two lines of code to a regular AE.
No need to compute the Jacobian of the hidden layer.
Advantage of CAE: the gradient is deterministic.
Might be more stable than the DAE, which uses a sampled gradient.
One less hyperparameter to tune (the noise factor).
Recurrent Autoencoders
In a recurrent autoencoder, the encoder is
typically a sequence-to-vector RNN which
compresses the input sequence down to a
single vector.
The decoder is a vector-to-sequence RNN
that does the reverse.
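A sketch of such a recurrent autoencoder in Keras; the sequence length, feature dimension, and layer sizes are illustrative assumptions.

# Recurrent autoencoder sketch: sequence-to-vector encoder, vector-to-sequence decoder.
import tensorflow as tf
from tensorflow import keras

n_steps, n_features = 50, 1   # assumed: sequences of 50 time steps, 1 feature each

recurrent_encoder = keras.Sequential([
    # sequence-to-vector RNN: only the final hidden state is returned
    keras.layers.LSTM(100, return_sequences=False, input_shape=(n_steps, n_features)),
    keras.layers.Dense(30),                          # fixed-size code for the whole sequence
])
recurrent_decoder = keras.Sequential([
    # vector-to-sequence RNN: repeat the code at every time step, then unroll
    keras.layers.RepeatVector(n_steps, input_shape=(30,)),
    keras.layers.LSTM(100, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(n_features)),
])
recurrent_ae = keras.Sequential([recurrent_encoder, recurrent_decoder])
recurrent_ae.compile(loss="mse", optimizer="adam")
# recurrent_ae.fit(X_seq, X_seq, epochs=10)   # X_seq shape: (batch, n_steps, n_features)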
Convolutional autoencoders
Convolutional neural networks are far better
suited than dense networks to work with images.
Convolutional autoencoder: The encoder is a
regular CNN composed of convolutional layers
and pooling layers.
It typically reduces the spatial dimensionality of the
inputs (i.e., height and width) while increasing the
depth (i.e., the number of feature maps).
The decoder does the reverse using transpose
convolutional layers.
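A sketch of a convolutional autoencoder along these lines in Keras; the filter counts and the 28×28×1 input shape are illustrative assumptions.

# Convolutional autoencoder sketch: conv/pool encoder, transpose-conv decoder.
import tensorflow as tf
from tensorflow import keras

conv_encoder = keras.Sequential([
    # spatial size shrinks (28x28 -> 14x14 -> 7x7) while depth grows (16 -> 32 feature maps)
    keras.layers.Conv2D(16, 3, padding="same", activation="relu", input_shape=(28, 28, 1)),
    keras.layers.MaxPool2D(pool_size=2),
    keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
    keras.layers.MaxPool2D(pool_size=2),
])
conv_decoder = keras.Sequential([
    # transpose convolutions with stride 2 upsample back to the input size (7x7 -> 14x14 -> 28x28)
    keras.layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu",
                                 input_shape=(7, 7, 32)),
    keras.layers.Conv2DTranspose(1, 3, strides=2, padding="same", activation="sigmoid"),
])
conv_ae = keras.Sequential([conv_encoder, conv_decoder])
conv_ae.compile(loss="mse", optimizer="adam")
# conv_ae.fit(X_img, X_img, epochs=10)   # X_img shape: (batch, 28, 28, 1)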
Applications of Autoencoders
Data compression
Dimensionality reduction
Information retrieval
Image denoising
Feature extraction
Removing watermarks from images
Applications of Autoencoders
Autoencoders have been successfully applied to
dimensionality reduction and information retrieval
tasks.
Dimensionality reduction is one of the early motivations for studying autoencoders.
For example, the deep autoencoder of Hinton and Salakhutdinov (2006) yielded less reconstruction error than PCA.
Chapter Summary
Autoencoders motivated.
Sparse autoencoders
Denoising autoencoders
Contractive autoencoder
Recurrent/Convolutional autoencoders
Applications of Autoencoders