Generative Modeling
Generative adversarial networks
Denis Derkach, Artem Ryzhikov, Sergei Popov
Laboratory for methods of big data analysis
Spring 2023
In this Lecture
▶ Generative Adversarial Networks
– algorithm statement;
– ideal case;
– shortcomings of the vanilla algorithm;
– proposed fixes.
Idea
Reminder: 𝑓-divergence
▶ For a convex function 𝑓(⋅) and distributions 𝑃 and 𝑄, the 𝑓-divergence is defined as
$$D_f(P\,\|\,Q) = \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx.$$
Reminder: 𝑓-divergence Convergence
▶ To optimize the reverse KL (rKL) divergence directly, we would need access to the true PDF 𝑝(𝑥).
Reminder: Variational Lower Bound
▶ For a convex function 𝑓(⋅) and distributions 𝑃 and 𝑄, the 𝑓-divergence is
$$D_f(P\,\|\,Q) = \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx.$$
▶ A variational lower bound can be written as
$$D_f(P\,\|\,Q) \ge \max_{T} \Bigl( \mathbb{E}_{x\sim P}[T(x)] - \mathbb{E}_{x\sim Q}[f^*(T(x))] \Bigr),$$
where 𝑇(𝑥) is an arbitrary test function and 𝑓* is the convex conjugate of 𝑓.
▶ The bound becomes tight for an optimal 𝑇*(𝑥) that can be derived for each 𝑓-divergence.
▶ For the JS divergence: $T^*(x) = \log\frac{2p(x)}{p(x)+q(x)}$ and $f^*(t) = -\log(2 - \exp(t))$.
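For completeness, the bound follows from the Fenchel-conjugate representation of a convex function (the standard argument used by Nowozin et al. in the f-GAN paper):
$$f(u) = \sup_{t \in \mathrm{dom}_{f^*}} \bigl( t\,u - f^*(t) \bigr)
\;\Longrightarrow\;
D_f(P\,\|\,Q) = \int q(x)\,\sup_{t}\Bigl( t\,\tfrac{p(x)}{q(x)} - f^*(t) \Bigr)\,dx
\;\ge\; \sup_{T}\Bigl( \mathbb{E}_{x\sim P}[T(x)] - \mathbb{E}_{x\sim Q}[f^*(T(x))] \Bigr),$$
with the inequality appearing because a single function 𝑇(𝑥) replaces the pointwise supremum.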
Lower Bound for JS
[Figure: example densities P(x) and Q(x).]
$$\mathrm{JS}(P\,\|\,Q) \ge \mathbb{E}_{x\sim P}\left[\log\frac{2p(x)}{p(x)+q(x)}\right] + \mathbb{E}_{x\sim Q}\left[\log\frac{2q(x)}{p(x)+q(x)}\right]$$
It would be interesting to construct something close.
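As a sanity check (not part of the original slides), a minimal NumPy sketch that evaluates this bound by Monte Carlo for two hypothetical 1-D Gaussians, plugging in the optimal critic T*:

```python
import numpy as np
from scipy.stats import norm

# Toy check of the JS lower bound with the optimal critic
# T*(x) = log(2 p(x) / (p(x) + q(x))) and f*(t) = -log(2 - exp(t)).
# p and q are 1-D Gaussians chosen purely for illustration.
p, q = norm(0.0, 1.0), norm(1.0, 1.5)

rng = np.random.default_rng(0)
x_p = p.rvs(size=100_000, random_state=rng)   # samples from P
x_q = q.rvs(size=100_000, random_state=rng)   # samples from Q

def t_star(x):
    px, qx = p.pdf(x), q.pdf(x)
    return np.log(2.0 * px / (px + qx))

def f_conj(t):
    return -np.log(2.0 - np.exp(t))           # convex conjugate for JS

# E_P[T*(x)] - E_Q[f*(T*(x))]: for the optimal critic this attains the bound.
js_estimate = t_star(x_p).mean() - f_conj(t_star(x_q)).mean()
print(f"Monte-Carlo estimate of JS(P||Q): {js_estimate:.4f}")
```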
Adversarial optimization
Rationale
▶ We need to optimize the model 𝑞_θ(𝑥) without direct access to 𝑝(𝑥).
▶ Instead of minimizing an analytically defined divergence, one can minimize a "learned divergence" with its own parameters 𝜙.
Generator
▶ 𝐺_θ is the generator. It maps random noise to samples:
$$z_i \sim N(0, I), \qquad x_i = G_\theta(z_i).$$
▶ We take 𝐺_θ to be a neural network.
▶ This implicitly defines the distribution that the samples follow:
$$x_i \sim q_\theta(x).$$
▶ 𝐺_θ can also be defined in other ways, for example as a physics-based simulator.
Borisyak M et al. PeerJ Computer Science 6:e274 (2020)
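A minimal sketch of such a generator as a neural network (PyTorch, not from the slides; the layer sizes and latent dimension are arbitrary illustrative choices):

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 2   # hypothetical sizes, chosen only for illustration

# G_theta: maps noise z ~ N(0, I) to samples x = G_theta(z)
G = nn.Sequential(
    nn.Linear(latent_dim, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, data_dim),
)

z = torch.randn(64, latent_dim)   # a batch of noise vectors
x_gen = G(z)                      # a batch of samples from q_theta(x)
```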
Discriminator
▶ Add a classifying neural network, the discriminator 𝐷_φ, to distinguish between real and generated samples.
▶ Optimize
$$\max_\phi\; \Bigl( \mathbb{E}_{x\sim p(x)}[\log D_\phi(x)] + \mathbb{E}_{x\sim q_\theta(x)}[\log(1 - D_\phi(x))] \Bigr),$$
where the first expectation runs over real samples and the second over generated samples.
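A minimal sketch of the discriminator and its objective (PyTorch; maximizing the objective is implemented as minimizing its negative; the architecture and the stand-in batches are assumptions for illustration):

```python
import torch
import torch.nn as nn

data_dim = 2                                 # hypothetical data dimensionality

# D_phi: outputs the probability that x is a real sample
D = nn.Sequential(
    nn.Linear(data_dim, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1), nn.Sigmoid(),
)

x_real = torch.randn(64, data_dim)           # stand-in for a batch of real samples
x_gen  = torch.randn(64, data_dim)           # stand-in for a batch of generated samples

eps = 1e-8                                   # numerical safety for the logs
d_objective = (torch.log(D(x_real) + eps).mean()
               + torch.log(1.0 - D(x_gen) + eps).mean())
d_loss = -d_objective                        # maximize the objective == minimize its negative
```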
G+D recap
$$\max_\phi\; \Bigl( \mathbb{E}_{x\sim p(x)}[\log D_\phi(x)] + \mathbb{E}_{x\sim q_\theta(x)}[\log(1 - D_\phi(x))] \Bigr)$$
$$\min_\theta\; \mathbb{E}_{z\sim N(0, I)}\bigl[\log(1 - D_\phi(G_\theta(z)))\bigr]$$
Training at a Glance
For D and G defined as neural networks, we can use backpropagation.
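A minimal, self-contained sketch of the alternating updates with backpropagation (PyTorch; the toy data, network sizes, and hyperparameters are assumptions for illustration, not the lecture's setup):

```python
import torch
import torch.nn as nn

latent_dim, data_dim, batch = 16, 2, 64
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 128), nn.LeakyReLU(0.2),
                  nn.Linear(128, 1), nn.Sigmoid())
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
eps = 1e-8

def sample_real(n):                       # toy "real" data: a shifted Gaussian
    return torch.randn(n, data_dim) + 3.0

for step in range(5000):
    # --- discriminator step: maximize log D(x) + log(1 - D(G(z))) ---
    x_real = sample_real(batch)
    x_fake = G(torch.randn(batch, latent_dim)).detach()   # no gradient into G here
    d_loss = -(torch.log(D(x_real) + eps).mean()
               + torch.log(1.0 - D(x_fake) + eps).mean())
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # --- generator step: minimize log(1 - D(G(z))), as on the slide ---
    # (in practice the non-saturating loss -log D(G(z)) is often preferred)
    x_fake = G(torch.randn(batch, latent_dim))
    g_loss = torch.log(1.0 - D(x_fake) + eps).mean()
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
```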
Optimal Solution
$$C(G) = \max_D V(G, D) = \mathbb{E}_{x\sim p(x)}[\log D^*_\phi(x)] + \mathbb{E}_{x\sim q_\theta(x)}[\log(1 - D^*_\phi(x))]$$
$$= \mathbb{E}_{x\sim p}\left[\log\frac{p(x)}{p(x)+q(x)}\right] + \mathbb{E}_{x\sim q_\theta}\left[\log\frac{q(x)}{p(x)+q(x)}\right]$$
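The optimal discriminator used above follows from pointwise maximization of the objective (the standard argument from the original GAN paper):
$$\frac{\partial}{\partial D}\Bigl[p(x)\log D + q(x)\log(1-D)\Bigr] = \frac{p(x)}{D} - \frac{q(x)}{1-D} = 0
\quad\Longrightarrow\quad
D^*_\phi(x) = \frac{p(x)}{p(x)+q(x)}.$$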
Lower Bound Reminder
▶ In the case of an ideal discriminator:
$$C(G) = \mathbb{E}_{x\sim P}\left[\log\frac{p(x)}{p(x)+q(x)}\right] + \mathbb{E}_{x\sim Q}\left[\log\frac{q(x)}{p(x)+q(x)}\right]$$
▶ This can be compared to the variational bound:
$$\mathrm{JS}(P\,\|\,Q) \ge \mathbb{E}_{x\sim p(x)}\left[\log\frac{2p(x)}{p(x)+q(x)}\right] + \mathbb{E}_{x\sim q(x)}\left[\log\frac{2q(x)}{p(x)+q(x)}\right]$$
▶ The two differ only by the constant log 4, so C(G) = JS(P‖Q) − log 4: minimizing C(G) over the generator minimizes the JS divergence, with optimum 𝑞_θ = 𝑝.
Optimal Solution
GAN algorithm
GAN results
I. Goodfellow et al., Generative Adversarial Networks, NIPS 2014
GAN First Paper Disclaimer
While we make no claim that these samples are better than
samples generated by existing methods, we believe that these
samples are at least competitive with the better generative
models in the literature and highlight the potential of the
adversarial framework.
I. Goodfellow, Generative Adversarial Networks, NIPS 2014
Enhancing GAN
Unsupervised Feature Learning
E Denton et al. Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks
LAPGAN: results
Convolutional Layers are Here to Help
▶ Pooling layers → (strided) convolution layers.
▶ Use batchnorm.
▶ No fully connected hidden layers.
▶ ReLU activations in the generator.
▶ LeakyReLU activations in the discriminator for all layers.
A minimal architecture sketch following these guidelines is given after the reference below.
A. Radford et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, ICLR 2016
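A minimal sketch of a DCGAN-style generator following these guidelines (PyTorch; the channel counts and the 32×32 target resolution are illustrative assumptions, not the original architecture):

```python
import torch
import torch.nn as nn

latent_dim = 100

# DCGAN-style generator: no fully connected hidden layers, transposed
# convolutions for upsampling, batchnorm, ReLU activations, tanh output.
G = nn.Sequential(
    nn.ConvTranspose2d(latent_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(),  # 4x4
    nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),          # 8x8
    nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),            # 16x16
    nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),                                   # 32x32 RGB
)

z = torch.randn(16, latent_dim, 1, 1)   # noise as a 1x1 "image" with latent_dim channels
images = G(z)                           # shape: (16, 3, 32, 32)
```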
DCGAN: results
Walking in the Latent Space
[Figure: latent-space interpolations between generated bedrooms; in one row a TV gradually transforms into a window.]
Arithmetic in the Latent Space
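The canonical example from the DCGAN paper: averaging latent vectors over a few images and doing vector arithmetic, e.g.
$$z(\text{smiling woman}) - z(\text{neutral woman}) + z(\text{neutral man}) \approx z(\text{smiling man}),$$
so decoding the result with G produces images of smiling men.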
GAN problems
Game Approach Problems
Martin Arjovsky, Towards Principled Methods for Training Generative Adversarial Networks, ICLR 2017
Mode Collapse
• GANs often generate from only a small number of modes; this appears to be due to a defect in the training procedure rather than to the divergence they aim to minimize.
I. Goodfellow, NIPS 2016 Tutorial: Generative Adversarial Networks
Luke Metz et al., Unrolled Generative Adversarial Networks, ICLR 2017
Mode Collapse
▶ For a fixed D:
– 𝐺 tends to converge to a point 𝑥* that fools 𝐷 the most.
– In extreme cases, 𝐺 becomes independent of 𝑧.
– The gradient with respect to 𝑧 diminishes.
▶ When D is retrained:
– It easily finds this 𝑥*.
– It pushes G to the next point 𝑥**.
T. Che et al., Mode Regularized Generative Adversarial Networks, ICLR 2017
Vanishing Gradients
▶ Suppose the supports of the real and generated data are disjoint.
▶ An ideal discriminator can then perfectly tell the real and generated data apart:
$$D(G(z)) \approx 0.$$
Vanishing Gradients
▶ Generator loss: $L_G = -\log D(G(z))$.
▶ $\frac{\partial D(x)}{\partial x} \approx 0$ for generated $x$.
▶ Hence $\frac{\partial L_G(x)}{\partial x} \approx 0$ for generated $x$.
▶ The generator cannot train!
▶ We would need to start closer to the real data (but how?).
▶ The problem is further amplified by the noisy estimate obtained from finite data.
Summary so Far
▶ Pros:
– Can utilize power of back-prop.
– No explicit intractable integral.
– No MCMC needed.
▶ Cons:
– Unclear stopping criterion.
– No explicit representation of $q_\theta(x)$.
– Hard to train.
– No clear evaluation metric, so it is hard to compare with other models.
– Easy to get trapped in local optima that memorize training data.
– Hard to invert the generator to recover the latent $z$ from a generated $x$.
Fixing GANs
Diminishing Gradients
▶ We have already seen that real data lie on a low-dimensional manifold.
▶ The GAN case is in fact more complicated, as the discriminator has to distinguish two such supports.
▶ This task is far too easy if the supports are disjoint.
Diminishing Gradients: Noisy Supports
▶ Let's make the problem harder: add random noise $\varepsilon \sim N(0, \sigma^2 I)$ to every sample:
$$\mathbb{P}_{x+\varepsilon}(x) = \mathbb{E}_{y\sim\mathbb{P}}\left[\mathbb{P}_\varepsilon(x - y)\right].$$
▶ This smears both supports, which makes the discriminator's task harder.
Martin Arjovsky, Towards Principled Methods for Training Generative Adversarial Networks, ICLR 2017
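A minimal sketch of this idea in code (often called instance noise): Gaussian noise of the same scale is added to both real and generated batches before they reach the discriminator. The function name and the value of sigma are illustrative assumptions:

```python
import torch

def add_instance_noise(x, sigma=0.1):
    """Smear the empirical distribution with N(0, sigma^2 I) by adding noise to each sample."""
    return x + sigma * torch.randn_like(x)

# Inside the training loop the discriminator would see only the smeared batches:
# d_input_real = add_instance_noise(x_real, sigma)
# d_input_fake = add_instance_noise(x_fake, sigma)
# sigma is typically annealed towards zero over the course of training.
```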
Feature Matching
▶ Danger of overtraining the generator to match the chosen test statistics!
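The feature-matching objective (from Salimans et al., shown here since the slide figure is not reproduced): instead of directly maximizing the discriminator output, the generator matches the statistics of an intermediate discriminator layer $f(\cdot)$,
$$L_{\mathrm{FM}}(\theta) = \bigl\| \mathbb{E}_{x\sim p(x)}[f(x)] - \mathbb{E}_{z\sim N(0,I)}[f(G_\theta(z))] \bigr\|_2^2 .$$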
Historical averaging
Salimans et al., Improved Techniques for Training GANs, NIPS 2016
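The historical-averaging trick (Salimans et al.) adds to each player's cost a penalty that keeps the parameters close to their running average over past iterations,
$$\Bigl\| \theta - \frac{1}{t}\sum_{i=1}^{t} \theta_i \Bigr\|^2 ,$$
where $\theta_i$ are the parameter values at past training steps; this discourages the oscillations typical of adversarial training.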
Look into the Future: unrolled GANs
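Briefly (following Metz et al.): the discriminator parameters are unrolled K gradient steps into the future, and the generator is trained against this look-ahead discriminator,
$$\phi_0 = \phi,\qquad \phi_{k+1} = \phi_k + \eta\,\frac{\partial V(G_\theta, D_{\phi_k})}{\partial \phi_k},\qquad L_G(\theta) = V\bigl(G_\theta, D_{\phi_K(\theta,\phi)}\bigr),$$
while the discriminator itself still takes a single ordinary step. Backpropagating through the unrolled steps lets the generator anticipate the discriminator's reaction and discourages mode collapse.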
Unrolled GAN: results
Luke Metz et al., Unrolled Generative Adversarial Networks, ICLR 2017
𝑓-GANs
Reminder: Variational Lower Bound
▶ For a convex function 𝑓(⋅) and distributions 𝑃 and 𝑄, the 𝑓-divergence is
$$D_f(P\,\|\,Q) = \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx.$$
▶ It is bounded from below:
$$D_f(P\,\|\,Q) \ge \max_{T} \Bigl( \mathbb{E}_{x\sim P}[T(x)] - \mathbb{E}_{x\sim Q}[f^*(T(x))] \Bigr),$$
where 𝑇(𝑥) is an arbitrary test function and 𝑓* is the convex conjugate of 𝑓.
▶ The bound becomes tight for an optimal 𝑇*(𝑥) that can be derived for each 𝑓-divergence.
▶ For the JS divergence: $T^*(x) = \log\frac{2p(x)}{p(x)+q(x)}$ and $f^*(t) = -\log(2 - \exp(t))$.
Variational Divergence Minimization
$$D_f(P\,\|\,Q) \ge \max_{T} \Bigl( \mathbb{E}_{x\sim P}[T(x)] - \mathbb{E}_{x\sim Q}[f^*(T(x))] \Bigr)$$
▶ Work in the GAN paradigm:
– generator $x \sim Q$: $x = G_\theta(z)$;
– test function $T_\omega(x)$.
$$\min_\theta \max_\omega F(\omega, \theta) = \mathbb{E}_{x\sim P}[T_\omega(x)] - \mathbb{E}_{x\sim G_\theta(z)}[f^*(T_\omega(x))]$$
▶ To cover the required range of values, parameterize
$$T_\omega(x) = g_f(V_\omega(x)),$$
where $g_f: \mathbb{R} \to \mathrm{dom}_{f^*}$ is the output activation function for the $f$-divergence used.
S. Nowozin et al. f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization
Output activation function
$$F(\omega, \theta) = \mathbb{E}_{x\sim P}\bigl[g_f(V_\omega(x))\bigr] - \mathbb{E}_{x\sim G_\theta(z)}\bigl[f^*(g_f(V_\omega(x)))\bigr]$$
▶ The choice of the output activation function $g_f$ is somewhat arbitrary, as long as it maps onto $\mathrm{dom}_{f^*}$.
Example: GAN objective
$$F(\omega, \theta) = \mathbb{E}_{x\sim P}\bigl[g_f(V_\omega(x))\bigr] - \mathbb{E}_{x\sim G_\theta(z)}\bigl[f^*(g_f(V_\omega(x)))\bigr]$$
$$g_{\mathrm{GAN}}(v) = -\log(1 + \exp(-v)), \qquad f^*(t) = -\log(1 - \exp(t))$$
$$F(\omega, \theta) = \mathbb{E}_{x\sim P}\bigl[\log D_\omega(x)\bigr] + \mathbb{E}_{x\sim G_\theta(z)}\bigl[\log(1 - D_\omega(x))\bigr],$$
which is the original GAN objective, with the last nonlinearity in the discriminator taken as the sigmoid $D_\omega(x) = 1/(1 + e^{-V_\omega(x)})$.
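A minimal sketch of the f-GAN objective for this GAN case (PyTorch; the critic network V_omega and the stand-in batches are assumptions for illustration):

```python
import torch
import torch.nn as nn

data_dim = 2
V = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1))  # V_omega, raw output

def g_gan(v):                 # output activation for the GAN divergence
    return -torch.log(1.0 + torch.exp(-v))          # = log sigmoid(v) = log D(x)

def f_conj_gan(t):            # convex conjugate f*(t) = -log(1 - exp(t))
    return -torch.log(1.0 - torch.exp(t))

x_real = torch.randn(64, data_dim)                  # stand-in for samples from P
x_fake = torch.randn(64, data_dim)                  # stand-in for samples from G_theta(z)

# F(omega, theta) = E_P[g_f(V(x))] - E_Q[f*(g_f(V(x)))]
F_value = g_gan(V(x_real)).mean() - f_conj_gan(g_gan(V(x_fake))).mean()
# The critic maximizes F_value; the generator minimizes it.
```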
Variational Divergence Minimization
$$\min_\theta \max_\omega F(\omega, \theta) = \mathbb{E}_{x\sim P}\bigl[g_f(V_\omega(x))\bigr] - \mathbb{E}_{x\sim G_\theta(z)}\bigl[f^*(g_f(V_\omega(x)))\bigr]$$
𝑓-GAN results
𝑓-GAN Discussion
▶ Using the 𝑓-GAN approach, one can estimate any 𝑓-divergence.
▶ The construction leaves some freedom in the choice of functions.
▶ Using different 𝑓-divergences leads to very different learning dynamics.
▶ It does not solve the mode-collapse problem.
▶ We need a better way to train GANs.
Conclusions: GANs
▶ use a generator-discriminator game to estimate the distance between the generated distribution and the true one.
▶ produce sharp images.
▶ provide an implicit model of the target PDF.