Assignment Week 12-Deep-Learning PDF
Deep Learning
Assignment- Week 12
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total marks: 10 × 1 = 10
______________________________________________________________________________
QUESTION 1:
We are given two distributions. The first distribution, P is a uniform distribution between [-3, 3].
Another distribution, Q is a Normal distribution with zero mean and standard deviation of 1.
What will be the KL(Q||P)?
a. 0.5
b. 0.0
c. 1.0
d. ∞
Correct Answer: d
Detailed Solution:
The Normal distribution Q has nonzero density on the whole real line, while the uniform distribution P is zero outside [-3, 3]. Since the support of Q is not contained in the support of P, the term Q(x) log(Q(x)/P(x)) is infinite wherever Q(x) > 0 and P(x) = 0, so KL(Q||P) = ∞.
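This can be sketched numerically with a Riemann-sum approximation (a minimal illustration; the function names and grids below are chosen for this example):

```python
import math

def q_pdf(x):
    """Standard Normal density (zero mean, unit variance)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def p_pdf(x):
    """Uniform density on [-3, 3]."""
    return 1.0 / 6.0 if -3.0 <= x <= 3.0 else 0.0

def kl_q_p(grid, dx=0.01):
    """Riemann-sum approximation of KL(Q||P) = ∫ q(x) log(q(x)/p(x)) dx."""
    total = 0.0
    for x in grid:
        q, p = q_pdf(x), p_pdf(x)
        if q > 0.0 and p == 0.0:
            return math.inf          # Q puts mass where P has none
        if q > 0.0:
            total += q * math.log(q / p) * dx
    return total

inside = [i * 0.01 for i in range(-299, 300)]   # grid staying within [-3, 3]
full   = [i * 0.01 for i in range(-500, 501)]   # grid extending beyond [-3, 3]
print(kl_q_p(inside))   # finite
print(kl_q_p(full))     # inf
```

As soon as the grid includes any point outside [-3, 3], where Q is positive but P is zero, the approximation becomes infinite.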
QUESTION 2:
Which of the following is True regarding the reconstruction loss (realized as the mean squared error between the input and the predicted signal) of a standard auto-encoder?
a. Such loss is not differentiable and cannot be used for back propagation
b. Such loss tends to form distinct clusters in latent space
c. Such loss cannot be optimized with gradient descent
d. None of the above
Correct Answer: b
Detailed Solution:
MSE-based losses tend to form clusters of similar-category signals in the latent space of auto-encoders, but in an unsupervised way. The remaining options are False.
______________________________________________________________________________
NPTEL Online Certification Courses
Indian Institute of Technology Kharagpur
QUESTION 3:
What can be the maximum value of KL divergence metric?
a. 1
b. 0
c. ∞
d. 0.5
Correct Answer: c
Detailed Solution:
KL(p||q) is finite only if the support of p (the range of values of x for which p(x) is nonzero) is contained within the support of q. However, note that KL divergence can be infinite even if p(x) and q(x) are nonzero for all x: for example, for a Cauchy distribution p and a Normal distribution q, the KL divergence is infinite even though both distributions are defined for all real values of x.
Proof sketch: for large |x|, the Cauchy density behaves as p(x) ≈ 1/(πx²) while log(p(x)/q(x)) ≈ x²/2, so the integrand p(x) log(p(x)/q(x)) approaches the constant 1/(2π); since it does not decay, the integral ∫ p(x) log(p(x)/q(x)) dx diverges.
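A minimal numerical sketch of this divergence (function names and the grid step are illustrative): working with log densities avoids floating-point underflow of the Normal density in the far tails, and the partial integrals keep growing as the integration limit widens:

```python
import math

def log_cauchy(x):
    """Log density of the standard Cauchy distribution."""
    return -math.log(math.pi * (1.0 + x * x))

def log_normal(x):
    """Log density of the standard Normal distribution."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

def kl_partial(limit, dx=0.01):
    """Riemann sum of ∫ p(x) log(p(x)/q(x)) dx over [-limit, limit],
    with p Cauchy and q Normal."""
    n = int(limit / dx)
    total = 0.0
    for i in range(-n, n + 1):
        x = i * dx
        p = math.exp(log_cauchy(x))
        total += p * (log_cauchy(x) - log_normal(x)) * dx
    return total

for L in (10, 50, 200):
    print(L, round(kl_partial(L), 3))   # partial sums keep growing without bound
```

The partial sums increase roughly linearly with the limit, consistent with the integrand approaching a positive constant in the tails.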
______________________________________________________________________________
QUESTION 4:
For an auto-encoder, suppose we give an input signal x and reconstruct a signal y. Which one of the following objective functions can we MINIMIZE to train the parameters of the auto-encoder with a gradient descent optimizer?
Detailed Solution:
Except for option (c), all the other objective functions will INCREASE as the reconstructed signal starts matching the input signal x, and thus we would deviate from our objective of training the auto-encoder to mimic the input signal.
______________________________________________________________________________
QUESTION 5:
Suppose we have a 2N dimensional Normal distribution in which we assume all components are
independent of each other. What will be the size (number of elements) of the vector to fully
represent the covariance matrix of this distribution?
a. N
b. 2N
c. N/2
d. N/4
Correct Answer: b
Detailed Solution:
Since each component is independent of each other, there will be only 2N diagonal entries
in the covariance matrix. The remaining elements will be all equal to 0. So, a 2N
dimensional vector is sufficient to represent the covariance matrix. This principal is used in
VAE architecture.
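A small sketch of this idea (the dimension and variance values below are illustrative):

```python
N = 3
dim = 2 * N   # a 2N-dimensional Normal with independent components

# A length-2N vector of variances fully determines the covariance matrix,
# since all off-diagonal entries are zero.
variances = [0.5 + 0.25 * i for i in range(dim)]

# Reconstructing the full 2N x 2N diagonal covariance matrix from the vector:
cov = [[variances[i] if i == j else 0.0 for j in range(dim)]
       for i in range(dim)]

print(len(variances))                 # 2N = 6 numbers stored...
print(sum(len(row) for row in cov))   # ...versus (2N)^2 = 36 for the full matrix
```

This is why a VAE encoder typically outputs just a mean vector and a (log-)variance vector rather than a full covariance matrix.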
____________________________________________________________________________
QUESTION 6:
What will happen if we do not enforce KL divergence loss in VAE latent code space?
a. The latent code distribution will mimic a zero-mean, unit-variance Normal distribution
b. Network will learn to form distinctive clusters with high standard deviation for
each cluster
c. Network will learn to form distinctive clusters with low standard deviation for
each cluster
d. None of the above
Correct Answer: c
Detailed Solution:
With zero KL loss, the encoder part of the VAE will try to form separated clusters (by increasing the distance between the mean vectors) and simultaneously reduce the standard deviation of each cluster to reduce confusion for the decoder part of the network. This efficiently reduces the reconstruction loss, which is then the only loss component in the network. So, without the KL loss, the network reduces to a simple auto-encoder.
____________________________________________________________________________
QUESTION 7:
The KL divergence between two discrete distributions, P and Q, is given as KL(Q||P):
a. KL(Q||P) = Σ_x Q(x) log(Q(x)/P(x))
b. KL(Q||P) = Σ_x P(x) log(P(x)/Q(x))
c. KL(Q||P) = −Σ_x Q(x) log(Q(x)/P(x))
d. KL(Q||P) = Σ_x Q(x) log(P(x)/Q(x))
Correct Answer: a
Detailed Solution:
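Option (a) is the standard definition. A minimal sketch of the discrete formula (the example distributions below are illustrative):

```python
import math

def kl_divergence(q, p):
    """KL(Q||P) = Σ_x Q(x) log(Q(x)/P(x)) for discrete distributions,
    skipping terms where Q(x) = 0 (they contribute nothing)."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

q = [0.5, 0.3, 0.2]
p = [0.4, 0.4, 0.2]
print(kl_divergence(q, p))   # positive: the distributions differ
print(kl_divergence(q, q))   # 0.0: KL of a distribution with itself
```

Note that KL divergence is not symmetric: kl_divergence(q, p) generally differs from kl_divergence(p, q).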
______________________________________________________________________________
QUESTION 8:
The figure shows latent vector addition of two concepts, “man without a hat” and “hat”. What is expected from the resultant vector?
Detailed Solution:
It is expected that the VAE latent space supports semantic vector arithmetic. Thus the resultant vector is the vector addition of the two semantic concepts, which results in a final vector representing a MAN WITH A HAT.
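A toy sketch of this vector arithmetic (the 3-dimensional latent codes below are made up for illustration, not taken from a trained VAE):

```python
# Hypothetical latent codes for the two concepts
man_without_hat = [0.9, 0.1, 0.0]
hat             = [0.0, 0.0, 0.8]

# Semantic vector arithmetic in latent space: element-wise addition
man_with_hat = [a + b for a, b in zip(man_without_hat, hat)]
print(man_with_hat)   # ideally decodes to "man with hat"
```

In a real VAE, decoding the summed latent vector would (ideally) produce an image combining both concepts.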
______________________________________________________________________________
QUESTION 9:
Which one of the following statements is True in the original GAN training?
Correct Answer: d
Detailed Solution:
Since the GAN game is a zero-sum non-cooperative game, if one of the players wins, the opponent loses, and the Nash equilibrium (when the Discriminator fails to distinguish real samples from fake samples) is not reached. Thus, it is NOT desired that the loss function of either player decrease monotonically.
____________________________________________________________________________
QUESTION 10:
When the GAN game has converged to its Nash equilibrium (when the Discriminator can do no better than random guessing in distinguishing fake samples from real samples), what is the probability (of belonging to the real class) given by the Discriminator to a fake generated sample?
a. 1
b. 0.5
c. 0
d. 0.25
Correct Answer: b
Detailed Solution:
Nash equilibrium is reached when the generated distribution p_g(x) equals the original data distribution p_data(x), which leads to D(x) = 0.5 for all x.
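This follows from the optimal discriminator of the original GAN formulation, D*(x) = p_data(x) / (p_data(x) + p_g(x)); a minimal sketch (the density values below are illustrative):

```python
def optimal_discriminator(p_data, p_g):
    """Optimal D for a fixed generator in the original GAN:
    D*(x) = p_data(x) / (p_data(x) + p_g(x))."""
    return p_data / (p_data + p_g)

# Before convergence: generator density differs from data density at some x
print(optimal_discriminator(0.8, 0.2))   # biased towards the real class

# At Nash equilibrium: p_g(x) == p_data(x) for all x
print(optimal_discriminator(0.3, 0.3))   # 0.5, i.e. random guessing
```

Whenever the two densities are equal, the ratio is exactly 1/2 regardless of the common value, which is why D assigns 0.5 to every sample at equilibrium.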
______________________________________________________________________________
************END*******