SEMESTER 1 EXAMINATIONS 2021/2022
MODULE: EE514 - Data Analysis and Machine Learning
PROGRAMME(S):
MECE MEng Electronic & Computer Engineering
MCTY MSc Electronic and Computer Technology
MSAR MSc in Astrophysics and Relativity
GCIOT Grad Cert in the Internet of Things
CAPT PhD-track
GCECE Grad Cert. in Electronic & Computer Eng
EEPT PhD-track
MQTY Qualifier Prog MSc Electronic & Computer
MEQ Masters Engineering Qualifier Course
MEPT PhD-track
MEPD PhD
MECEW MEng In Electronic & Comp Eng. (Wuhan)
CSPT PhD-track
CAPD PhD
YEAR OF STUDY: 1,2,3,C
EXAMINERS:
Dr. Kevin McGuinness 087 6596732
Prof. Roberto Verdone (External) External
TIME ALLOWED: 3 hours
INSTRUCTIONS: Answer 4 questions. All questions carry equal marks.
PLEASE DO NOT TURN OVER THIS PAGE UNTIL INSTRUCTED TO DO SO
The use of programmable or text storing calculators is expressly forbidden.
Please note that where a candidate answers more than the required number of questions,
the examiner will mark all questions attempted and then select the highest scoring ones.
There are no additional requirements for this paper.
EE514–Data Analysis and Machine Learning Page 1 of 6
Semester 1 EXAMINATIONS 2021/2022
QUESTION 1 [TOTAL MARKS: 25]
Data summarization
Q 1(a) [5 Marks]
Explain, with the aid of a diagram, what is meant by a mode of a distribution. Explain
the distinction between modes and the mean and median. Give an example of a
distribution where the mean, median, and mode are coincident.
Q 1(b) [5 Marks]
How might you go about estimating the modes of the distribution of a quantitative
(continuous) random variable from a sample? What considerations are important
and what difficulties might be encountered? How could this approach be extended to
estimate the modes of the distribution of a random variable in $\mathbb{R}^D$ from a sample and what
difficulties might be encountered?
Q 1(c) [5 Marks]
Describe what is meant by the interquartile range (IQR) and explain how it could be
calculated from a sample. Explain a common method for detecting outliers based on the
interquartile range. Should outliers always be discarded before subsequent analysis?
Explain your reasoning.
Q 1(d) [6 Marks]
Suppose that you have a sample $\{(x_i, y_i) \in \mathbb{R}^2\}_{i=1}^N$ of two random variables X and Y
and you suspect that there is an exponential relationship between X and Y of the form
$$Y = \exp(aX + b) + \epsilon,$$
with $\epsilon \sim \mathcal{N}(0, \sigma^2)$. Explain how you might fit the parameters a and b and explain how
you could formally measure the strength of the relationship between X and Y.
Q 1(e) [4 Marks]
Describe a statistic that can be used to measure how heavy the tails of a distribution
are and explain how you could estimate this statistic from a sample.
[End Question 1]
QUESTION 2 [TOTAL MARKS: 25]
Supervised Learning
Q 2(a) [6 Marks]
What is the purpose of validation data in supervised machine learning and how does
it differ from test data? Outline the benefits and drawbacks of using k-fold cross
validation as opposed to a dedicated validation set. Describe a situation where a
dedicated validation set may be preferable to k-fold cross validation.
Q 2(b) [8 Marks]
Explain the training and prediction procedure for a k-nearest neighbour classifier. In
the case of k = 1, what is the training error of such a model? In the case of k = N,
where N is the number of training examples, what will the prediction of the model be?
In general, what happens to classifier bias and variance as k is increased?
Q 2(c) [8 Marks]
How many free parameters are there in a 10-class quadratic discriminant analysis
model if the dimension of the input is 100? Describe TWO assumptions you could
make to reduce the number of free parameters and calculate the number of parameters
in the simplified models.
Q 2(d) [3 Marks]
Explain what is meant by overfitting. How could you tell if a predictive model has
overfit?
[End Question 2]
QUESTION 3 [TOTAL MARKS: 25]
Linear models
Q 3(a) [6 Marks]
Show that putting a zero-mean isotropic Gaussian prior $w \sim \mathcal{N}(0, \sigma I)$ on the weights of
a linear model results in a regularization term proportional to the squared 2-norm of the
weights $\|w\|_2^2$ in the loss function.
Q 3(b) [5 Marks]
Linear regression models can be fit in closed form by solving the normal equations:
$$X^T X w = X^T y,$$
where $w \in \mathbb{R}^D$ are the unknown weights, $y \in \mathbb{R}^N$ contains the targets $y_i$, and $X \in \mathbb{R}^{N \times D}$
is a data matrix with the training example inputs $x_i$ on the rows. Describe a situation
when you might prefer to fit a linear regression model using an iterative approach like
stochastic gradient descent, rather than fitting it by solving the normal equations and
explain your reasoning.
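For reference, the closed-form fit described above can be sketched in a few lines. This is a minimal illustration only, assuming NumPy is available; the function name `fit_normal_equations`, the synthetic data, and the values in `w_true` are hypothetical and not part of the paper.

```python
import numpy as np


def fit_normal_equations(X, y):
    """Return w solving the normal equations X^T X w = X^T y.

    Assumes X^T X is invertible (X has full column rank).
    """
    # Solve the D x D linear system directly rather than forming an inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)


# Hypothetical synthetic data: N examples on the rows of X, D features.
rng = np.random.default_rng(0)
N, D = 200, 3
X = rng.normal(size=(N, D))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=N)

w_hat = fit_normal_equations(X, y)
print(w_hat)
```

The sketch solves the linear system with `np.linalg.solve` instead of explicitly inverting $X^T X$, which is the usual numerically safer choice.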
Q 3(c) [14 Marks]
Given a training set $T = \{(x_i, y_i)\}_{i=1}^N$ with $x_i \in \mathbb{R}^{2000}$ and $y_i \in \{0, 1\}$, the loss function
for the L1 regularized logistic regression model can be written as:
$$L(w) = \frac{1}{2N} \sum_{i=1}^{N} l(y_i, \sigma(w^T x_i + b))^2 + \lambda \|w\|_1,$$
where $\lambda$ is the regularization hyperparameter, $l(y, \hat{y})$ is a per-sample loss function, and
$\sigma(x)$ is the sigmoid function.
i. Which symbol(s) in the formula above represent the parameters of the model and
how many parameters does the model have?
ii. Explain in your own words why the sigmoid function is used here.
iii. Describe what you expect to happen to the model parameters w as the value of λ
increases.
iv. Describe how you could go about selecting a good value for the λ hyperparameter
in practice.
v. Name and write down the formula for an appropriate per-sample loss function l(y, ŷ)
to use here.
[End Question 3]
QUESTION 4 [TOTAL MARKS: 25]
Unsupervised learning
Q 4(a) [5 Marks]
Write down and explain the k-means objective function (assuming fixed k) and describe
an algorithm for approximate minimization. Is it possible, in general, to exactly minimize
the k-means objective in polynomial time? Explain why or why not.
Q 4(b) [5 Marks]
How many parameters are there in a 6-component Gaussian mixture model in 5
dimensions? Describe an assumption that could be made to reduce the number of
parameters.
Q 4(c) [15 Marks]
Given a sample matrix $X \in \mathbb{R}^{N \times D}$, the principal component analysis algorithm seeks
a transform Q such that the empirical covariance matrix of the transformed variables
Z = XQ is diagonal.
1. What is meant by the empirical covariance matrix here? How would you compute
the empirical covariance matrix for X?
2. Describe how to compute the transformation Q from the empirical covariance
matrix of X.
3. Show that the covariance of the transformed variables Z = XQ is a diagonal
matrix Λ.
4. Explain the meaning of the values found along the main diagonal of Λ.
[End Question 4]
QUESTION 5 [TOTAL MARKS: 25]
Neural networks and deep learning
Q 5(a) [5 Marks]
What is a convolution layer? Name and describe the hyperparameters that need to be
chosen when using a convolution layer.
Q 5(b) [7 Marks]
Explain with the aid of a diagram how you could recognize that a neural network is
overfitting the training data while training the network. Describe three ways you could
mitigate overfitting.
Q 5(c) [10 Marks]
Consider the neural network depicted in the figure below:
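[Figure: network diagram with input nodes $x_1, \dots, x_5$, a first layer of hidden nodes $h_1^{(1)}, \dots, h_4^{(1)}$, a second layer of hidden nodes $h_1^{(2)}, \dots, h_4^{(2)}$, and a single output node $\hat{y}$.]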
1. What is the dimension of the input of the model?
2. Give a formula for computing $h_i^{(1)}$ from the inputs, assuming the activation function
is a rectified linear unit. Introduce notation as needed for weights and biases.
3. Assuming each non-input node has associated weights and biases (not shown),
calculate the number of parameters of the model.
4. Assuming this was a binary classification model, what would be a suitable
activation function for ŷ?
5. Assuming this was a binary classification model, what would be an appropriate
loss function to use to fit this model?
Q 5(d) [3 Marks]
Describe how you could choose the initial starting weights when optimizing a multilayer
perceptron. Explain why it is important not to initialize all the weights to zero.
[End Question 5]
[END OF EXAM]