
AI42001 Practice 2

The document contains 11 questions on machine learning algorithms and techniques, including linear classification, random forests, AdaBoost, SVMs, dimensionality reduction, neural networks, clustering, and Gaussian mixture models. The questions involve tasks such as classifying test data using one-vs-one and one-vs-all approaches, building random forests of a specified structure and classifying test data, running AdaBoost for a specified number of iterations, classifying data using kernelized SVMs, finding dimensionality-reduction coefficients that achieve perfect accuracy on a dataset, analyzing and computing the outputs of a specified neural network architecture, performing agglomerative clustering with different linkage methods and thresholds, running the K-means and K-means++ clustering algorithms for a specified number of iterations, and fitting a Gaussian mixture model using the EM algorithm.

Q1. Given a set of training datapoints with 4 different labels and a test datapoint, attempt to classify it using linear classifiers in both one-vs-one and one-vs-all approaches. Choose the linear classifiers in any way you like.

Training set: (x1=1, x2=2, y=1); (x1=2, x2=1, y=1); (x1=3, x2=2, y=1); (x1=1, x2=3, y=2); (x1=2, x2=3, y=2); (x1=2, x2=4, y=2); (x1=4, x2=1, y=3); (x1=4, x2=2, y=3); (x1=5, x2=2, y=3); (x1=5, x2=3, y=4); (x1=5, x2=4, y=4); (x1=6, x2=4, y=4)

Test set: (x1=3, x2=3,y=?)
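The question leaves the choice of linear classifiers open; as one hedged sketch, scikit-learn's OneVsOneClassifier and OneVsRestClassifier wrappers around LinearSVC (an arbitrary choice of linear base classifier, not mandated by the question) illustrate how the two voting schemes behave on this data:

```python
import numpy as np
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

# training data from the question
X = np.array([[1, 2], [2, 1], [3, 2], [1, 3], [2, 3], [2, 4],
              [4, 1], [4, 2], [5, 2], [5, 3], [5, 4], [6, 4]])
y = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4])
x_test = [[3, 3]]

# one-vs-one: one linear classifier per pair of classes (6 pairs here)
ovo = OneVsOneClassifier(LinearSVC()).fit(X, y)
# one-vs-all: one linear classifier per class against the other three
ova = OneVsRestClassifier(LinearSVC()).fit(X, y)

print("one-vs-one prediction:", ovo.predict(x_test))
print("one-vs-all prediction:", ova.predict(x_test))
```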

Q2. Given a set of training datapoints (dimension: 4), create a random forest with 3 trees, where every tree has depth 1 only. For each tree, choose any 2 of the dimensions. Given the test datapoints, what will be the output of the random forest?

Training set: (x1=1,x2=2,x3=1,x4=4,y=1); (x1=5,x2=2,x3=1,x4=4,y=2); (x1=1,x2=2,x3=1,x4=2,y=3); (x1=1,x2=2,x3=8,x4=4,y=1); (x1=1,x2=5,x3=3,x4=2,y=2); (x1=5,x2=2,x3=3,x4=4,y=3); (x1=9,x2=2,x3=8,x4=4,y=1); (x1=8,x2=2,x3=5,x4=4,y=2); (x1=3,x2=2,x3=2,x4=3,y=3); (x1=6,x2=9,x3=9,x4=8,y=1); (x1=9,x2=8,x3=7,x4=6,y=2); (x1=1,x2=9,x3=5,x4=5,y=3)

Test set: (x1=7,x2=1,x3=9,x4=3,y=?); (x1=-4,x2=9,x3=1,x4=3,y=?); (x1=12,x2=-3,x3=-9,x4=18,y=?)
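A minimal sketch of this setup, with an assumed choice of feature pairs (the question allows any): three depth-1 trees, i.e. decision stumps, each fit on 2 of the 4 dimensions, combined by majority vote.

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

X = np.array([[1, 2, 1, 4], [5, 2, 1, 4], [1, 2, 1, 2], [1, 2, 8, 4],
              [1, 5, 3, 2], [5, 2, 3, 4], [9, 2, 8, 4], [8, 2, 5, 4],
              [3, 2, 2, 3], [6, 9, 9, 8], [9, 8, 7, 6], [1, 9, 5, 5]])
y = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])
X_test = np.array([[7, 1, 9, 3], [-4, 9, 1, 3], [12, -3, -9, 18]])

# one assumed choice of 2 dimensions per tree
feature_pairs = [(0, 1), (2, 3), (0, 3)]
# depth-1 trees are single-split decision stumps
trees = [DecisionTreeClassifier(max_depth=1).fit(X[:, list(p)], y)
         for p in feature_pairs]

for x in X_test:
    votes = [t.predict(x[list(p)].reshape(1, -1))[0]
             for t, p in zip(trees, feature_pairs)]
    print(x, "-> majority vote:", Counter(votes).most_common(1)[0][0])
```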

Q3. You are given 6 2D data points with binary class labels. Start with the linear classifier y = sign(x1 – x2), where x1, x2 are the two dimensions. Run the AdaBoost algorithm for 2 iterations, thus obtaining 2 more linear classifiers along with their weights. Carry out ensemble classification of these points using the 3 linear classifiers (including the initial one).

IDNo  1    2    3    4    5     6
X1    2.3  4.8  2.1  4.5  7.2   1.7
X2    3.8  3.4  5.8  7.3  11.6  6.2
Y     1    1    -1   1    -1    -1
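A minimal AdaBoost sketch over axis-aligned decision stumps, illustrating the weighted-error and weight-update steps the question asks you to trace by hand. It does not force the first classifier to be sign(x1 – x2); that initial round would be folded in the same way, with its own weighted error and alpha.

```python
import numpy as np

X = np.array([[2.3, 3.8], [4.8, 3.4], [2.1, 5.8],
              [4.5, 7.3], [7.2, 11.6], [1.7, 6.2]])
y = np.array([1, 1, -1, 1, -1, -1])
w = np.ones(len(y)) / len(y)            # uniform initial sample weights

def best_stump(X, y, w):
    """Exhaustively pick the threshold stump with minimum weighted error."""
    best = None
    for dim in range(X.shape[1]):
        for thr in X[:, dim]:
            for sgn in (1, -1):
                pred = sgn * np.where(X[:, dim] <= thr, 1, -1)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, dim, thr, sgn, pred)
    return best

for t in range(2):                      # two boosting iterations
    err, dim, thr, sgn, pred = best_stump(X, y, w)
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
    w *= np.exp(-alpha * y * pred)      # up-weight mistakes, down-weight hits
    w /= w.sum()
    print(f"round {t+1}: dim={dim}, thr={thr}, sign={sgn}, "
          f"err={err:.3f}, alpha={alpha:.3f}")
```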

Q4. You are given 4 labelled 3-dimensional points. Find a 3D linear classifier that separates the two classes, and calculate the margins of that classifier with respect to each of the classes. Note: the perpendicular distance of the point (l, m, n) from the plane ax + by + cz + d = 0 is |al + bm + cn + d| / √(a² + b² + c²).
ID   1   2   3   4
X1   2  -1   0   1
X2   1   0   1  -1
X3  -1   2   2   1
Y   +1  +1  -1  -1
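A sketch of the margin computation. The plane below, -x1 - x3 + 1.5 = 0, is one choice that separates these two classes; any other separating plane would be handled the same way, using the distance formula stated in the question.

```python
import numpy as np

X = np.array([[2, 1, -1], [-1, 0, 2], [0, 1, 2], [1, -1, 1]], dtype=float)
y = np.array([1, 1, -1, -1])

w = np.array([-1.0, 0.0, -1.0])        # (a, b, c) of the chosen plane
d = 1.5
dist = np.abs(X @ w + d) / np.linalg.norm(w)   # |al+bm+cn+d| / sqrt(a^2+b^2+c^2)

assert np.all(np.sign(X @ w + d) == y)          # confirm the plane separates
print("margin wrt class +1:", dist[y == 1].min())   # smallest distance in class +1
print("margin wrt class -1:", dist[y == -1].min())  # smallest distance in class -1
```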

Q5. Given a set of training datapoints and their corresponding ‘alpha’ values, a set of test datapoints, and a kernel function, show the classification results using a kernelized SVM. Assume b = 0.

Training: (x1=1, x2=3, y=-1); (x1=2, x2=1, y=-1); (x1=4, x2=0, y=-1); (x1=0, x2=4, y=-1); (x1=1, x2=5, y=1); (x1=2, x2=5, y=1); (x1=8, x2=2, y=1); (x1=3, x2=6, y=1)

Alpha: [0 0 1 1 0 1 0 0]; Kernel: K(a,b) = (a·b)²

Test: (x1=-1, x2=2, y=1), (x1=1, x2=0, y=-1)
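A sketch of the kernelized decision rule f(x) = sign(Σᵢ αᵢ yᵢ K(xᵢ, x) + b), with b = 0 and the question's kernel K(a,b) = (a·b)². Only the points with nonzero alpha (the support vectors) contribute to the score.

```python
import numpy as np

X = np.array([[1, 3], [2, 1], [4, 0], [0, 4],
              [1, 5], [2, 5], [8, 2], [3, 6]], dtype=float)
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])
alpha = np.array([0, 0, 1, 1, 0, 1, 0, 0], dtype=float)
b = 0.0

def K(u, v):
    return (u @ v) ** 2          # the question's kernel: squared dot product

for x in np.array([[-1, 2], [1, 0]], dtype=float):
    score = sum(alpha[i] * y[i] * K(X[i], x) for i in range(len(X))) + b
    print(x, "-> predicted label:", int(np.sign(score)))
```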

Q6. You are given 4 3D data-points. You want to reduce them to 1 dimension, as a*X1 + b*X2 + c*X3, where a, b, c are coefficients, at least one of which should be 0. For what values of a, b, c can you get 100% accuracy on this dataset?

ID   1   2   3   4
X1   1   2  -1   0
X2   2   3  -2   1
X3   1  -3   1   0
Y    A   A   B   B
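A brute-force sketch under an assumed restriction to small integer coefficients (the question allows any real values): try every (a, b, c) on a small grid with at least one coefficient zero, and report those whose 1D projection leaves the two classes non-overlapping, i.e. separable by a threshold.

```python
import numpy as np
from itertools import product

X = np.array([[1, 2, 1], [2, 3, -3], [-1, -2, 1], [0, 1, 0]], dtype=float)
y = np.array(["A", "A", "B", "B"])

for a, b, c in product(range(-3, 4), repeat=3):
    # require at least one zero coefficient, and skip the all-zero triple
    if 0 not in (a, b, c) or (a, b, c) == (0, 0, 0):
        continue
    z = X @ np.array([a, b, c], dtype=float)   # 1-D projection a*X1+b*X2+c*X3
    za, zb = z[y == "A"], z[y == "B"]
    # 100% accuracy iff the two classes' projected values do not overlap
    if za.max() < zb.min() or zb.max() < za.min():
        print(f"a={a}, b={b}, c={c} gives 100% accuracy")
```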

Q7. Given a set of 4 3D points, you are given two sets of components to project them onto: A) W1 = [3/5 4/5 0], W2 = [4/5 -3/5 0]; B) V1 = [5/13 0 12/13], V2 = [12/13 0 -5/13]. Which set will you prefer to project them onto?

ID   1     2     3     4
X1   3.1   5.8  -1.6   7.3
X2   3.9   8.2  -2.2   9.8
X3   0.1  -1.1   0.5  -2.1
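A sketch using retained variance, the usual PCA criterion and the assumed basis of comparison here: project the centred data onto each pair of components and see which basis keeps more of the total variance.

```python
import numpy as np

X = np.array([[3.1, 3.9, 0.1], [5.8, 8.2, -1.1],
              [-1.6, -2.2, 0.5], [7.3, 9.8, -2.1]])
Xc = X - X.mean(axis=0)                      # centre the data

W = np.array([[3/5, 4/5, 0], [4/5, -3/5, 0]])        # option A
V = np.array([[5/13, 0, 12/13], [12/13, 0, -5/13]])  # option B

for name, B in (("A (W1, W2)", W), ("B (V1, V2)", V)):
    Z = Xc @ B.T                             # 2-D coordinates in that basis
    print(name, "retained variance:", Z.var(axis=0).sum())
```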

Q8. A neural network is designed to classify 16-dimensional vectors into one of 3 classes. It has three layers: input, max-pooling (hidden) and fully-connected (output). The input layer is 16-dimensional. The hidden layer has 4 nodes, each of which connects to 4 input dimensions (non-overlapping) [example: the first node connects to input dimensions 1-4, the second node connects to input dimensions 5-8, etc.]. The output layer has 3 nodes, each representing the probability of a class. Let the weights of the fully-connected layer be W1, W2, …, W12.

i) Draw the network.

ii) Given an input vector x = [1 2 3 4 5 6 7 8 8 7 8 5 4 3 2 1], show the computations in the layers.

iii) Using the softmax function on the output layer, write the probability distribution of its class label according to our NN.
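A sketch of the forward pass described above: non-overlapping max-pooling over blocks of 4 inputs, a 4-to-3 fully-connected layer, then softmax. The numeric weight matrix below is a placeholder, since the question leaves W1…W12 symbolic.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 8, 7, 8, 5, 4, 3, 2, 1], dtype=float)

# hidden layer: max over each non-overlapping block of 4 inputs
h = x.reshape(4, 4).max(axis=1)
print("pooled hidden layer:", h)             # -> [4. 8. 8. 4.]

W = np.arange(1, 13).reshape(3, 4) / 10.0    # hypothetical values for W1..W12
z = W @ h                                    # fully-connected output scores

p = np.exp(z - z.max())                      # numerically stable softmax
p /= p.sum()
print("class probabilities:", p)
```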

Q9. Given a set of datapoints, carry out agglomerative clustering using (i) single linkage with threshold 5, and (ii) complete linkage with threshold 10. Use Euclidean distance.
(x1=1, x2=1); (x1=3, x2=3); (x1=9, x2=6); (x1=4, x2=7); (x1=9, x2=4); (x1=1, x2=9);
(x1=13, x2=9);
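A sketch using SciPy's hierarchical clustering, which performs the same bottom-up merging as the hand computation; fcluster then cuts the resulting tree at the given distance threshold.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1, 1], [3, 3], [9, 6], [4, 7],
              [9, 4], [1, 9], [13, 9]], dtype=float)

for method, thresh in (("single", 5), ("complete", 10)):
    Z = linkage(X, method=method, metric="euclidean")   # merge history
    labels = fcluster(Z, t=thresh, criterion="distance")
    print(f"{method}-linkage, threshold {thresh}: clusters {labels}")
```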

Q10. You are given a set of 10 data-points (2D) below. Attempt to partition them into 3 clusters using (a) K-means and (b) K-means++. Use Manhattan distance to identify the nearest cluster centre for each point. Show 3 complete iterations in each case.

[(-1,1), (5,-6), (0,-1), (-4,4), (1,0), (0,-2), (6,4), (-6,5), (-2,-1), (-5,-7)]
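A sketch of part (a): three Lloyd iterations with Manhattan-distance assignment. Seeding with the first three points is an assumed choice; K-means++ (part b) differs only in initialisation, picking each new seed with probability proportional to its squared distance from the seeds chosen so far.

```python
import numpy as np

X = np.array([[-1, 1], [5, -6], [0, -1], [-4, 4], [1, 0],
              [0, -2], [6, 4], [-6, 5], [-2, -1], [-5, -7]], dtype=float)

centres = X[:3].copy()   # assumed seeds for plain K-means: first 3 points

for it in range(3):      # three complete iterations
    # Manhattan (L1) distance from every point to every centre
    d = np.abs(X[:, None, :] - centres[None, :, :]).sum(axis=2)
    assign = d.argmin(axis=1)
    # standard mean update; with L1 assignment some texts use the median instead
    centres = np.array([X[assign == k].mean(axis=0) for k in range(3)])
    print(f"iteration {it + 1}: assignments {assign}")
    print("centres:\n", centres)
```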

Q11. You have the following 1D observations, and you wish to fit a Gaussian mixture model to them. Looking at the data, decide how many Gaussian components you want to use. Then estimate the Gaussian parameters using the EM algorithm.

[2.3, 4.7, -5.5, -4.8, 9.1, 3.5, 10.4, -4.3, 11.2, 1.9, 10.8, 3.4]
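A sketch using scikit-learn's EM-based GaussianMixture. Eyeballing the data suggests three well-separated groups (roughly around -5, 3 and 10), so n_components=3 is the assumed choice here.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

x = np.array([2.3, 4.7, -5.5, -4.8, 9.1, 3.5, 10.4,
              -4.3, 11.2, 1.9, 10.8, 3.4]).reshape(-1, 1)

# EM fit with an assumed choice of 3 components
gmm = GaussianMixture(n_components=3, random_state=0).fit(x)
print("mixing weights:", gmm.weights_)
print("means:         ", gmm.means_.ravel())
print("std deviations:", np.sqrt(gmm.covariances_).ravel())
```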
