
Tutorial 6: Machine Learning

Question 1: How old is the abalone?


It is possible to get a precise reading on the age of an abalone by slicing the shell and counting
growth rings, much like gauging the age of a tree by counting rings. The problem for scientists
studying abalone populations is that it is expensive and time‐consuming to slice the shells and count
the rings under a microscope. It would be more convenient and economical to be able to make
simple physical measurements like length, width, weight, and so forth and then to use a predictive
model to process the measurements and make an accurate determination of the age of the abalone.

For the above purpose, scientists have collected a number of measurements on over 4000 abalones.
The dataset is available at https://archive.ics.uci.edu/ml/datasets/abalone.

The dataset has 4177 rows of the following form:

M  0.455  0.365  0.095  0.514   0.2245  0.101   0.15   15
M  0.35   0.265  0.09   0.2255  0.0995  0.0485  0.07   7
F  0.53   0.42   0.135  0.677   0.2565  0.1415  0.21   9
M  0.44   0.365  0.125  0.516   0.2155  0.114   0.155  10
I  0.33   0.255  0.08   0.205   0.0895  0.0395  0.055  7
I  0.425  0.3    0.095  0.3515  0.141   0.0775  0.12   8
F  0.53   0.415  0.15   0.7775  0.237   0.1415  0.33   20

...

The columns are described as below:

Name            Data Type   Meas.  Description
----            ---------   -----  -----------
Gender          nominal            M, F, and I (infant)
Length          continuous  mm     Longest shell measurement
Diameter        continuous  mm     perpendicular to length
Height          continuous  mm     with meat in shell
Whole weight    continuous  grams  whole abalone
Shucked weight  continuous  grams  weight of meat
Viscera weight  continuous  grams  gut weight (after bleeding)
Shell weight    continuous  grams  after being dried
Rings           integer            +1.5 gives the age in years

Using the above information, the goal is to solve the problem with machine learning tools, i.e. to
predict the age of an abalone from external measurements. Answer the following questions:

a. Is the problem a supervised or unsupervised one?
b. If supervised (skip c),
   a. Is it a regression or classification problem?
   b. Propose a learning tool to solve the problem.
   c. Define the hypothesis, i.e. the equation.
   d. How will you learn the parameters for the hypothesis? What will your training set
      comprise (features, examples), and what will be your labels?
c. If unsupervised (skip b),
   a. Propose a learning tool to solve the problem.
   b. Define the hypothesis or model, i.e. the equation or network.
   c. How will you learn the parameters for the hypothesis? What will your training set
      comprise (features, examples), and how many classes will you set?

Note that the attribute “gender” has categorical values, so an error or distance calculation on it
will not work the same way as, for example, a Euclidean distance on the numeric features. In your
learning problem, exclude this feature.
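
As a rough illustration of how a training set could be assembled from this data, the sketch below parses a few of the sample rows shown earlier, drops the categorical gender column, and keeps the ring count as the label. The names `SAMPLE_ROWS` and `parse_row` are hypothetical, and the whitespace-separated layout is assumed from the excerpt; the downloaded file may use a different separator.

```python
# A few rows copied from the dataset excerpt in this question.
# Assumption: fields are whitespace-separated, as printed in the excerpt.
SAMPLE_ROWS = """\
M 0.455 0.365 0.095 0.514 0.2245 0.101 0.15 15
M 0.35 0.265 0.09 0.2255 0.0995 0.0485 0.07 7
F 0.53 0.42 0.135 0.677 0.2565 0.1415 0.21 9
I 0.33 0.255 0.08 0.205 0.0895 0.0395 0.055 7
"""


def parse_row(line):
    """Split one row into (features, rings), discarding the gender column."""
    fields = line.split()
    features = [float(v) for v in fields[1:8]]  # the 7 continuous measurements
    rings = int(fields[8])                      # last column: number of rings
    return features, rings


X, y = [], []
for line in SAMPLE_ROWS.splitlines():
    features, rings = parse_row(line)
    X.append(features)
    y.append(rings)

# Age in years follows from the column description: rings + 1.5.
ages = [r + 1.5 for r in y]

print(len(X), "examples with", len(X[0]), "features each")
print("labels (rings):", y)
print("ages (years):", ages)
```

Whether the label is the ring count or the age in years is a modelling choice; the two differ only by the constant 1.5.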

Question 2: Gradient descent


Given the following data:

x1   x2   y
6    2    13
3    5    4
4    4    7
6    10   5
4    2    9
3    4    5

You want to learn a linear regression predictor for y. Answer the following questions.

a. Write the hypothesis function.
b. Write the gradient equations.
c. How many parameters are there?
d. Illustrate the steps to learn the parameters by performing the first 4 iterations.
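
One way to check hand-worked iterations is a small batch gradient descent script on the six data points above. This is only a sketch under stated assumptions: the question gives no learning rate or starting values, so the learning rate `ALPHA = 0.01`, the all-zero initial parameters, and the half mean squared error cost are choices made here for illustration.

```python
# Data from the table in this question: each tuple is (x1, x2, y).
DATA = [(6, 2, 13), (3, 5, 4), (4, 4, 7), (6, 10, 5), (4, 2, 9), (3, 4, 5)]

ALPHA = 0.01     # assumed learning rate (not given in the question)
ITERATIONS = 4   # the question asks for the first 4 iterations


def predict(theta, x1, x2):
    """Linear hypothesis: theta0 + theta1*x1 + theta2*x2."""
    return theta[0] + theta[1] * x1 + theta[2] * x2


def cost(theta):
    """Half mean squared error over the data set (assumed cost function)."""
    m = len(DATA)
    return sum((predict(theta, x1, x2) - y) ** 2 for x1, x2, y in DATA) / (2 * m)


def gradient(theta):
    """Partial derivatives of the cost with respect to each parameter."""
    m = len(DATA)
    g = [0.0, 0.0, 0.0]
    for x1, x2, y in DATA:
        err = predict(theta, x1, x2) - y
        g[0] += err
        g[1] += err * x1
        g[2] += err * x2
    return [gi / m for gi in g]


theta = [0.0, 0.0, 0.0]   # assumed starting point
costs = [cost(theta)]
history = []
for i in range(1, ITERATIONS + 1):
    grad = gradient(theta)
    # Update all parameters simultaneously from the same gradient.
    theta = [t - ALPHA * g for t, g in zip(theta, grad)]
    history.append(theta)
    costs.append(cost(theta))
    print(f"iter {i}: theta = {[round(t, 4) for t in theta]}, cost = {costs[-1]:.4f}")
```

With a different learning rate or starting point the intermediate values change, so the printed numbers are only comparable to hand calculations that use the same assumptions.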

Question 3: Iris flowers


The Iris flower data set, also known as Fisher's Iris data set, is a multivariate data set collected to
quantify the morphologic variation of Iris flowers of three related species. The dataset is available at
https://archive.ics.uci.edu/ml/datasets/iris.

The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris
versicolor). Four features were measured from each sample: the length and the width of the sepals
and petals, in centimetres. The dataset contains:

sepal_length  sepal_width  petal_length  petal_width  label
5.1           3.5          1.4           0.2          Iris-setosa
4.9           3            1.4           0.2          Iris-setosa
4.7           3.2          1.3           0.2          Iris-setosa
...
7             3.2          4.7           1.4          Iris-versicolor
6.4           3.2          4.5           1.5          Iris-versicolor
6.9           3.1          4.9           1.5          Iris-versicolor
...
6.3           3.3          6             2.5          Iris-virginica
5.8           2.7          5.1           1.9          Iris-virginica
7.1           3            5.9           2.1          Iris-virginica
...

You would like to implement an ANN to classify the flowers among the three species. Answer the
following questions:

a. Let’s use an ANN with only one hidden layer. Suggest the size (number of nodes) of the
input, hidden and output layers. Keep to minimal sizes.
b. Draw the ANN.
c. Write the vectorized representation of the activation (output) function for the hidden layer
and the output layer. Clearly write out the elements of each vector and matrix.
d. How many parameters are there to learn?
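
To make the layer sizes and the parameter count concrete, here is a minimal forward-pass sketch for a network with one hidden layer. Several things here are assumptions rather than part of the question: a hidden layer of 3 nodes, one output node per species, sigmoid activations, bias terms on both layers, and random untrained weights. The helper names `layer` and `random_matrix` are hypothetical.

```python
import math
import random

N_INPUT = 4    # four measured features per flower
N_HIDDEN = 3   # assumed hidden-layer size; the question leaves this open
N_OUTPUT = 3   # assumed: one output node per species


def sigmoid(z):
    """Logistic activation (an assumed choice of activation function)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(weights, biases, inputs):
    """Compute sigmoid(W x + b) for one fully connected layer."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def random_matrix(rows, cols, rng):
    """Small random weights; these are untrained placeholders."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)                       # fixed seed for repeatable runs
W1 = random_matrix(N_HIDDEN, N_INPUT, rng)   # hidden-layer weights, 3 x 4
b1 = [0.0] * N_HIDDEN                        # hidden-layer biases
W2 = random_matrix(N_OUTPUT, N_HIDDEN, rng)  # output-layer weights, 3 x 3
b2 = [0.0] * N_OUTPUT                        # output-layer biases

x = [5.1, 3.5, 1.4, 0.2]   # first Iris-setosa row from the table above
hidden = layer(W1, b1, x)
output = layer(W2, b2, hidden)

# Each node has one weight per incoming value plus one bias.
n_params = N_HIDDEN * (N_INPUT + 1) + N_OUTPUT * (N_HIDDEN + 1)

print("hidden activations:", [round(a, 3) for a in hidden])
print("output activations:", [round(a, 3) for a in output])
print("parameters to learn:", n_params)
```

Because the weights are random, the output activations carry no meaning until the network is trained; the sketch only shows the shapes involved and how the parameter count follows from them.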
