2024f-java-uml-programming
- mnist_test.csv
- mnist_train.csv
This Java practical has no prerequisites other than having taken the Java lectures. Open
book, ChatGPT allowed, individual work only.
Modify your program to take this format into account and get an array of String from this
line. You can take advantage of the split function.
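As an illustration of the split step, here is a minimal sketch. It assumes the cells are separated by commas (the usual case for a .csv file); the class and method names are hypothetical and not imposed by the assignment.

```java
import java.util.Arrays;

class CsvLineSplitter {
    // Splits one line of the file into its raw String cells.
    // Assumption: the separator is a comma.
    static String[] splitLine(String line) {
        // The limit -1 keeps trailing empty cells instead of dropping them.
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        String[] cells = splitLine("5,0,0,255");
        System.out.println(Arrays.toString(cells)); // prints [5, 0, 0, 255]
    }
}
```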
c. Convert this array of String into an array of double (beware: one line of the file
holds the headers, not data)
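The conversion could look like the sketch below. It assumes the header line is the first line of the file and that cells are comma-separated; CsvRowParser, toDoubles and parseLines are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CsvRowParser {
    // Converts every cell of one data row to a double.
    static double[] toDoubles(String[] cells) {
        double[] values = new double[cells.length];
        for (int i = 0; i < cells.length; i++) {
            values[i] = Double.parseDouble(cells[i].trim());
        }
        return values;
    }

    // Parses all lines, skipping the header (assumed to be the first line).
    static List<double[]> parseLines(List<String> lines) {
        List<double[]> rows = new ArrayList<>();
        for (int n = 1; n < lines.size(); n++) {
            rows.add(toDoubles(lines.get(n).split(",")));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("label,p0,p1", "5,0,255");
        // prints [5.0, 0.0, 255.0]
        System.out.println(Arrays.toString(parseLines(lines).get(0)));
    }
}
```

Calling Double.parseDouble on the header line would throw a NumberFormatException, which is why the loop starts at index 1.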
- An ImageCsvDAO that will contain a function getAllImages() and all the code
you have written to read data from a CSV file. Be careful: the file name can vary, so make it
a variable in this service class.
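One possible shape for this class is sketched below. Only the names ImageCsvDAO and getAllImages() come from the text above; the Image holder, its fields, the constructor parameters and the row-by-row pixel layout are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder: a label plus a square pixel matrix.
class Image {
    final int label;
    final double[][] pixels;

    Image(int label, double[][] pixels) {
        this.label = label;
        this.pixels = pixels;
    }
}

class ImageCsvDAO {
    private final String fileName; // passed in, since the file name can vary
    private final int side;        // 28 for the 28x28 images

    ImageCsvDAO(String fileName, int side) {
        this.fileName = fileName;
        this.side = side;
    }

    List<Image> getAllImages() throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(fileName));
        List<Image> images = new ArrayList<>();
        // Start at 1: the first line is assumed to be the header.
        for (int n = 1; n < lines.size(); n++) {
            String line = lines.get(n);
            if (line.trim().isEmpty()) {
                continue; // tolerate blank lines, e.g. at the end of the file
            }
            String[] cells = line.split(",");
            int label = (int) Double.parseDouble(cells[0].trim());
            double[][] pixels = new double[side][side];
            // Assumption: pixels are stored row by row after the label.
            for (int i = 0; i < side * side; i++) {
                pixels[i / side][i % side] = Double.parseDouble(cells[i + 1].trim());
            }
            images.add(new Image(label, pixels));
        }
        return images;
    }
}
```

A call such as new ImageCsvDAO("mnist_train.csv", 28).getAllImages() would then return one Image per data line, assuming the file sits in the working directory.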
As the goal here is not to train you in ML but to find an interesting way to practice Java,
the mathematics needed will be limited to simple formulas (such as calculating the
distance between 2 points or the average of a list of values).
As you have noticed, the first column contains what we call a “label”, which indicates what
the associated image (the 28x28 matrix) is supposed to represent.
As a brief introduction, (statistical) machine learning is a process in which a program
determines the statistical characteristics of a dataset with known labels in order to make
decisions, such as classification decisions.
- For each digit, we’ll compute the “average representative” (also called the centroid) of
all the images of that digit.
- Given a new digit image, we’ll compute the distance from this image to each digit
centroid, and we’ll classify the image as the digit whose centroid is the closest in
terms of distance.
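The first step, computing a centroid, can be sketched as an element-by-element average over all the images of one digit. CentroidCalculator and centroid are hypothetical names, and the images are assumed to be matrices of identical size.

```java
import java.util.Arrays;
import java.util.List;

class CentroidCalculator {
    // Element-wise mean of a list of same-sized matrices.
    static double[][] centroid(List<double[][]> images) {
        if (images.isEmpty()) {
            throw new IllegalArgumentException("at least one image is required");
        }
        int rows = images.get(0).length;
        int cols = images.get(0)[0].length;
        double[][] mean = new double[rows][cols];
        // Sum every image into the accumulator...
        for (double[][] image : images) {
            for (int i = 0; i < rows; i++) {
                for (int j = 0; j < cols; j++) {
                    mean[i][j] += image[i][j];
                }
            }
        }
        // ...then divide each cell by the number of images.
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                mean[i][j] /= images.size();
            }
        }
        return mean;
    }

    public static void main(String[] args) {
        List<double[][]> images = Arrays.asList(
                new double[][] {{0, 2}},
                new double[][] {{4, 6}});
        System.out.println(Arrays.toString(centroid(images)[0])); // prints [2.0, 4.0]
    }
}
```

In the full program, this method would be called once per digit, on the list of training images carrying that label.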
Hint: the distance can be defined as the square root of the sum of the squared
index-to-index differences between the values of the 2 matrices considered (the Euclidean
distance). Once you have that value for each centroid, take the minimum. You can find hints here:
https://hlab.stanford.edu/brian/euclidean_distance_in.html
https://stackoverflow.com/questions/54357325/calculating-largest-euclidean-distance-between-two-values-in-a-2d-array
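A minimal sketch of the Euclidean distance covered by the pages linked above, followed by the "take the minimum" step. NearestCentroid, distance and classify are hypothetical names; the centroids are assumed to be stored so that centroids[d] is the centroid of digit d.

```java
class NearestCentroid {
    // Euclidean distance between two same-sized matrices:
    // square root of the sum of squared index-to-index differences.
    static double distance(double[][] a, double[][] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a[i].length; j++) {
                double diff = a[i][j] - b[i][j];
                sum += diff * diff;
            }
        }
        return Math.sqrt(sum);
    }

    // Returns the index of the closest centroid.
    // Assumption: centroids[d] is the centroid of digit d.
    static int classify(double[][] image, double[][][] centroids) {
        int best = 0;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int digit = 0; digit < centroids.length; digit++) {
            double d = distance(image, centroids[digit]);
            if (d < bestDistance) {
                bestDistance = d;
                best = digit;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] a = {{0, 0}};
        double[][] b = {{3, 4}};
        System.out.println(distance(a, b)); // prints 5.0
    }
}
```

Since only the ordering of the distances matters for the classification, the square root could be skipped, but keeping it makes the values easier to interpret.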
- True positives, example: “the prediction is 5 and the label is actually 5”.
- True negatives, example: “the prediction is ‘not 5’ and the label is 4”.
- False positives, example: “the prediction is 5 and the label was actually 4”.
- False negatives, example: “the prediction is ‘not 5’ but the label was actually 5”.
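The four cases above can be counted for a given digit as in the sketch below. DigitCounts and its fields are hypothetical names; predictions and labels are assumed to be parallel arrays, one entry per test image.

```java
class DigitCounts {
    int truePositives;
    int trueNegatives;
    int falsePositives;
    int falseNegatives;

    // Counts the four outcomes for one digit, e.g. digit = 5.
    static DigitCounts count(int digit, int[] predictions, int[] labels) {
        DigitCounts counts = new DigitCounts();
        for (int i = 0; i < predictions.length; i++) {
            boolean predicted = predictions[i] == digit; // "the prediction is 5"
            boolean actual = labels[i] == digit;         // "the label is 5"
            if (predicted && actual) {
                counts.truePositives++;
            } else if (!predicted && !actual) {
                counts.trueNegatives++;
            } else if (predicted) {
                counts.falsePositives++;
            } else {
                counts.falseNegatives++;
            }
        }
        return counts;
    }
}
```

For example, with digit 5, predictions {5, 4, 5, 3} and labels {5, 4, 4, 5}, each of the four counters ends at 1.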