Project Assignment.2024
Spring 2024
Objectives
This assessment task addresses the following objectives from the subject outline:
Overview
In this assignment, you will build a data mining framework from scratch. The framework
should contain one clustering/classification method and one sequence mining method.
Choose your dataset from one of the following suggested resources:
- https://www.kaggle.com/datasets/podsyp/production-quality
- https://www.kaggle.com/datasets/fedesoriano/wind-speed-prediction-dataset
- https://www.kaggle.com/datasets/jmmvutu/summer-products-and-sales-in-ecommerce-wish
Given a training dataset in a specific data format, your framework should carry out a
classification/prediction process. You need to identify the target label for
classification/prediction. You will then test your data mining framework on the real-world
dataset to evaluate the quality of these processes.
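Identifying the target label amounts to choosing one column to predict and treating the remaining columns as input attributes. The following is a minimal sketch in Python, assuming a CSV file whose attributes are all numeric. The sample data, the column names, and the load_dataset helper are hypothetical illustrations and are not taken from the suggested datasets.

```python
import csv
import io

# Hypothetical sample in CSV form; a real run would open a downloaded dataset file instead.
SAMPLE_CSV = """temp,humidity,quality
20.5,0.31,good
35.0,0.80,bad
22.1,0.35,good
"""

def load_dataset(handle, target_column):
    """Read CSV rows and split each one into (numeric features, target label)."""
    reader = csv.DictReader(handle)
    features, labels = [], []
    for row in reader:
        labels.append(row[target_column])
        # Assumption: every non-target column can be parsed as a float.
        features.append([float(v) for k, v in row.items() if k != target_column])
    return features, labels

X, y = load_dataset(io.StringIO(SAMPLE_CSV), target_column="quality")
print(X[0], y[0])
```

Real datasets usually also contain categorical or missing values, which would need encoding or cleaning before a conversion like this can work.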
To build this data mining framework, complete the following four steps.
- Step 1: Identify attributes for data mining, make training and testing datasets.
- Step 2: Implement a classification/prediction algorithm (you may refer to the Weka
library to find the best algorithm).
- Step 3: Improve the results of Step 2 by applying clustering, trying other algorithms,
or analysing the data further.
- Step 4: Test the built models, compare and evaluate their performance. Write a report.
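Steps 1, 2 and 4 above can be sketched as follows. This is only an illustration under stated assumptions: k-nearest neighbours is picked here as one possible from-scratch classifier and is not a required choice, and the toy rows, labels, and function names are hypothetical.

```python
import math
import random
from collections import Counter

def train_test_split(rows, labels, test_ratio=0.25, seed=0):
    """Step 1: shuffle the row indices and hold out a fraction for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = int(len(indices) * (1 - test_ratio))
    train, test = indices[:cut], indices[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])

def knn_predict(train_X, train_y, query, k=3):
    """Step 2: majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Step 4: fraction of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y
               for x, y in zip(test_X, test_y))
    return hits / len(test_y)

# Hypothetical toy data: two well-separated groups of points.
rows = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2],
        [5.0, 5.1], [5.2, 4.9], [4.8, 5.3], [5.1, 5.0]]
labels = ["low"] * 4 + ["high"] * 4

train_X, train_y, test_X, test_y = train_test_split(rows, labels)
print(accuracy(train_X, train_y, test_X, test_y))
```

Fixing the seed keeps the split reproducible, so that the models compared in Step 4 are evaluated on the same test rows.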
Hint: When evaluating performance, you should also consider the run time of building
the models and of making predictions.
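One way to measure the run time of the build and prediction phases separately is sketched below, using Python's time.perf_counter. The timed helper and the build_model and predict functions are hypothetical stand-ins for whatever models your framework implements.

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Hypothetical stand-ins for a model's build and predict phases.
def build_model(data):
    return sum(data) / len(data)  # "training" here just stores the mean

def predict(model, value):
    return "high" if value > model else "low"

model, build_seconds = timed(build_model, list(range(100_000)))
label, predict_seconds = timed(predict, model, 75_000)
print(f"build: {build_seconds:.6f}s, predict: {predict_seconds:.6f}s, label: {label}")
```

A single measurement can be noisy, so repeating each call several times and reporting the average or the minimum gives a fairer comparison between models.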
Your report should include an introduction, a body (a description of the processing steps
and the evaluation), conclusions, and references.