CSCI946 Assignment_1_task_sheet
Assignment 1
(10 marks)
Due: Check Moodle Site
Aim
This assignment aims to provide students with essential experience conducting data analytics
experiments by using the programming language Python. After completing this assignment, you
should know how to
• load and save data and workspace; and
as part of data analysis:
• analyze a problem and preprocess raw dataset,
• perform clustering,
• perform classification, and
• discuss experiment results in an introductory way.
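As an illustration of the first aim (loading and saving data and a workspace), here is a minimal sketch using only the Python standard library. The helper names, the toy rows and the idea of treating a dict of named objects as the "workspace" are assumptions made for this example; they are not prescribed by the assignment, and pandas or joblib could be used instead.

```python
import csv
import pickle
import tempfile
from pathlib import Path


def save_table(rows, path):
    """Write a list of dicts to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def load_table(path):
    """Read a CSV file back into a list of dicts (all values are strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def save_workspace(variables, path):
    """Pickle a dict of named Python objects, i.e. a simple 'workspace'."""
    with open(path, "wb") as f:
        pickle.dump(variables, f)


def load_workspace(path):
    """Restore the dict of named objects written by save_workspace."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical toy rows; the real dataset's columns will differ.
rows = [
    {"product": "A", "price": "19.99"},
    {"product": "B", "price": "5.50"},
]

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "data.csv"
    ws_path = Path(tmp) / "workspace.pkl"

    # Round-trip the data through a CSV file.
    save_table(rows, csv_path)
    loaded_rows = load_table(csv_path)

    # Round-trip intermediate results through a pickle file.
    save_workspace({"rows": loaded_rows, "n_clusters": 3}, ws_path)
    workspace = load_workspace(ws_path)
```

Note that values read back from a CSV are strings and need converting to numbers before analysis. Pickle files should only be loaded when they come from a trusted source.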
Group work: You are to work on this assignment as a group. Each group is to work
independently from other groups on this assignment. All group members are expected to
contribute to this assignment. Group members may use communications tools (e.g., UOW
Zoom, UOW Webex, UOW Teams, Slack, Discord, WhatsApp, etc.) and online collaboration
workspaces (e.g., UOW OneDrive, Google Drive, GitHub, ZenHub, etc.) to complete the
assignment. Please plan before starting the assignment, then keep a detailed digital work log and
timesheet for each group member. A justification and/or explanation must accompany all your
answers to this assignment. All group members are expected to work together and to contribute
to this assignment. One submission per group only.
Penalties: If a group member fails to make a minimum contribution, the member will be
awarded zero marks. Claims of little or no contribution must be supported by evidence, such as a work log.
Plagiarism of any part in this assignment will result in zero marks being awarded to the whole
group.
Preliminaries
Read through the lecture slides, lab instructions and the recommended readings in Weeks 1 – 4.
Conduct relevant background studies. You should use Python for the tasks in this assignment.
You can use any publicly accessible toolbox or library for Python. Your submission must include
the source code file(s) which, when run, would re-create all your results. Some of the
assignment tasks can be computationally expensive or memory expensive. You may require a
computer with sufficient compute power and memory (at least 16 GB of memory in this subject).
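Because the source code must re-create all results when run, any randomness in the experiments has to be controlled. The following is a minimal sketch of one common way to do that, by fixing seeds. The seed value and the `run_experiment` function are hypothetical stand-ins, not part of the assignment.

```python
import random

import numpy as np

SEED = 946  # arbitrary value chosen for illustration


def run_experiment(seed):
    """Stand-in for an analysis step that involves randomness."""
    random.seed(seed)                  # seeds Python's built-in generator
    rng = np.random.default_rng(seed)  # local NumPy generator
    sample = rng.normal(size=5)        # e.g. a random initialisation
    order = list(range(5))
    random.shuffle(order)              # e.g. a random ordering of rows
    return sample, order


# Two runs with the same seed give identical results.
first_sample, first_order = run_experiment(SEED)
second_sample, second_order = run_experiment(SEED)
```

Many scikit-learn estimators and helpers accept a `random_state` argument that plays the same role.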
To answer these two questions, you need to think about the following parts. A figure to
illustrate your analytics plan is preferred.
1. Design your experiment (Task 1) and report: why you chose all or part of the data from
the NewChic dataset; how you define “top 10” and “the best”; why some columns are
used for the clustering and classification algorithms and others for the discussion of results.
2. Program the data preprocessing (Task 1) by combining the CSVs into one sheet and report: the matched
columns, the removed columns and detailed explanations.
3. Use at least two clustering algorithms (Task 2) on the preprocessed data and report: the detailed steps
of each algorithm, how you preprocessed the data, the results of all algorithms in a table,
a comparison of the algorithms and the best result.
4. Program at least two classification algorithms (Task 3) on the preprocessed data and report:
the detailed steps of each algorithm, the results of all algorithms in a table, a comparison of the algorithms and the best
result.
5. Discuss results (Task 4) and report: the 10 best products, the best category and your
suggestions to NewChic.
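Items 3 and 4 above both ask for at least two algorithms and a results table. The following sketch shows one possible shape for such a comparison, assuming scikit-learn and pandas are available. It runs on synthetic data from `make_blobs`, not the NewChic dataset, and the choice of algorithms (k-means and agglomerative clustering; logistic regression and a decision tree) and of metrics (silhouette score; held-out accuracy) is illustrative only.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, silhouette_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

SEED = 0  # fixed so the sketch is repeatable

# Synthetic stand-in data: 300 points, 4 numeric features, 3 groups.
X, y = make_blobs(n_samples=300, n_features=4, centers=3, random_state=SEED)

# --- Clustering: fit each algorithm and score it without using labels ---
X_scaled = StandardScaler().fit_transform(X)
clusterers = {
    "KMeans": KMeans(n_clusters=3, n_init=10, random_state=SEED),
    "Agglomerative": AgglomerativeClustering(n_clusters=3),
}
cluster_rows = []
for name, model in clusterers.items():
    labels = model.fit_predict(X_scaled)
    cluster_rows.append(
        {"algorithm": name, "silhouette": silhouette_score(X_scaled, labels)}
    )
cluster_table = pd.DataFrame(cluster_rows)

# --- Classification: train on one split, evaluate on the held-out split ---
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=SEED
)
scaler = StandardScaler().fit(X_train)  # fit the scaler on training data only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

classifiers = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTree": DecisionTreeClassifier(max_depth=3, random_state=SEED),
}
class_rows = []
for name, model in classifiers.items():
    model.fit(X_train_s, y_train)
    predictions = model.predict(X_test_s)
    class_rows.append(
        {"algorithm": name, "accuracy": accuracy_score(y_test, predictions)}
    )
class_table = pd.DataFrame(class_rows)

print(cluster_table)
print(class_table)
```

On real data the number of clusters, the target label and the evaluation metrics all need to be justified in the report; the values used here are placeholders.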
Task 1 is expected to be answered in two sections of your report, under the sections “Problem
Analysis” and “Data Preprocess”. Please cite the articles and programming
resources you refer to in your writing. The code for Task 1 must also be submitted. Add the data preprocessing code,
saved as .py, to the ZIP file for your submission.
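A minimal sketch of the kind of preprocessing script Task 1 refers to, assuming pandas is available. It combines several CSVs into one table and records which columns matched across all files and which were removed. The two in-memory CSVs, their column names and the `category` column are hypothetical stand-ins; the real files and columns will differ.

```python
import io

import pandas as pd

# Hypothetical in-memory CSVs standing in for per-category files on disk.
csv_sources = {
    "category_a": io.StringIO("name,price,likes\nshirt,10.0,5\nhat,4.5,2\n"),
    "category_b": io.StringIO("name,price,colour\nshoe,30.0,red\n"),
}

# Read every source into its own DataFrame, keyed by category name.
frames = {category: pd.read_csv(src) for category, src in csv_sources.items()}

# Columns present in every file are "matched"; all others are removed.
column_sets = [set(df.columns) for df in frames.values()]
matched = sorted(set.intersection(*column_sets))
removed = sorted(set.union(*column_sets) - set(matched))

# Stack the matched columns into one sheet, remembering each row's source.
combined = pd.concat(
    [df[matched].assign(category=category) for category, df in frames.items()],
    ignore_index=True,
)

print("matched columns:", matched)
print("removed columns:", removed)
print(combined)
```

With files on disk, each `io.StringIO` object would be replaced by a file path passed to `pd.read_csv`. Keeping only the shared columns is one option among several; keeping all columns and handling the resulting missing values is another, and either choice should be explained in the report.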
Submission:
The submission link for assignment 1 is on the subject’s Moodle site. Only one submission per
group. The submission must be a zip file named “A1.zip”, under 200 MB, containing a
report (mandatory) and code (mandatory). Accepted file formats are .pdf for the report
and .py for code files.
Important:
• The report must be in a single file and in .pdf format. The title page must list the full name
and student ID of all members in the group. Clearly indicate members who did not make a
minimum contribution.
• The report must answer the questions in their order as given in the assignment specification.
There is no page limit.
• The report must have a clear heading for each part of each task.
• Sufficient description, explanation, justification, and discussion are essential parts of your
answers. Marks will be deducted for incomplete or vague answers.
• Sufficient, suitable, and legible annotation shall be provided in your code to make it easy to
understand. Marks will be deducted for untidy code, code that is difficult to read, code that does
not run, or code that does not reproduce the results in your report.
Note: Failure of your code to run may attract zero marks. Plagiarism of any part in your code, or
any part in your report will attract zero marks for this assignment. It is the responsibility of the
group to ensure that your submission does not contain plagiarized material. You may be
requested to demonstrate and explain your program or explain your answer in the report.
Marks are deducted if you are unable to offer an explanation. Marks will be awarded for correct
design, implementation, style, completeness, and justification.
---------------------------------- END-------------------------------------