Alexandria University
Faculty of Computers and Science
Course Title: Introduction to Data Science
Term: Fall 2024 - 2025
Final Project
The main goal of this project is for you to practice and apply what you have
learned to a real-world task.
Rules:
1. Cheating is strictly prohibited. If matching work is found between any two groups, strict
measures will be taken against both groups.
2. Each group must consist of a minimum of 2 and a maximum of 6 students.
➢ Each group will be assigned a group number, which will be announced on Teams.
➢ The project discussions will be announced soon.
3. Each group must prepare a PDF report named “Project Report + Group No.”. The report
must contain the following:
o Students’ names, IDs, and Group number.
o An explanation of the problem and a brief description of the role of each member. Note that the
problem description must answer the following questions:
a. What will the program do?
b. What will the input to the program be?
c. What will the output of the program be?
o A full description of your dataset.
o Screenshots of your project steps.
o An explanation of your results and insights, based on your plotted graphs.
o A discussion of every part of the code (libraries used + attributes)
✓ (A screenshot of each code part + a description of what it does)
Project Details:
We have attached the Grocery (GRC) dataset, which you can download from the project file.
Using this dataset, you are asked to design software that includes the following
items:
1. A user interface that allows users to enter the required data and view the results.
2. Use R to do the following tasks:
a. Assess and clean your data if needed.
b. Use a different type of data visualization for each of the following:
i. Compare cash and credit totals.
ii. Compare each age with its sum of total spending.
iii. Show each city's total spending, arranged by total in descending order.
iv. Display the distribution of total spending.
c. Put all previous plots in one dashboard.
d. Split the customers into (n) groups, using one of the studied methods, according to:
i. The sum of total spending for each customer.
ii. The age of each customer.
iii. Then print a table displaying each customer's name, age, total spending, and the
computed cluster number.
e. Generate association rules between items, with the minimum support and minimum confidence
taken from the user as inputs (state the algorithm used).
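As an illustration of tasks (a) to (d), the following is a minimal sketch in base R, not a reference solution. It runs on a small made-up data frame. The column names `customer`, `age`, `city`, `paymentType`, and `total` are assumptions and should be replaced with the names in the actual GRC file. The choice of k-means, the scaling step, and the fixed seed are also assumptions, since the project only asks for one of the studied methods.

```r
# Toy stand-in for the GRC data. Column names and values are assumptions,
# not the real schema of the dataset.
grc <- data.frame(
  customer    = c("Ali", "Ali", "Mona", "Mona", "Omar", "Sara"),
  age         = c(25, 25, 40, 40, 33, 58),
  city        = c("Alexandria", "Alexandria", "Cairo", "Cairo", "Giza", "Cairo"),
  paymentType = c("Cash", "Credit", "Cash", "Cash", "Credit", "Credit"),
  total       = c(100, 250, 400, 150, 300, 900),
  stringsAsFactors = FALSE
)

# (a) Basic cleaning: drop exact duplicate rows, then rows with missing values.
grc <- na.omit(unique(grc))

# (b-i) Cash versus credit totals.
by_payment <- aggregate(total ~ paymentType, data = grc, FUN = sum)

# (b-iii) Total spending per city, largest first.
by_city <- aggregate(total ~ city, data = grc, FUN = sum)
by_city <- by_city[order(-by_city$total), ]

# (d) One row per customer with age and summed spending.
# This assumes each customer name maps to a single age.
per_customer <- aggregate(total ~ customer + age, data = grc, FUN = sum)

# k-means on age and total spending. Scaling is an assumed design choice so
# that spending, which has the larger range, does not dominate the distance.
set.seed(42)
n_clusters <- 2  # would come from the Number_Of_Clusters user input
fit <- kmeans(scale(per_customer[, c("age", "total")]),
              centers = n_clusters, nstart = 10)
per_customer$cluster <- fit$cluster

# Table of customer name, age, total spending, and cluster number.
print(per_customer)
```

Summary tables such as `by_payment` and `by_city` can be passed to base plotting functions such as `pie()` and `barplot()`, and `par(mfrow = c(2, 2))` is one way to place four plots on a single device for the dashboard.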
❖ You can also use the following guidelines to help you implement your program:
❖ Program user inputs
Variable name      | Label                      | Notes                                             | Validation
Dataset_Path       | Dataset path               | User should input the full path of the data file. | Required
Number_Of_Clusters | Number of clusters         | To use in the clustering process.                 | Number between 2 and 4
Min_Support        | Minimum Apriori support    | To use in the Apriori algorithm.                  | Number between 0.001 and 1
Min_Confidence     | Minimum Apriori confidence | To use in the Apriori algorithm.                  | Number between 0.001 and 1
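The validation column above could be checked along the following lines. This is a hedged sketch in base R: the function names `in_range`, `validate_inputs`, and `rule_metrics` are hypothetical, the file name `grc.csv` is a placeholder, and treating the number of clusters as a whole number is an assumption that the table does not state. The second function is not the Apriori algorithm itself. It only computes, for a single rule on made-up baskets, the two measures that Min_Support and Min_Confidence are compared against.

```r
# TRUE when x is a single finite number inside [lower, upper].
in_range <- function(x, lower, upper) {
  is.numeric(x) && length(x) == 1 && is.finite(x) && x >= lower && x <= upper
}

# Check the four user inputs against the validation rules in the table.
# Returns a character vector of error messages (empty when all inputs pass).
validate_inputs <- function(dataset_path, number_of_clusters,
                            min_support, min_confidence) {
  errors <- character(0)
  if (!is.character(dataset_path) || length(dataset_path) != 1 ||
      is.na(dataset_path) || !nzchar(dataset_path)) {
    errors <- c(errors, "Dataset_Path is required.")
  }
  # Whole-number check is an assumption: a cluster count cannot be fractional.
  if (!in_range(number_of_clusters, 2, 4) ||
      number_of_clusters != round(number_of_clusters)) {
    errors <- c(errors, "Number_Of_Clusters must be a whole number between 2 and 4.")
  }
  if (!in_range(min_support, 0.001, 1)) {
    errors <- c(errors, "Min_Support must be between 0.001 and 1.")
  }
  if (!in_range(min_confidence, 0.001, 1)) {
    errors <- c(errors, "Min_Confidence must be between 0.001 and 1.")
  }
  errors
}

# Support and confidence of the rule lhs -> rhs over a list of baskets.
# support    = share of baskets containing both lhs and rhs
# confidence = share of lhs baskets that also contain rhs
#              (NaN when no basket contains lhs)
rule_metrics <- function(baskets, lhs, rhs) {
  has_lhs  <- vapply(baskets, function(b) all(lhs %in% b), logical(1))
  has_both <- vapply(baskets, function(b) all(c(lhs, rhs) %in% b), logical(1))
  c(support = mean(has_both), confidence = sum(has_both) / sum(has_lhs))
}

# Made-up baskets, for illustration only.
baskets <- list(
  c("bread", "milk"),
  c("bread", "butter"),
  c("bread", "milk", "butter"),
  c("milk")
)

problems <- validate_inputs("grc.csv", 3, 0.4, 0.6)  # placeholder path and values
m <- rule_metrics(baskets, lhs = "bread", rhs = "milk")

# A rule is kept only if it meets both user-supplied thresholds.
keep <- m[["support"]] >= 0.4 && m[["confidence"]] >= 0.6
```

Here the rule bread -> milk appears in 2 of the 4 baskets (support 0.5) and in 2 of the 3 baskets that contain bread (confidence 2/3), so it passes thresholds of 0.4 and 0.6.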
Submission Details:
❖ You should submit the following via a form, according to the instructions:
✓ R version of the code.
✓ The project report, with your group number, as a PDF.
❖ You can use the following links to guide you in building the R interface:
o [Link]
o [Link]
o [Link]
Good Luck☺