Lab
Decision Tree
1 Description
In this assignment, you will build decision trees on real-world datasets using scikit-learn.
• Multi-class dataset: The UCI Wine Quality dataset is used for classifying wine samples
into quality levels based on physicochemical properties such as acidity, alcohol content, etc.
This dataset includes 4898 samples, with labels from 0 (low quality) to 10 (high quality).
Please visit the link below for the dataset:
https://archive.ics.uci.edu/dataset/186/wine+quality
• Additional dataset: You have to find another dataset and build the decision tree for it.
Please provide a detailed description of the dataset information in your report.
Your dataset must:
2 Specifications
You are required to write Python notebooks (.ipynb) and use the scikit-learn library to complete
the following tasks described for the Breast Cancer dataset.
While there are no strict guidelines for code organization, each task must be clearly documented
and fully comply with all specified requirements.
You need to shuffle the dataset before splitting and ensure it is split in a stratified fashion.
Other parameters (if there are any) should remain at their default settings.
Experiments will be run on training and test sets with four different proportions: 40/60,
60/40, 80/20, and 90/10 (train/test). Each proportion yields four subsets (training features,
training labels, test features, and test labels), so you will need 16 subsets in total.
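The splitting step above can be sketched as follows. This is a minimal illustration, not a required layout: it assumes scikit-learn's bundled copy of the Breast Cancer Wisconsin data as a stand-in dataset, and the helper name make_splits and the fixed seed are hypothetical choices made only for reproducibility.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Assumption: the bundled Breast Cancer Wisconsin data stands in for the lab dataset.
X, y = load_breast_cancer(return_X_y=True)

# Train fractions for the 40/60, 60/40, 80/20, and 90/10 (train/test) splits.
TRAIN_FRACTIONS = [0.4, 0.6, 0.8, 0.9]


def make_splits(features, labels, train_fractions=TRAIN_FRACTIONS, seed=42):
    """Return {train_fraction: [feature_train, feature_test, label_train, label_test]}."""
    splits = {}
    for frac in train_fractions:
        splits[frac] = train_test_split(
            features,
            labels,
            train_size=frac,
            shuffle=True,        # shuffle before splitting
            stratify=labels,     # keep class proportions in both parts
            random_state=seed,   # assumption: fixed only so results are repeatable
        )
    return splits


splits = make_splits(X, y)
for frac, (X_tr, X_te, y_tr, y_te) in splits.items():
    print(f"{frac:.0%} train: {X_tr.shape[0]} training rows, {X_te.shape[0]} test rows")
```

Four proportions times four arrays per split gives the 16 subsets mentioned above.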
Visualize the class distributions in all datasets (the original set, training sets, and test sets)
across all proportions to demonstrate that they have been appropriately prepared.
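One possible way to check the class distributions is sketched below. It assumes the bundled Breast Cancer data and a single 80/20 split as an example; the helper names class_proportions and plot_class_distributions are hypothetical, and matplotlib is assumed to be available for the plotting part.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split


def class_proportions(labels):
    """Map each class label to its share of the samples."""
    classes, counts = np.unique(labels, return_counts=True)
    total = counts.sum()
    return {c.item(): float(n / total) for c, n in zip(classes, counts)}


def plot_class_distributions(named_label_sets):
    """Draw one bar chart of class proportions per named label array."""
    import matplotlib.pyplot as plt  # imported lazily: only needed when plotting

    n = len(named_label_sets)
    fig, axes = plt.subplots(1, n, figsize=(4 * n, 3), sharey=True, squeeze=False)
    for ax, (name, labels) in zip(axes[0], named_label_sets.items()):
        props = class_proportions(labels)
        ax.bar([str(c) for c in props], list(props.values()))
        ax.set_title(name)
        ax.set_xlabel("class")
    axes[0][0].set_ylabel("proportion")
    fig.tight_layout()
    return fig


# Assumption: one example proportion (80/20); repeat for the other proportions.
X, y = load_breast_cancer(return_X_y=True)
_, _, y_train, y_test = train_test_split(
    X, y, train_size=0.8, shuffle=True, stratify=y, random_state=42
)
label_sets = {"original": y, "train (80%)": y_train, "test (20%)": y_test}
for name, labels in label_sets.items():
    print(name, class_proportions(labels))
# In a notebook: plot_class_distributions(label_sets)
```

If the split is stratified, the bars for the original, training, and test sets should have nearly identical heights.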
How do you interpret the classification report and the confusion matrix? Based on the results,
provide your insights into the performance of these decision tree classifiers.
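As a starting point for the interpretation asked for above, the following sketch trains one classifier and prints both reports. It assumes the bundled Breast Cancer data, an 80/20 split, and a fixed random_state; none of these choices are prescribed by the lab text.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: bundled Breast Cancer data and an 80/20 stratified split.
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target,
    train_size=0.8, shuffle=True, stratify=data.target, random_state=42,
)

# Decision tree with default settings apart from the seed.
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Per-class precision, recall, F1-score, and support.
print(classification_report(y_test, y_pred, target_names=list(data.target_names)))

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

In the confusion matrix, diagonal entries count correct predictions and off-diagonal entries show which classes are confused with each other; recall for a class is its diagonal entry divided by its row sum, and precision is the diagonal entry divided by its column sum.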
• Provide the decision trees, visualized using Graphviz, for each max_depth value.
• Report the accuracy_score (on the test set) of the decision tree classifier for each value of
the max_depth parameter in the following table.
max_depth | None | 2 | 3 | 4 | 5 | 6 | 7
Accuracy  |      |   |   |   |   |   |
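The depth experiment above can be sketched as a single loop. This assumes the bundled Breast Cancer data, an 80/20 split, and a fixed seed; export_graphviz is called with out_file=None so that it returns DOT source as a string, which can then be rendered with the separate graphviz package if it is installed.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Assumption: bundled Breast Cancer data and an 80/20 stratified split.
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target,
    train_size=0.8, shuffle=True, stratify=data.target, random_state=42,
)

MAX_DEPTHS = [None, 2, 3, 4, 5, 6, 7]
accuracies = {}   # max_depth -> accuracy on the test set
dot_sources = {}  # max_depth -> Graphviz DOT source of the fitted tree

for depth in MAX_DEPTHS:
    clf = DecisionTreeClassifier(max_depth=depth, random_state=42)
    clf.fit(X_train, y_train)
    accuracies[depth] = accuracy_score(y_test, clf.predict(X_test))
    dot_sources[depth] = export_graphviz(
        clf,
        out_file=None,  # return the DOT text instead of writing a file
        feature_names=list(data.feature_names),
        class_names=[str(name) for name in data.target_names],
        filled=True,
        rounded=True,
    )

for depth, acc in accuracies.items():
    print(f"max_depth={depth}: accuracy={acc:.4f}")
# In a notebook with the graphviz package installed:
#   import graphviz; graphviz.Source(dot_sources[2])
```

The printed values fill the Accuracy row of the table, one per max_depth column.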
3 Requirements
3.1 Report
The report must include the following sections:
• Member information (Student ID, full name, etc.).
• A work assignment table, which includes information on each task assigned to team members,
along with the completion rate of each member relative to the assigned tasks.
• All visualizations must be presented in the .ipynb file, while statistical results and insights
must be presented in the report.
• The report must be well formatted and exported to PDF. Points will be deducted for
formatting problems such as figures cut off by page breaks.
3.2 Submission
• All reports, code, etc., must be submitted as a single compressed file (.zip, .rar, .7z)
named according to the format: StudentID1_StudentID2_etc.zip/.rar/.7z.
• If the compressed file exceeds 25 MB, include the report and source code in the archive first.
Images and other large files may be uploaded to Google Drive and shared via a link.
4 Assessment
The detailed assessment criteria for this Lab are outlined as follows:
The detailed assessment criteria for each dataset are outlined as follows:
5 Notices
Please pay attention to the following notices:
• Any plagiarism, cheating, or dishonesty will result in a grade of 0 for the course.