Lab 2
1.1 Description
In this assignment, you will build a decision tree classifier on the UCI Nursery Data Set with the help of the scikit-learn library.
The data set contains 12960 records. The Nursery Database was derived from a hierarchical decision model originally developed to rank applications for nursery schools.
You can download the dataset here: https://archive.ics.uci.edu/dataset/76/nursery
This task prepares the training and test sets for the upcoming experiments. Rename the file "nursery.data" to "nursery.data.csv" (that is, add the ".csv" extension) so that it can be read as a CSV file.
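A minimal loading sketch, assuming pandas is used. The raw file has no header row, so column names are supplied explicitly; the names below are taken from the attribute list on the UCI page (parents, has_nurs, form, children, housing, finance, social, health, plus the target class). The helper name load_nursery is hypothetical.

```python
import pandas as pd

# Column names assumed from the UCI attribute list; the file itself has no header.
NURSERY_COLUMNS = [
    "parents", "has_nurs", "form", "children",
    "housing", "finance", "social", "health", "class",
]


def load_nursery(path="nursery.data.csv"):
    """Read the Nursery CSV and split it into attributes and target labels."""
    data = pd.read_csv(path, header=None, names=NURSERY_COLUMNS)
    features = data.drop(columns="class")  # the 8 attribute columns
    labels = data["class"]                 # the target attribute
    return features, labels
```

After features, labels = load_nursery(), features.shape should be (12960, 8) if the file is complete, matching the record count given above.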
Prepare the following four subsets from the data in "nursery.data.csv":
• feature_train: a set of training examples, each of which is a tuple of 8 attribute values (target attribute
excluded).
Shuffle the data before splitting, and split it in a stratified fashion (so that the training and test sets keep the class proportions of the original set). Leave any other parameters at their default values.
Experiments will be run on training and test sets of four different proportions (train/test): 40/60, 60/40, 80/20, and 90/10. Since each proportion needs the four subsets above, you need 16 subsets in total.
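One way to produce the 16 subsets is sklearn.model_selection.train_test_split with shuffle=True and stratify set to the labels, called once per proportion. This is a sketch: the helper make_splits and the names feature_test, label_train and label_test are hypothetical (only feature_train is named above), and random_state is exposed only for reproducibility; leaving it at None keeps the library default.

```python
from sklearn.model_selection import train_test_split

# Training fractions for the 40/60, 60/40, 80/20 and 90/10 experiments.
TRAIN_FRACTIONS = [0.4, 0.6, 0.8, 0.9]


def make_splits(features, labels, train_fractions=TRAIN_FRACTIONS, random_state=None):
    """Return {fraction: (feature_train, feature_test, label_train, label_test)}.

    The four names are hypothetical; they follow train_test_split's output order.
    """
    splits = {}
    for fraction in train_fractions:
        splits[fraction] = tuple(
            train_test_split(
                features,
                labels,
                train_size=fraction,        # test set gets the remainder
                shuffle=True,               # shuffle before splitting
                stratify=labels,            # preserve class proportions
                random_state=random_state,  # None = library default
            )
        )
    return splits
```

For example, feature_train, feature_test, label_train, label_test = make_splits(features, labels)[0.8] gives the 80/20 subsets.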
Visualize the class distribution of every data set (the original set, and the training and test sets for each proportion) to show that they have been prepared appropriately.
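A sketch of the distribution check, assuming pandas for the counts and matplotlib for the bar chart; the helper names class_distributions and plot_class_distributions are hypothetical. Comparing proportions rather than raw counts makes sets of different sizes directly comparable, and a stratified split should give nearly identical proportions in every column.

```python
import pandas as pd


def class_distributions(named_label_sets):
    """Table of class proportions, one column per label set.

    named_label_sets maps a display name (e.g. "original", "train 80/20")
    to a pandas Series of class labels.
    """
    table = pd.DataFrame({
        name: labels.value_counts(normalize=True)
        for name, labels in named_label_sets.items()
    })
    # A very rare class can be absent from a small set; show it as 0.
    return table.fillna(0.0).sort_index()


def plot_class_distributions(table):
    """Draw grouped bars: one group per data set, one bar per class."""
    import matplotlib.pyplot as plt  # imported here so the table works without it

    ax = table.T.plot.bar(figsize=(10, 4))
    ax.set_xlabel("data set")
    ax.set_ylabel("proportion of examples")
    plt.tight_layout()
    plt.show()
```

For example, build a dictionary holding the full label column under "original" plus the training and test labels of each proportion, then call plot_class_distributions(class_distributions(that_dictionary)).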
This task conducts experiments on the designated train/test proportions listed above.
Fit an instance of sklearn.tree.DecisionTreeClassifier that splits by information gain (criterion="entropy") to each training set, and visualize the resulting decision tree using graphviz.
Artificial Intelligence
For each of the decision tree classifiers above, predict the labels of the examples in the corresponding test set, and produce a report using sklearn.metrics.classification_report and sklearn.metrics.confusion_matrix.
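A sketch of the evaluation step. Passing one shared, explicit labels list keeps the report rows and the matrix rows and columns in the same order, and zero_division=0 silences warnings for a class that is never predicted. The helper name evaluate is hypothetical; model is any fitted classifier with a predict method.

```python
import pandas as pd
from sklearn.metrics import classification_report, confusion_matrix


def evaluate(model, feature_test, label_test):
    """Return the text classification report and a labelled confusion matrix."""
    predicted = model.predict(feature_test)
    # Same label order for both outputs so they can be read side by side.
    labels = sorted(set(label_test) | set(predicted))
    report = classification_report(
        label_test, predicted, labels=labels, zero_division=0
    )
    matrix = pd.DataFrame(
        confusion_matrix(label_test, predicted, labels=labels),
        index=pd.Index(labels, name="true"),
        columns=pd.Index(labels, name="predicted"),
    )
    return report, matrix
```

Printing report and displaying matrix for each proportion gives one evaluation per classifier.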
How do you interpret the classification report and the confusion matrix? Based on them, comment on the performance of these decision tree classifiers.
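When reading the two outputs together, note that scikit-learn's confusion matrix puts true classes on rows and predicted classes on columns. Each class's precision is therefore its diagonal entry divided by its column sum, its recall is the diagonal entry divided by its row sum (the support), and F1 is their harmonic mean. The sketch below recomputes these numbers from the matrix; the helper name per_class_scores is hypothetical.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def per_class_scores(label_true, label_pred, labels):
    """Recompute per-class precision, recall, F1 and support from the matrix."""
    cm = confusion_matrix(label_true, label_pred, labels=labels)
    correct = np.diag(cm).astype(float)
    predicted_counts = cm.sum(axis=0)  # column sums: times each class was predicted
    true_counts = cm.sum(axis=1)       # row sums: support of each class
    zeros = np.zeros_like(correct)
    # where= skips division by zero; those entries keep the value 0.
    precision = np.divide(correct, predicted_counts, out=zeros.copy(),
                          where=predicted_counts > 0)
    recall = np.divide(correct, true_counts, out=zeros.copy(),
                       where=true_counts > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=zeros.copy(),
                   where=denom > 0)
    return precision, recall, f1, true_counts
```

With true labels ["a", "a", "b", "b", "b"] and predictions ["a", "b", "b", "b", "a"], the matrix is [[1, 1], [1, 2]], giving precision and recall of 0.5 for class a and 2/3 for class b; these agree with sklearn.metrics.precision_recall_fscore_support.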
This task uses the 80/20 training and test sets. Investigate how the depth of the decision tree affects classification accuracy.
The maximum depth of a decision tree is controlled by its max_depth parameter.
Try the following values of max_depth: None, 2, 3, 4, 5, 6, and 7 (with None, nodes are expanded until all leaves are pure or contain fewer than min_samples_split samples). Then:
• Provide the decision tree drawn by graphviz for each max_depth value.
• Report, in the following table, the accuracy_score (on the test set) of the decision tree classifier for each value of max_depth.
max_depth | None | 2 | 3 | 4 | 5 | 6 | 7
Accuracy  |      |   |   |   |   |   |
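A sketch of the depth experiment, assuming one-hot encoded attributes and criterion="entropy" (information gain); the encoding is an assumption, not something the assignment mandates. The helper names depth_experiment and accuracy_row are hypothetical. The argument order matches the tuple returned by train_test_split, so the 80/20 split can be unpacked into it directly.

```python
from sklearn.metrics import accuracy_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

DEPTHS = [None, 2, 3, 4, 5, 6, 7]


def depth_experiment(feature_train, feature_test, label_train, label_test,
                     depths=DEPTHS):
    """Fit one information-gain tree per max_depth and record test accuracy."""
    models, accuracies = {}, {}
    for depth in depths:
        model = Pipeline([
            ("onehot", OneHotEncoder(handle_unknown="ignore")),  # assumed encoding
            ("tree", DecisionTreeClassifier(criterion="entropy", max_depth=depth)),
        ]).fit(feature_train, label_train)
        models[depth] = model
        accuracies[depth] = accuracy_score(label_test, model.predict(feature_test))
    return models, accuracies


def accuracy_row(accuracies):
    """Format the results in the layout of the max_depth/Accuracy table."""
    header = "max_depth | " + " | ".join(str(d) for d in accuracies)
    values = "Accuracy  | " + " | ".join(f"{a:.4f}" for a in accuracies.values())
    return header + "\n" + values
```

For the drawings, each models[depth].named_steps["tree"] can be passed to sklearn.tree.export_graphviz and rendered with graphviz.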
2 Grading
3 Notice
• This is an INDIVIDUAL assignment.
• You must use Python and present all of your code in a single .ipynb file.
• All the required visualizations must be presented in the .ipynb file, while statistical results and comments must be presented in the report.