DM Practicals in Python
iii) Check whether ruleset E is violated by the data in the file people.txt.
iv) Summarize the results obtained in part (iii).
v) Visualize the results obtained in part (iii).
iii) Define these rules in a separate text file and read them in. (In R, use the editfile
function from the editrules package; in Python, use a similar function.)
Print the resulting constraint object.
– Species should be one of the following values: setosa, versicolor or virginica.
– All measured numerical properties of an iris should be positive.
– The petal length of an iris is at least 2 times its petal width.
– The sepal length of an iris cannot exceed 30 cm.
– The sepals of an iris are longer than its petals.
iv) Determine how often each rule is broken (violatedEdits). Also summarize and plot the
result.
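The rule checks above can be sketched in plain Python. This is a minimal illustration, not a port of editrules: the rows are made-up records rather than the real iris file, the column names are assumptions, and missing values are not handled. Each rule is a function that returns True when a row satisfies it, and the loop counts how often each rule is broken.

```python
from collections import Counter

SPECIES = {"setosa", "versicolor", "virginica"}
MEASURES = ("sepal_length", "sepal_width", "petal_length", "petal_width")

# One predicate per rule; True means the row satisfies the rule.
RULES = {
    "valid species": lambda r: r["species"] in SPECIES,
    "positive measurements": lambda r: all(r[m] > 0 for m in MEASURES),
    "petal length >= 2 * petal width":
        lambda r: r["petal_length"] >= 2 * r["petal_width"],
    "sepal length <= 30 cm": lambda r: r["sepal_length"] <= 30,
    "sepals longer than petals":
        lambda r: r["sepal_length"] > r["petal_length"],
}

# Hypothetical rows (not taken from the real dataset); the last three
# each contain deliberate errors.
rows = [
    {"species": "setosa", "sepal_length": 5.1, "sepal_width": 3.5,
     "petal_length": 1.4, "petal_width": 0.2},
    {"species": "versicolor", "sepal_length": 6.4, "sepal_width": 3.2,
     "petal_length": 4.5, "petal_width": 1.5},
    {"species": "virginica", "sepal_length": 73.0, "sepal_width": 2.9,
     "petal_length": 6.3, "petal_width": 1.8},
    {"species": "setsoa", "sepal_length": 4.9, "sepal_width": 3.0,
     "petal_length": 1.4, "petal_width": 0.8},
    {"species": "versicolor", "sepal_length": 5.5, "sepal_width": -2.3,
     "petal_length": 5.8, "petal_width": 1.3},
]

# Count violations per rule, keeping rules with zero violations visible.
violations = Counter({name: 0 for name in RULES})
for row in rows:
    for name, rule in RULES.items():
        if not rule(row):
            violations[name] += 1

# Text summary with a crude bar per rule, standing in for a plot.
for name, count in violations.items():
    print(f"{name:35s} {count:2d} {'#' * count}")
```

Keeping the rules in a dictionary means the same loop works unchanged if the rules are later loaded from a text file instead of being written inline.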
v) Find outliers in sepal length using boxplot and boxplot.stats.
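A boxplot flags as outliers the points that lie more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule with the standard library only. The values are made up for illustration, and the quartile method is an assumption: R's boxplot.stats uses Tukey hinges, which can differ slightly from these quartiles on other sample sizes.

```python
from statistics import quantiles

# Made-up sepal lengths in cm; 30.5 stands in for a data-entry error.
sepal_length = [4.4, 4.9, 5.0, 5.4, 5.8, 6.1, 6.4, 6.9, 30.5]

# Quartiles with linear interpolation between the sorted data points.
q1, median, q3 = quantiles(sepal_length, n=4, method="inclusive")
iqr = q3 - q1

# Whisker limits used by the usual boxplot convention (coefficient 1.5).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in sepal_length if x < lower_fence or x > upper_fence]
print("quartiles:", q1, median, q3)
print("outliers:", outliers)
```

To draw the plot itself, matplotlib's `boxplot` function uses the same 1.5 coefficient by default.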
Q3. Load the data from wine dataset. Check whether all attributes are
standardized or not (mean is 0 and standard deviation is 1). If not,
standardize the attributes. Do the same with Iris dataset.
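The check and the transformation can be sketched as follows, using the population standard deviation (the same convention as scikit-learn's StandardScaler). The column names and values are made-up placeholders, not the real wine or iris data, and a constant column (standard deviation 0) is not handled.

```python
from statistics import fmean, pstdev

def is_standardized(column, tol=1e-8):
    """True if the column has mean 0 and standard deviation 1 (within tol)."""
    return abs(fmean(column)) <= tol and abs(pstdev(column) - 1) <= tol

def standardize(column):
    """Return z-scores: subtract the mean, divide by the standard deviation."""
    mu, sigma = fmean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]

# Hypothetical attribute columns for illustration only.
columns = {
    "alcohol": [13.2, 12.4, 14.1, 13.0],
    "ash": [2.1, 2.4, 2.6, 2.3],
}

# Standardize only the attributes that fail the check.
scaled = {
    name: col if is_standardized(col) else standardize(col)
    for name, col in columns.items()
}

for name, col in scaled.items():
    print(name, round(fmean(col), 6), round(pstdev(col), 6))
```

The same two functions apply to either dataset once its numeric columns are loaded as lists.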
For Iris Dataset:
For Wine Dataset:
Q4. Run the Apriori algorithm to find frequent itemsets and association rules.
4.1 Use minimum support as 2% and minimum confidence as 5%
4.2 Use minimum support as 10% and minimum confidence as 20%
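To make the two thresholds concrete, here is a small dependency-free sketch of Apriori. It is an illustration under stated assumptions, not a tuned implementation: the transactions are made up, and the function names `apriori` and `association_rules` are chosen here for readability. Support is the fraction of transactions containing an itemset; confidence of a rule A -> B is support(A and B) divided by support(A).

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep the candidates whose support reaches the threshold.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append(
                        (antecedent, itemset - antecedent, support, confidence)
                    )
    return rules

# Made-up market-basket data for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

# Thresholds from part 4.2: 10% support, 20% confidence.
frequent = apriori(transactions, min_support=0.10)
rules = association_rules(frequent, min_confidence=0.20)
for lhs, rhs, sup, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f} conf={conf:.2f}")
```

Lowering the thresholds to the values in part 4.1 can only add itemsets and rules, never remove them, which is a useful sanity check when comparing the two runs.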
Another Dataset
Q5. Use Naive Bayes, K-nearest neighbours, and Decision tree classification
algorithms to build classifiers. Divide the data set into training and test sets.
Compare the accuracy of the different classifiers under the following situations:
Random Subsampling
Decision Tree (On Iris Dataset)
Decision Tree (Breast Cancer Dataset)
Holdout Method
Decision Tree (On Iris Dataset)
Decision Tree (On Breast Cancer Dataset)
Cross Validation
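The three evaluation schemes listed above differ only in how the data is split. The sketch below shows each one around a tiny K-nearest neighbours classifier written with the standard library. The data is synthetic (two well-separated groups), the 75% training fraction, 5 repeats, and 5 folds are assumptions chosen for illustration, and any other classifier could be substituted for `knn_predict`.

```python
import random
from collections import Counter
from math import dist

def knn_predict(train, point, k=3):
    """Majority label among the k training rows closest to `point`."""
    nearest = sorted(train, key=lambda row: dist(row[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k=3):
    """Fraction of test rows whose predicted label matches the true one."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

def holdout(data, train_fraction, rng):
    """One shuffled split: the first part trains, the rest tests."""
    rows = data[:]
    rng.shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return accuracy(rows[:cut], rows[cut:])

def random_subsampling(data, train_fraction, repeats, rng):
    """Average accuracy over several independent holdout splits."""
    scores = [holdout(data, train_fraction, rng) for _ in range(repeats)]
    return sum(scores) / repeats

def cross_validation(data, folds, rng):
    """Each row is tested exactly once; the other folds train the model."""
    rows = data[:]
    rng.shuffle(rows)
    scores = []
    for i in range(folds):
        test = rows[i::folds]
        train = [row for j, row in enumerate(rows) if j % folds != i]
        scores.append(accuracy(train, test))
    return sum(scores) / folds

# Synthetic two-class data: rows are (features, label) pairs.
data = [((i * 0.1, i * 0.2), "a") for i in range(10)]
data += [((10 + i * 0.1, 10 + i * 0.2), "b") for i in range(10)]

rng = random.Random(0)  # fixed seed so the splits are repeatable
holdout_acc = holdout(data, 0.75, rng)
subsampling_acc = random_subsampling(data, 0.75, repeats=5, rng=rng)
cv_acc = cross_validation(data, folds=5, rng=rng)
print(holdout_acc, subsampling_acc, cv_acc)
```

On real data the three estimates usually differ: a single holdout split has the highest variance, while random subsampling and cross validation average over several splits.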
DBSCAN
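As a rough sketch of how the algorithm works: a point with at least `min_pts` neighbours within distance `eps` (counting itself) is a core point, clusters grow outward from core points, and points reachable from no core point are labelled noise. The points and parameter values below are made up for illustration, and the neighbour search is a simple linear scan rather than an indexed one.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels

# Two dense squares and one isolated point (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

Unlike the K-means family, the number of clusters is not given in advance; it follows from `eps` and `min_pts`.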
Hierarchical Clustering
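A minimal sketch of agglomerative (bottom-up) hierarchical clustering follows. It assumes single linkage, where the distance between two clusters is the distance between their closest members; other linkages (complete, average, Ward) change only the `gap` function. The points are made up, and the quadratic search is meant for small inputs only.

```python
from math import dist

def single_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns a list of clusters, each a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def gap(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        pairs = ((x, y) for x in range(len(clusters))
                 for y in range(x + 1, len(clusters)))
        a, b = min(pairs, key=lambda p: gap(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    return clusters

# Hypothetical one-dimensional points for illustration.
points = [(1.0,), (1.5,), (5.0,), (5.2,), (9.0,)]
clusters = single_linkage(points, n_clusters=3)
print(clusters)
```

Recording the gap at each merge would give the heights needed to draw a dendrogram.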