DM Practicals in Python

The document contains instructions for 6 questions involving data analysis tasks such as reading data, applying rules to check conditions, outlier detection, frequent itemset mining, classification algorithms, and clustering. For each question, specific datasets and algorithms are to be used to analyze the data, identify patterns, and compare results.

Uploaded by

Akansha Sharma

Q1. Create a file “people.txt” with the following data:

i) Read the data from the file “people.txt”.

ii) Create a ruleset E that contains rules to check for the following conditions:
1. The age should be in the range 0-150.
2. The age should be greater than years married.
3. The status should be married or single or widowed.
4. If age is less than 18, the age group should be child; if age is between 18 and 65, the age
group should be adult; if age is more than 65, the age group should be elderly.

iii) Check whether ruleset E is violated by the data in the file people.txt.
iv) Summarize the results obtained in part (iii).
v) Visualize the results obtained in part (iii)
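
One possible way to approach parts (i) to (iv) is sketched below. The records of people.txt are not reproduced in this sheet, so the sample rows, the column names (age, agegroup, status, yearsmarried) and the rule labels are all hypothetical, and an in-memory string stands in for the file. The sketch also assumes that "between 18 and 65" includes both endpoints.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for people.txt; the real records are not shown in the sheet.
PEOPLE_TXT = """age,agegroup,status,yearsmarried
21,adult,single,0
2,child,married,0
18,adult,married,20
221,elderly,widowed,2
34,child,married,3
70,elderly,divorced,40
"""


def expected_group(age):
    """Age group implied by rule 4 (18 and 65 are assumed to count as adult)."""
    if age < 18:
        return "child"
    if age <= 65:
        return "adult"
    return "elderly"


# Ruleset E: each rule returns True when a record satisfies it.
RULES = {
    "age in 0-150": lambda r: 0 <= r["age"] <= 150,
    "age > yearsmarried": lambda r: r["age"] > r["yearsmarried"],
    "status valid": lambda r: r["status"] in {"married", "single", "widowed"},
    "agegroup matches age": lambda r: r["agegroup"] == expected_group(r["age"]),
}


def read_people(handle):
    """Read the records and convert the numeric columns to int."""
    rows = []
    for row in csv.DictReader(handle):
        row["age"] = int(row["age"])
        row["yearsmarried"] = int(row["yearsmarried"])
        rows.append(row)
    return rows


def check(rows):
    """One dict per record: rule name -> True when the rule is violated."""
    return [{name: not rule(r) for name, rule in RULES.items()} for r in rows]


people = read_people(io.StringIO(PEOPLE_TXT))
violations = check(people)

# Summary: how many records break each rule.
summary = Counter(name for v in violations for name, broken in v.items() if broken)
for name in RULES:
    print(f"{name}: violated by {summary[name]} of {len(people)} records")
```

For part (v), the counts in `summary` can be passed to any plotting library as a bar chart; to read a real file, replace the `io.StringIO` object with `open("people.txt", newline="")`.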

Q2. Perform the following preprocessing tasks on the dirty_iris dataset.

i) Calculate the number and percentage of observations that are complete.
ii) Replace all the special values in data with NA.

iii) Define the following rules in a separate text file and read them (use the editfile function
from the R package editrules, or a similar function in Python).
Print the resulting constraint object.
– Species should be one of the following values: setosa, versicolor or virginica.
– All measured numerical properties of an iris should be positive.
– The petal length of an iris is at least 2 times its petal width.
– The sepal length of an iris cannot exceed 30 cm.
– The sepals of an iris are longer than its petals.
iv) Determine how often each rule is broken (violatedEdits). Also summarize and plot the
result.
v) Find outliers in sepal length using boxplot and boxplot.stats
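
A minimal sketch of Q2 with pandas follows. The dirty_iris file is not bundled with the common Python libraries, so the six rows and the snake_case column names are hypothetical. A string stands in for the rule text file, and the rules are written as pandas expressions evaluated with Python's built-in eval, which is only reasonable for rule text you trust. Comparisons involving NA evaluate to False here, so missing values are counted as violations, whereas R would typically report NA for them. The outlier step uses the 1.5 x IQR fences that a boxplot draws; boxplot.stats in R computes hinges slightly differently, so results can differ on small samples.

```python
import numpy as np
import pandas as pd

# Hypothetical rows imitating dirty_iris (values and column names are assumptions).
df = pd.DataFrame({
    "sepal_length": [5.1, 6.4, np.inf, 4.9, 73.0, 5.8],
    "sepal_width": [3.5, 3.2, 3.0, np.nan, 2.9, -3.0],
    "petal_length": [1.4, 4.5, 5.1, 1.5, 4.6, 5.1],
    "petal_width": [0.2, 1.5, 1.8, 0.1, 2.4, 1.9],
    "species": ["setosa", "versicolor", "virginica",
                "setosa", "versicolor", "virginca"],
})
numeric = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# (i) Complete observations: rows without any NA (inf does not count as NA yet).
complete = df.notna().all(axis=1)
n_complete = int(complete.sum())
pct_complete = 100 * n_complete / len(df)
print(f"complete: {n_complete} of {len(df)} ({pct_complete:.1f}%)")

# (ii) Replace the special values +inf and -inf with NA.
df[numeric] = df[numeric].replace([np.inf, -np.inf], np.nan)

# (iii) Rules as they might appear in a text file, one expression per line.
RULES_TXT = """\
species.isin(['setosa', 'versicolor', 'virginica'])
(sepal_length > 0) & (sepal_width > 0) & (petal_length > 0) & (petal_width > 0)
petal_length >= 2 * petal_width
sepal_length <= 30
sepal_length > petal_length
"""
rules = [line.strip() for line in RULES_TXT.splitlines() if line.strip()]
print(rules)

# (iv) Evaluate every rule for every row, then count the violations per rule.
columns = {name: df[name] for name in df.columns}
passed = pd.DataFrame(index=df.index)
for rule in rules:
    passed[rule] = eval(rule, {"__builtins__": {}}, columns)  # trusted text only
violations = (~passed).sum()
print(violations)

# (v) Outliers in sepal length using the boxplot rule (1.5 * IQR fences).
s = df["sepal_length"].dropna()
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
outliers = s[(s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)].tolist()
print("sepal length outliers:", outliers)
```

`violations.plot(kind="bar")` would give the plot asked for in part (iv) when matplotlib is installed.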

Q3. Load the data from wine dataset. Check whether all attributes are
standardized or not (mean is 0 and standard deviation is 1). If not,
standardize the attributes. Do the same with Iris dataset.
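
The check described above could look roughly like this. The sketch assumes the copies of the wine and iris data that ship with scikit-learn (`load_wine`, `load_iris`); the course may use different files. The helper names and the tolerance are illustrative choices, not part of the assignment.

```python
import numpy as np
from sklearn.datasets import load_iris, load_wine
from sklearn.preprocessing import StandardScaler


def is_standardized(X, tol=1e-6):
    """True when every column has mean ~0 and standard deviation ~1."""
    means_ok = np.allclose(X.mean(axis=0), 0, atol=tol)
    stds_ok = np.allclose(X.std(axis=0), 1, atol=tol)
    return bool(means_ok and stds_ok)


def standardize_if_needed(X):
    """Return X unchanged if already standardized, else a z-scored copy."""
    if is_standardized(X):
        return X
    return StandardScaler().fit_transform(X)


for name, loader in [("wine", load_wine), ("iris", load_iris)]:
    X = loader().data
    before = is_standardized(X)
    X_std = standardize_if_needed(X)
    print(f"{name}: standardized before={before}, after={is_standardized(X_std)}")
```

`StandardScaler` divides by the population standard deviation, which matches the default of `X.std(axis=0)` in NumPy, so the check and the transformation use the same definition.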
For Iris Dataset:
For Wine Dataset
Q4. Run the Apriori algorithm to find frequent itemsets and association rules.
4.1 Use minimum support as 2% and minimum confidence as 5%.
4.2 Use minimum support as 10% and minimum confidence as 20%.
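
To make the two threshold settings concrete, here is a small pure-Python sketch of Apriori and rule generation. The sheet does not name the transaction dataset, so the five baskets below are hypothetical, and the function names are illustrative. A library implementation would normally be preferred on real data.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)

    def support(itemset):
        return sum(itemset <= t for t in tx) / n

    frequent = {}
    current = {frozenset([item]) for t in tx for item in t}  # 1-item candidates
    k = 1
    while current:
        level = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        k += 1
        # Join: unions of frequent (k-1)-itemsets that contain exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: keep a candidate only if all its (k-1)-subsets are frequent.
        current = {c for c in joined
                   if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for every rule above the threshold."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[lhs]  # lhs is frequent by downward closure
                if conf >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), sup, conf))
    return rules


# Hypothetical baskets, used only to exercise the functions.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
    {"bread", "milk"},
]

for min_sup, min_conf in [(0.02, 0.05), (0.10, 0.20)]:
    itemsets = apriori(baskets, min_sup)
    rules = association_rules(itemsets, min_conf)
    print(f"support>={min_sup}, confidence>={min_conf}: "
          f"{len(itemsets)} itemsets, {len(rules)} rules")
```

On a dataset this small both settings keep every itemset; the difference between 4.1 and 4.2 only becomes visible on data with many rare items.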
Another Dataset
Q5. Use Naive Bayes, K-nearest neighbours, and Decision tree classification algorithms
and build classifiers. Divide the data set into training and test set. Compare
the accuracy of the different classifiers under the following situations:

5.1 a) Training set = 75%, Test set = 25%
b) Training set = 66.6% (2/3rd of total), Test set = 33.3%
5.2 Training set is chosen by
i) hold out method
ii) Random subsampling
iii) Cross-Validation. Compare the accuracy of the classifiers obtained.
5.3 Data is scaled to standard format.
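
The comparison in 5.1 to 5.3 can be sketched with scikit-learn as below. This is one possible layout, not the sheet's own solution: the helper names, k = 5 for KNN, 10 repeats for random subsampling, 10 folds for cross-validation and the fixed random seeds are all assumptions. Iris is loaded here; `load_breast_cancer(return_X_y=True)` can be substituted for the second dataset.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import ShuffleSplit, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def make_models(scale=False):
    """The three classifiers, optionally preceded by standard scaling (5.3)."""
    models = {
        "naive_bayes": GaussianNB(),
        "knn": KNeighborsClassifier(n_neighbors=5),
        "decision_tree": DecisionTreeClassifier(random_state=0),
    }
    if scale:
        models = {name: make_pipeline(StandardScaler(), model)
                  for name, model in models.items()}
    return models


def evaluate(X, y, test_size, scale=False):
    """Accuracy of each classifier under the three sampling schemes of 5.2."""
    results = {}
    for name, model in make_models(scale).items():
        # Hold-out: a single stratified train/test split.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=0, stratify=y)
        # Random subsampling: the hold-out split repeated 10 times.
        subsample = ShuffleSplit(n_splits=10, test_size=test_size, random_state=0)
        results[name] = {
            "holdout": model.fit(X_tr, y_tr).score(X_te, y_te),
            "random_subsampling": cross_val_score(model, X, y, cv=subsample).mean(),
            "cross_validation": cross_val_score(model, X, y, cv=10).mean(),
        }
    return results


X, y = load_iris(return_X_y=True)
for test_size in (0.25, 1 / 3):
    for scale in (False, True):
        for name, acc in evaluate(X, y, test_size, scale).items():
            line = ", ".join(f"{k}={v:.3f}" for k, v in acc.items())
            print(f"test={test_size:.2f} scaled={scale} {name}: {line}")
```

Putting the scaler inside a pipeline means it is fitted on the training part of each split only, which avoids leaking test-set statistics into the model. The cross-validation score does not depend on `test_size`, so it repeats across the two split settings.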

For Iris Dataset

For Breast Cancer Dataset:
Holdout Method
Decision Tree (On Iris dataset).
Decision Tree (On Breast Cancer Dataset)
KNN (On Iris Dataset)
KNN (On Breast Cancer Dataset)
Naïve Bayes (On Iris Dataset)
Naïve Bayes (On Breast Cancer Dataset)

Random Subsampling
Decision Tree (On Iris Dataset)
Decision Tree (Breast Cancer Dataset)

KNN (Iris Dataset)

KNN (Breast Cancer Dataset)

Naïve Bayes (On Iris Dataset)

Naïve Bayes (On Breast Cancer Dataset)

Training set = 66.6% (2/3rd of total), Test set = 33.3%

Holdout Method
Decision Tree (On Iris Dataset)
Decision Tree (On Breast Cancer Dataset)

KNN (On Iris Dataset)

KNN (On Breast Cancer Dataset)
Naïve Bayes (On Iris Dataset)

Naïve Bayes (On Breast Cancer Dataset)

Random Subsampling
Decision Tree (On Iris dataset)
Decision Tree (On Breast Cancer dataset)

KNN (On Iris Dataset)

KNN (On Breast Cancer Dataset)

Naïve Bayes (On Iris Dataset)

Naïve Bayes (On Breast Cancer Dataset)

Cross Validation

Decision Tree (On Iris Dataset)

Decision Tree (On Breast Cancer Dataset)

KNN (On Iris and Breast Cancer Dataset)

Naïve Bayes (On Iris and Breast Cancer Datasets)

Result:
Q6. Use Simple K-means, DBSCAN, and Hierarchical clustering algorithms for clustering. Compare the
performance of clusters by changing the parameters involved in the algorithms.
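
A rough sketch of such a parameter comparison with scikit-learn is given below. The sheet does not name the dataset for Q6, so standardized Iris is assumed, as are the parameter grids and the use of the silhouette coefficient as the quality measure. Points that DBSCAN labels as noise (-1) are left out of the score.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Assumed dataset: Iris features, standardized so all attributes weigh equally.
X = StandardScaler().fit_transform(load_iris().data)


def score(labels):
    """Silhouette over non-noise points; None when fewer than 2 clusters remain."""
    mask = labels != -1
    if len(set(labels[mask])) < 2:
        return None
    return silhouette_score(X[mask], labels[mask])


results = {}

# K-means: vary the number of clusters.
for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    results[f"kmeans k={k}"] = score(labels)

# DBSCAN: vary the neighbourhood radius.
for eps in (0.4, 0.6, 0.8):
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    results[f"dbscan eps={eps}"] = score(labels)

# Hierarchical (agglomerative) clustering: vary the linkage criterion.
for linkage in ("ward", "complete", "average"):
    labels = AgglomerativeClustering(n_clusters=3, linkage=linkage).fit_predict(X)
    results[f"hierarchical {linkage}"] = score(labels)

for name, value in results.items():
    shown = "n/a" if value is None else f"{value:.3f}"
    print(f"{name}: silhouette={shown}")
```

A higher silhouette value suggests tighter, better separated clusters, but it favours compact convex shapes, so it is only one of several ways to compare the three algorithms.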
Kmeans Clustering

DBScan
Hierarchical Clustering
