Chapter 4 – Introduction to Data Mining
(SOLUTION)
1. Define the term Data mining.
ANS:
Data Mining is a process used by organizations to extract specific
data from huge databases to solve business problems. It primarily
turns raw data into useful information.
The process of extracting information from huge sets of data to identify
patterns, trends, and useful data that allow a business to take
data-driven decisions is called Data Mining or Knowledge
Discovery in Databases (KDD).
OR
Data Mining is the process of investigating or searching for hidden
patterns of information from various perspectives, in order to categorize
it into useful data.
OR
Data mining is the act of automatically searching large stores of
information to find trends and patterns that go beyond simple analysis
procedures.
2. Describe any four Challenges of Data mining.
ANS: Data mining systems face many challenges and issues in
today’s world. Some of them are mining methodology and user
interaction issues, performance issues, and issues relating to the
diversity of database types.
1. Mining Methodology and User Interaction Issues:
Mining different Kinds of Knowledge in Databases: Different
users need different kinds of knowledge, presented in different ways.
Because each client wants a different kind of information, it becomes
difficult to cover the vast range of data needed to meet every client's
requirements.
Interactive Mining of Knowledge at Multiple Levels of
Abstraction: Interactive mining allows users to focus the search for
patterns from different angles. The data mining process should be
interactive because it is difficult to know what can be discovered
within a database.
Incorporation of Background Knowledge: Background knowledge
is used to guide discovery process and to express the discovered
patterns.
Dhrupesh Sir 9699692059 DWM
Query Languages and Ad-hoc Mining: Relational query languages
(such as SQL) allow users to pose ad-hoc queries for data retrieval.
A data mining query language should be well integrated with the
query language of the data warehouse.
Handling Noisy or Incomplete Data: In a large database, many of
the attribute values will be incorrect, due to human error or
instrument failure. Data cleaning methods and data analysis methods
are used to handle noisy data.
2. Performance Issues:
Efficiency and Scalability of Data Mining Algorithms: To
effectively extract information from a huge amount of data in databases,
data mining algorithms must be efficient and scalable.
Parallel, Distributed and Incremental Mining Algorithms: The
huge size of many databases, the wide distribution of data, and
complexity of some data mining methods are factors motivating the
development of parallel and distributed data mining algorithms. Such
algorithms divide the data into partitions, which are processed in parallel.
3. Issues Related to the Diversity of Database Types:
Handling of Relational and Complex Types of Data: There are
many kinds of data stored in databases and data warehouses. It is not
possible for one system to mine all of these kinds of data; different
data mining systems should be constructed for different kinds of data.
Mining Information from Heterogeneous Databases and
Global Information Systems: Data is fetched from different
data sources over a Local Area Network (LAN) or Wide Area Network
(WAN), so the discovery of knowledge from such heterogeneous,
differently structured sources is a great challenge for data mining.
4. Accuracy of Data Issues:
Data mining techniques are not 100 percent accurate. Earlier, while
collecting information about certain elements, one usually sought
help directly from clients, but nowadays everything has changed:
mining technology and its methods have made the process of
information collection much easier. A possible limitation of a data
mining system is that it can guarantee the accuracy of data only
within its own limits.
3. Explain steps involved in KDD process with
diagram.
OR
4. Explain in detail Knowledge Discovery of Database
(KDD).
ANS: Knowledge Discovery of Database (KDD):
Data mining is the process of discovering interesting patterns and
knowledge from large amounts of data.
Data mining is used by companies to learn customer preferences,
determine the prices of their products and services, and analyse the
market.
Data mining is also known as knowledge discovery in Database
(KDD).
Steps in the KDD process:
1. Data cleaning:
Data cleaning removes noise (errors) and inconsistent data.
2. Data integration:
Multiple data sources may be combined into a single unit.
3. Data selection:
The data relevant to the analysis task are retrieved from the database.
4. Data transformation:
The data are transformed and consolidated into forms appropriate for
mining by performing summary or aggregation operations; i.e., data of
varied types from different sources can be converted into a single
standard format.
5. Data mining:
Data mining is the process in which intelligent methods or algorithms are
applied on data to extract useful data patterns.
6. Pattern evaluation:
This process identifies the truly interesting patterns representing actual
knowledge based on user requirements for analysis.
7. Knowledge presentation:
In this process, visualization and knowledge representation techniques are
used to present mined knowledge to users for analysis.
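The seven steps above can be sketched end-to-end on toy data. This is a minimal illustration in Python; every function name, variable, and data value below is invented for the example and is not part of any standard library:

```python
# A minimal sketch of KDD steps 1-4 on toy tuples; all names here
# are illustrative, not from any specific library.

def clean(rows):
    # 1. Data cleaning: drop tuples with missing (None) values.
    return [r for r in rows if None not in r]

def integrate(*sources):
    # 2. Data integration: combine multiple sources into one set.
    return [r for source in sources for r in source]

def select(rows, cols):
    # 3. Data selection: keep only the task-relevant attributes.
    return [tuple(r[c] for c in cols) for r in rows]

def transform(rows):
    # 4. Data transformation: scale the numeric field into [0, 1].
    top = max(r[1] for r in rows)
    return [(name, value / top) for name, value in rows]

# Toy source tables: (item, units sold, store id)
source_a = [("milk", 40, "S1"), ("bread", None, "S1")]
source_b = [("eggs", 80, "S2")]

data = transform(select(clean(integrate(source_a, source_b)), cols=(0, 1)))
print(data)  # [('milk', 0.5), ('eggs', 1.0)]
```

The remaining steps (mining, pattern evaluation, presentation) would then operate on `data`.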
5. Explain various data objects and attributes types.
ANS: Data Objects:
Data sets are made up of data objects.
A data object represents an entity.
Example: in a sales database, the objects may be customers, store
items, and sales; in a medical database, the objects may be patients.
Data objects are typically described by attributes.
If the data objects are stored in a database, they are data tuples. That
is, the rows of a database correspond to the data objects, and the
columns correspond to the attributes.
Attribute:
Attribute is a data field that represents characteristics or features of a
data object.
For a customer object, attributes can be customer ID, address, etc.
A set of attributes is used to describe an object.
Types of attributes:
1. Qualitative Attributes
2. Quantitative Attributes
1. Qualitative Attributes:
a. Nominal Attributes (N):
These attributes are related to names.
The values of a nominal attribute are names of things or symbols.
They represent some category or state, which is why nominal
attributes are also referred to as categorical attributes; there is
no order (rank, position) among the values of a nominal attribute.
Example: hair colour (black, brown, grey), occupation.
b. Binary Attributes (B):
Binary data has only 2 values/states.
Example: yes or no, affected or unaffected, true or false.
Symmetric: both values are equally important (e.g., Gender).
Asymmetric: both values are not equally important (e.g., the Result
of a medical test, where the positive outcome matters more).
c. Ordinal Attributes (O):
The Ordinal Attributes contains values that have a meaningful sequence
or ranking(order) between them.
Example: grades (A, B, C), ranks (first, second, third), size (small,
medium, large).
2. Quantitative Attributes:
a. Numeric:
A numeric attribute is quantitative because it is a measurable
quantity, represented in integer or real values.
Example: age in years, salary.
b. Discrete:
Discrete data have finite values; they can be numerical or
categorical. These attributes have a finite or countably infinite
set of values.
Example: number of students in a class, zip codes.
c. Continuous:
Continuous data have an infinite number of states and are of float
type; there can be infinitely many values between 2 and 3.
Example: height, weight, temperature.
6. Explain Data preprocessing technique in data
mining
OR
7. Explain major tasks in data preprocessing.
ANS: The major tasks in Data Preprocessing:
1. Data Cleaning.
2. Data Integration.
3. Data Transformation.
4. Data Reduction.
5. Data Discretization.
Data Cleaning
Real-world data tend to be incomplete, noisy, and inconsistent. Data
cleaning (or data cleansing) routines attempt to fill in missing values,
smooth out noise while identifying outliers, and correct inconsistencies in
the data.
a. Handling Missing Values.
b. Cleaning of noisy data.
Data Integration
Data integration is one of the steps of data pre-processing that
involves combining data residing in different sources and providing
users with a unified view of these data.
It merges the data from multiple data stores (data sources). It includes
multiple databases, cubes or flat files.
There are mainly two major approaches for data integration
commonly known as "tight coupling approach" and "loose coupling
approach".
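A loose-coupling style integration can be sketched as a query-time merge of two hypothetical data stores on a shared key. The store contents and field names below are invented purely for illustration:

```python
# Hedged sketch: merging customer records from two hypothetical
# stores on a shared customer-id key, producing a unified view.

store_a = {101: {"name": "Asha"}, 102: {"name": "Ravi"}}
store_b = {101: {"city": "Pune"}, 103: {"city": "Surat"}}

unified = {}
for key in set(store_a) | set(store_b):
    record = {}
    record.update(store_a.get(key, {}))  # fields from source A, if any
    record.update(store_b.get(key, {}))  # fields from source B, if any
    unified[key] = record

print(unified[101])  # {'name': 'Asha', 'city': 'Pune'}
```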
Data Transformation
In data mining preprocessing, and especially in metadata and data
warehousing, data transformation is used to convert data from a
source data format into the destination format.
For example, the values -2, 32, 100, 59, 48 can be normalized to
-0.02, 0.32, 1.00, 0.59, 0.48.
Here, the data are transformed or consolidated into forms appropriate for
mining. Data Transformation operations would contribute toward the
success of the mining process.
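The transformation in the numeric example above is decimal-scaling normalization, which can be sketched as follows (the function name is illustrative):

```python
# Decimal-scaling normalization: divide each value by 10^j, where j
# is the smallest integer making every |v| / 10^j <= 1.

def decimal_scale(values):
    j = 0
    while max(abs(v) for v in values) / (10 ** j) > 1:
        j += 1
    return [v / (10 ** j) for v in values]

print(decimal_scale([-2, 32, 100, 59, 48]))
# [-0.02, 0.32, 1.0, 0.59, 0.48]
```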
Data Reduction
A database or data warehouse may store terabytes of data, so it may
take a very long time to perform data analysis and mining on such
huge amounts of data.
Data reduction techniques can be applied to obtain a reduced
representation of the data set that is much smaller in volume, yet
closely maintains the integrity of the original data.
That is, mining on the reduced data set should be more efficient yet
produce the same (or almost the same) analytical results.
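One simple reduction technique is random sampling without replacement, sketched below on toy data; the seed and the sizes are arbitrary choices made for the illustration:

```python
import random

# Sketch: simple random sampling as a data-reduction technique.
# The reduced set is much smaller, yet for large data it preserves
# summary statistics approximately.

random.seed(42)                         # fixed seed for repeatability
full_data = list(range(10_000))         # stand-in for a huge data set
sample = random.sample(full_data, 100)  # reduced representation

print(len(sample))  # 100
```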
Data Discretization
Discretization and concept hierarchy generation are powerful tools for
data mining, in that they allow the mining of data at multiple levels of
abstraction.
Data discretization and concept hierarchy generation are also forms of
data reduction. The raw data are replaced by a smaller number of
interval or concept labels. This simplifies the original data and makes
the mining more efficient.
The resulting patterns mined are typically easier to understand.
Concept hierarchies are also useful for mining at multiple abstraction
levels.
In discretization, the raw values of a numeric attribute (e.g., age)
are replaced by interval labels (e.g., 0-10, 11-20, etc.) or conceptual
labels (e.g., youth, adult, senior). The labels, in turn, can be
recursively organized into higher-level concepts, resulting in a concept
hierarchy for the numeric attribute.
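Both kinds of labels can be sketched in a few lines; the bin width and the age cut-offs below are illustrative choices, not fixed definitions:

```python
# Discretization sketch: raw ages replaced by interval labels, then
# rolled up into conceptual labels (youth / adult / senior).

def interval_label(age, width=10):
    # Map an age to an equal-width interval label, e.g. 34 -> "30-39".
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"

def concept_label(age):
    # Illustrative cut-offs for the higher-level concept labels.
    if age <= 20:
        return "youth"
    elif age <= 60:
        return "adult"
    return "senior"

ages = [7, 34, 68]
print([interval_label(a) for a in ages])  # ['0-9', '30-39', '60-69']
print([concept_label(a) for a in ages])   # ['youth', 'adult', 'senior']
```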
8. Describe the need of data preprocessing.
ANS: Data preprocessing is the key step in data mining for identifying
missing values, inconsistencies, noise, errors, and outliers. Without
data preprocessing, these data errors would survive and lower the
quality of the data mining results. Preprocessing removes missing or
inconsistent data values resulting from human or computer error, which
improves the accuracy, quality, and reliability of a dataset.
9. List methods of data preprocessing.
ANS: The methods of data preprocessing are:
1. Data Cleaning.
2. Data Integration.
3. Data Transformation.
4. Data Reduction.
5. Data Discretization.
10. Explain any two data cleaning methods.
ANS: Data Cleaning:
Real-world data tend to be incomplete, noisy, and inconsistent.
Data cleaning (or data cleansing) routines attempt to fill in missing
values, smooth out noise while identifying outliers, and correct
inconsistencies in the data.
Method 1: Handling Missing Values:
1. Ignore the tuple
2. Fill in the missing value manually
3. Use a global constant to fill in the missing value: replace all
missing attribute values by the same constant, such as a label like
“Unknown”.
4. Use a measure of central tendency for the attribute (e.g., the mean
or median) to fill in the missing value.
5. Use the attribute mean or median for all samples belonging to the
same class as the given tuple.
6. Use the most probable value to fill in the missing value.
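Strategies 3 and 4 above can be sketched on a toy attribute, with missing values represented as None (the income figures are invented for the example):

```python
from statistics import mean, median

# Sketch: filling missing values with a global constant and with
# measures of central tendency (mean, median).

income = [3000, None, 4500, None, 6000]
observed = [v for v in income if v is not None]

by_constant = [v if v is not None else "Unknown" for v in income]
by_mean = [v if v is not None else mean(observed) for v in income]
by_median = [v if v is not None else median(observed) for v in income]

# Both central-tendency strategies fill the gaps with 4500 here.
print(by_mean, by_median)
```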
Method 2: Handling Noisy Data:
1. Binning Method:
This method works on sorted data in order to smooth it. The whole
data is divided into segments (bins) of equal size, and each segment
is handled separately: all values in a segment can be replaced by the
segment's mean, or the boundary values of the segment can be used.
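Both variants can be sketched on nine sorted values split into three equal-size bins (the data values are illustrative):

```python
# Binning sketch: smoothing sorted data by bin means and by bin
# boundaries (each value snaps to its closest bin boundary).

data = sorted([4, 8, 15, 21, 21, 24, 25, 28, 34])
bins = [data[i:i + 3] for i in range(0, len(data), 3)]

by_means = [[sum(b) / len(b)] * len(b) for b in bins]
by_boundaries = [
    [min(b) if v - min(b) <= max(b) - v else max(b) for v in b]
    for b in bins
]

print(by_means)       # [[9.0, 9.0, 9.0], [22.0, 22.0, 22.0], [29.0, 29.0, 29.0]]
print(by_boundaries)  # [[4, 4, 15], [21, 21, 24], [25, 25, 34]]
```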
2. Regression:
Data smoothing can also be done by regression, a technique that
conforms data values to a function.
Linear regression involves finding the “best” line to fit two attributes
(or variables) so that one attribute can be used to predict the other.
Multiple linear regression is an extension of linear regression, where
more than two attributes are involved and the data are fit to a
multidimensional surface.
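Linear regression can be sketched with the closed-form least-squares formulas; the data points below are invented, scattered around y = 2x:

```python
# Least-squares line fit: one attribute (y) is smoothed by predicting
# it from another (x) via the fitted line.

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

xs = [1, 2, 3, 4]
ys = [2.1, 3.9, 6.2, 7.8]          # noisy values around y = 2x
slope, intercept = fit_line(xs, ys)
smoothed = [slope * x + intercept for x in xs]
print(round(slope, 2), round(intercept, 2))  # 1.94 0.15
```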
3. Clustering and Outlier analysis:
Clustering groups the similar data in a cluster. Outliers may be
detected by clustering, for example, where similar values are
organized into groups, or “clusters.” Intuitively, values that fall
outside of the set of clusters may be considered outliers.
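A minimal sketch of clustering-based outlier detection in one dimension: values are grouped into clusters by a distance threshold, and singleton clusters are flagged as outliers (the gap threshold and data are illustrative choices):

```python
# Sketch: 1-D clustering by a distance gap; values that end up alone,
# far from every cluster, are intuitively outliers.

def cluster_1d(values, gap=5):
    values = sorted(values)
    clusters, current = [], [values[0]]
    for v in values[1:]:
        if v - current[-1] <= gap:
            current.append(v)      # close enough: same cluster
        else:
            clusters.append(current)
            current = [v]          # start a new cluster
    clusters.append(current)
    return clusters

values = [10, 11, 12, 30, 31, 95]
clusters = cluster_1d(values)
outliers = [c[0] for c in clusters if len(c) == 1]
print(outliers)  # [95]
```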
11. Explain Data cleaning.
ANS: Data Cleaning
Real-world data tend to be incomplete, noisy, and inconsistent. Data
cleaning (or data cleansing) routines attempt to fill in missing values,
smooth out noise while identifying outliers, and correct inconsistencies in
the data.
a. Handling Missing Values.
b. Cleaning of noisy data.
Fig: Data Cleaning
12. Explain Data Cleaning as a Process.
ANS: Data Cleaning as a Process
Up to now, under data cleaning, we have seen different techniques for
handling missing data and for smoothing data. Missing values, noise,
outliers, and inconsistencies all contribute to inaccurate data.
An outlier is a data object that deviates significantly from the rest of
the objects, as if it were generated by a different mechanism.
The first step in data cleaning as a process is discrepancy detection.
Discrepancies can be caused by several factors:
1. Poor data entry by human beings or certain data may not be
considered important at the time of entry.
2. Inconsistent with other recorded data or inconsistencies due to data
integration.
3. Data not entered due to misunderstanding.
4. History or changes of the data were not registered.
5. Errors in instrumentation devices that record data and system errors.
6. Errors can also occur when the data are (inadequately) used for
purposes other than originally intended.
7. Experimental errors (data extraction or experiment
planning/executing errors).