Data Warehouse and Data Mining – Fourth Lecture
2.6 Classification of Data Mining Systems
Data mining is an interdisciplinary field, and a data mining system can be
classified according to the disciplines it draws on:
Database Technology
Statistics
Machine Learning
Information Science
Visualization
Other Disciplines
Some Other Classification Criteria:
Classification according to kind of databases mined
Classification according to kind of knowledge mined
Classification according to kinds of techniques utilized
Classification according to applications adapted
Classification according to kind of databases mined
Data mining systems can be classified according to the kinds of databases
mined. Database systems themselves can be classified by different criteria, such
as data models or types of data, and data mining systems can be
classified accordingly. For example, if we classify databases according to
Data Warehouse and Data Mining 2023-2024
data model, we may have a relational, transactional, object-relational,
or data warehouse mining system.
Classification according to kind of knowledge mined
Data mining systems can be classified according to the kinds of knowledge
mined, that is, on the basis of data mining
functionalities such as:
Characterization
Discrimination
Association and Correlation Analysis
Classification
Prediction
Clustering
Outlier Analysis
Evolution Analysis
Classification according to kinds of techniques utilized
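Data mining systems can be classified according to the kinds of techniques
used. These techniques can be described by the degree of user
interaction involved (for example, autonomous, interactive exploratory, or
query-driven systems) or by the methods of analysis employed (for example,
database-oriented, machine learning, statistical, or visualization methods).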
Prepared by Dr. Dunia H. Hameed Page 35
Classification according to applications adapted
Data mining systems can be classified according to the applications they are
adapted to. Examples of such applications are:
Finance
Telecommunications
DNA
Stock Markets
E-mail
2.7 Major Issues in Data Mining
Mining different kinds of knowledge in databases. - Different users
have different needs and may be interested in different kinds of
knowledge. Therefore, data mining must cover a broad range of
knowledge discovery tasks.
Interactive mining of knowledge at multiple levels of abstraction. -
The data mining process needs to be interactive so that users can
focus the search for patterns, providing and refining data
mining requests based on returned results.
Incorporation of background knowledge. - Background knowledge can
be used to guide the discovery process and to express the discovered
patterns, not only in concise terms but also at multiple levels of
abstraction.
Data mining query languages and ad hoc data mining. - A data
mining query language that allows the user to describe ad hoc mining
tasks should be integrated with a data warehouse query language and
optimized for efficient and flexible data mining.
Presentation and visualization of data mining results. - Once
patterns are discovered, they need to be expressed in high-level
languages and visual representations that are
easily understandable by users.
Handling noisy or incomplete data. - Data cleaning methods are
required to handle noise and incomplete objects while mining
data regularities. Without such methods, the
accuracy of the discovered patterns will be poor.
Pattern evaluation. - It refers to the interestingness of the discovered
patterns. Many discovered patterns are uninteresting because they either
represent common knowledge or lack novelty, so measures are needed to
identify the patterns that are truly interesting.
Efficiency and scalability of data mining algorithms. - To
extract information effectively from huge amounts of data in
databases, data mining algorithms must be efficient and scalable.
Parallel, distributed, and incremental mining algorithms. -
Factors such as the huge size of databases, the wide distribution of data, and
the complexity of data mining methods motivate the development of
parallel and distributed data mining algorithms. These algorithms
divide the data into partitions, which are processed in parallel.
The results from the partitions are then merged. Incremental
algorithms incorporate database updates without having to mine the data
again from scratch.
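The partition, merge, and incremental-update ideas in the last issue above can be sketched in Python. This is a minimal illustration under stated assumptions, not a specific published algorithm: the function names (`count_items`, `mine_parallel`, `update_incremental`), the toy transactions, and the choice of simple item counting as the "mining" task are all invented for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_items(partition):
    """Mine one partition: count how many transactions contain each item."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))  # count each item once per transaction
    return counts


def mine_parallel(transactions, n_partitions=2):
    """Split the data into partitions, mine them concurrently, merge the results."""
    partitions = [transactions[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partial_counts = list(pool.map(count_items, partitions))
    merged = Counter()
    for counts in partial_counts:
        merged += counts  # merging is just adding the per-partition counts
    return merged


def update_incremental(existing_counts, new_transactions):
    """Fold new transactions into existing counts without re-mining old data."""
    return existing_counts + count_items(new_transactions)


# Hypothetical toy data, invented for this illustration.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["milk", "butter", "bread"],
    ["milk"],
]
counts = mine_parallel(transactions)
counts = update_incremental(counts, [["bread", "jam"]])
print(counts.most_common(2))
```

A thread pool is used here only to keep the sketch short. For CPU-bound mining on large data, a process pool or a distributed framework would be the more realistic choice, and the merge step is this simple only because counts from disjoint partitions can be added directly.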
2.8 Knowledge Discovery in Databases (KDD)
Some people treat data mining as a synonym for knowledge discovery, while
others view data mining as an essential step in the process of knowledge discovery.
The steps involved in the knowledge discovery process are:
Data Cleaning - In this step, noise and inconsistent data are
removed.
Data Integration - In this step multiple data sources are combined.
Data Selection - In this step, data relevant to the analysis task are retrieved
from the database.
Data Transformation - In this step data are transformed or
consolidated into forms appropriate for mining by performing
summary or aggregation operations.
Data Mining - In this step intelligent methods are applied in order to
extract data patterns.
Pattern Evaluation - In this step, data patterns are evaluated to identify
those that are truly interesting.
Knowledge Presentation - In this step, the mined knowledge is presented to
the user.
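The steps listed above can be sketched as a small Python pipeline, one function per step. This is only an illustration under stated assumptions: the records, the function names, the pair-counting "mining" method, and the 50% support threshold are all hypothetical, and the transformation shown is simple normalization rather than the summary or aggregation operations a real system might apply.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records from two sources; None and "" stand in for noise.
store_a = [
    {"customer": "c1", "items": ["Bread", "milk"]},
    {"customer": "c2", "items": None},  # missing data
    {"customer": "c3", "items": ["bread", "butter", ""]},
]
store_b = [
    {"customer": "c4", "items": ["milk", "bread"]},
    {"customer": "c5", "items": ["butter"]},
]


def clean(records):
    """Data cleaning: drop records with missing items, strip empty item names."""
    cleaned = []
    for record in records:
        if record["items"]:
            items = [item for item in record["items"] if item]
            cleaned.append({"customer": record["customer"], "items": items})
    return cleaned


def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [record for source in sources for record in source]


def select(records):
    """Data selection: keep only the attribute relevant to the task."""
    return [record["items"] for record in records]


def transform(baskets):
    """Data transformation: normalize case and turn each basket into a set."""
    return [frozenset(item.lower() for item in basket) for basket in baskets]


def mine(baskets):
    """Data mining: count how often each pair of items occurs together."""
    pair_counts = Counter()
    for basket in baskets:
        pair_counts.update(combinations(sorted(basket), 2))
    return pair_counts


def evaluate(pair_counts, n_baskets, min_support=0.5):
    """Pattern evaluation: keep pairs whose support reaches the threshold."""
    return {pair: count / n_baskets
            for pair, count in pair_counts.items()
            if count / n_baskets >= min_support}


def present(patterns):
    """Knowledge presentation: render the surviving patterns as readable text."""
    return [f"{a} and {b} appear together in {support:.0%} of baskets"
            for (a, b), support in sorted(patterns.items())]


baskets = transform(select(integrate(clean(store_a), clean(store_b))))
patterns = evaluate(mine(baskets), len(baskets))
for line in present(patterns):
    print(line)
```

Keeping each step as a separate function mirrors the list above and makes it easy to swap in a different mining method or interestingness measure without touching the other steps.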
The following diagram shows the knowledge discovery process:
Figure: Architecture of KDD