Ch-4 Data Mining Knowledge Representation Primitives

Data mining involves extracting knowledge from large amounts of data. A data mining query specifies the task, including the relevant data, type of knowledge to be mined, background knowledge, and interestingness measures. The relevant data can be selected from databases or data warehouses using conditions. Background knowledge includes concept hierarchies that allow discovering patterns at different levels of abstraction. Interestingness measures estimate the simplicity, certainty, utility, and novelty of patterns to filter uninteresting ones. Discovered patterns are presented using various visualizations like rules, tables, charts, and trees.

Uploaded by

Satyam Shaw

Ch-4: DATA MINING PRIMITIVES

• Data Mining:
Data mining refers to extracting, or mining, knowledge from large amounts of data.
• Data Mining Primitives:
A data mining task can be specified in the form of a data mining query, which is input to the data mining system.
• A data mining query is defined in terms of the following primitives:
  ◦ Task-relevant data
  ◦ The kind of knowledge to be mined
  ◦ Background knowledge: concept hierarchies
  ◦ Interestingness measures
  ◦ Presentation and visualization of discovered patterns
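
The five primitives listed above can be pictured as the fields of a single query object. The sketch below is only an illustration: the class name MiningQuery, its field names, and the sample values are assumptions made for this example, not part of any real data mining system or of DMQL.

```python
from dataclasses import dataclass, field


@dataclass
class MiningQuery:
    """Hypothetical container holding the five data mining primitives."""
    task_relevant_data: str                                   # e.g. a relational query
    knowledge_kind: str                                       # e.g. "association"
    concept_hierarchies: dict = field(default_factory=dict)   # background knowledge
    thresholds: dict = field(default_factory=dict)            # interestingness measures
    presentation: str = "rules"                               # output form

    def missing_primitives(self):
        """Return the names of primitives that were left empty."""
        return [name for name, value in vars(self).items() if not value]


# Illustrative values only (assumed for this sketch).
query = MiningQuery(
    task_relevant_data="SELECT age, income, item FROM sales",
    knowledge_kind="association",
    concept_hierarchies={"home entertainment": ["TV", "CD player", "VCR"]},
    thresholds={"support": 0.02, "confidence": 0.6},
    presentation="rules",
)
incomplete = MiningQuery(task_relevant_data="", knowledge_kind="association")
print(query.missing_primitives())
print(incomplete.missing_primitives())
```

A system could use a check like missing_primitives to ask the user for the parts of the query that were not supplied before mining starts.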

TASK-RELEVANT DATA

• The set of task-relevant data can be collected via a relational query (in SQL or DMQL) involving operations such as selection, projection, join, and aggregation.
• The data collection process results in a new data relation called the initial data relation.
• The initial relation may or may not correspond to a physical relation in the database.
• Since virtual relations are called views in the field of databases, the set of task-relevant data for data mining is called a minable view.
• The task-relevant data can be specified by providing the following information:
  ◦ The name of the database or data warehouse to be used
  ◦ The names of the tables or data cubes containing the relevant data
  ◦ Conditions for selecting the relevant data
  ◦ The relevant attributes or dimensions
  ◦ How the retrieved data should be grouped by certain attributes, such as "group by date"
• The set of task-relevant data can also be specified by condition-based data filtering, or by slicing or dicing of the data cube.
• For example, a concept hierarchy on item that specifies that "home entertainment" is at a higher concept level, composed of the lower-level concepts {"TV", "CD player", "VCR"}, can be used in the collection of the task-relevant data.
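
As a rough illustration of how an initial data relation might be collected, the sketch below uses Python's sqlite3 module to define a view that combines join, selection, projection, and aggregation. The tables customer and purchase, their columns, and the sample rows are all assumptions invented for this example. The "home entertainment" concept is expanded by hand into its lower-level items.

```python
import sqlite3


def build_minable_view(conn):
    """Create hypothetical tables and collect task-relevant data into a view."""
    cur = conn.cursor()
    cur.executescript("""
        CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, age INTEGER, country TEXT);
        CREATE TABLE purchase (cust_id INTEGER, item TEXT, price REAL);
        INSERT INTO customer VALUES (1, 34, 'Canada'), (2, 52, 'Canada'), (3, 29, 'USA');
        INSERT INTO purchase VALUES (1, 'TV', 400.0), (1, 'VCR', 150.0),
                                    (2, 'CD player', 90.0), (3, 'TV', 420.0);
        -- join + selection + projection + aggregation
        CREATE VIEW minable_view AS
            SELECT c.cust_id, c.age, SUM(p.price) AS total_spent
            FROM customer AS c JOIN purchase AS p ON c.cust_id = p.cust_id
            WHERE c.country = 'Canada'
              AND p.item IN ('TV', 'CD player', 'VCR')  -- "home entertainment"
            GROUP BY c.cust_id, c.age;
    """)
    return cur.execute("SELECT * FROM minable_view ORDER BY cust_id").fetchall()


conn = sqlite3.connect(":memory:")
rows = build_minable_view(conn)
print(rows)
```

Because minable_view is a view rather than a stored table, it matches the point above that the initial relation need not correspond to a physical relation in the database.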

THE KIND OF KNOWLEDGE TO BE MINED

• The kinds of knowledge include concept description (characterization and discrimination), association, classification, prediction, clustering, and evolution analysis.
• The user can also specify pattern templates that all discovered patterns must match. These templates, or metapatterns, can be used to guide the discovery process.
• For example, a discovered rule with its support and confidence:
  age(X, "30…39") ^ income(X, "40K…49K") => buys(X, "VCR") [2.2%, 60%]
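
One way to picture how a metapattern guides discovery is as a template that candidate rules are checked against. The sketch below is a simplified assumption, not a method from the slides: the types Atom and Rule, the use of None as a placeholder, and the position-by-position matching are all invented for illustration. The template stands for a pattern such as P(X, W) ^ Q(X, Y) => buys(X, Z).

```python
from typing import NamedTuple, Optional, Tuple


class Atom(NamedTuple):
    predicate: Optional[str]  # None acts as a predicate variable in a template
    value: Optional[str]      # None acts as a wildcard value in a template


class Rule(NamedTuple):
    antecedent: Tuple[Atom, ...]
    consequent: Atom


def atom_matches(atom, pattern):
    """An atom fits a pattern atom if every non-None part of the pattern agrees."""
    return ((pattern.predicate is None or pattern.predicate == atom.predicate)
            and (pattern.value is None or pattern.value == atom.value))


def rule_matches(rule, template):
    """True if the rule has the template's shape and each atom fits, position by position."""
    if len(rule.antecedent) != len(template.antecedent):
        return False
    if not atom_matches(rule.consequent, template.consequent):
        return False
    return all(atom_matches(a, p) for a, p in zip(rule.antecedent, template.antecedent))


# Template: two arbitrary antecedent predicates implying buys(X, anything).
template = Rule((Atom(None, None), Atom(None, None)), Atom("buys", None))
rule = Rule((Atom("age", "30…39"), Atom("income", "40K…49K")), Atom("buys", "VCR"))
too_short = Rule((Atom("age", "30…39"),), Atom("buys", "VCR"))

print(rule_matches(rule, template))
print(rule_matches(too_short, template))
```
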
BACKGROUND KNOWLEDGE: CONCEPT HIERARCHIES

• Background knowledge is information about the domain to be mined that can be useful in the discovery process.
• A common form of background knowledge is the concept hierarchy. Concept hierarchies allow the discovery of knowledge at multiple levels of abstraction.
• A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts.
Concept hierarchy
• A concept hierarchy is represented as a set of nodes organized in a tree, where each node, in itself, represents a concept.
• There are four types of concept hierarchies:
  ◦ Schema hierarchies
  ◦ Set-grouping hierarchies
  ◦ Operation-derived hierarchies
  ◦ Rule-based hierarchies
• Schema hierarchy: a total or partial order among attributes in the database schema.
  street < city < state < country
• Set-grouping hierarchy: organizes values for a given attribute or dimension into groups of constants or range values.
  {young, middle_aged} ⊂ all(age)
  {20…39} ⊂ young
  {40…59} ⊂ middle_aged
• Operation-derived hierarchy: based on operations such as the decoding of information-encoded strings and information extraction from complex data objects. For example, an e-mail address can encode the hierarchy:
  login-name < department < university < country
• Rule-based hierarchy: defined by a set of rules and evaluated dynamically based on the current database data and the rule definition.
  low_profit_margin(X) <= price(X, P1) ^ cost(X, P2) ^ ((P1 - P2) < $50)
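
Three of the hierarchy types above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names, the dictionary layout, and the sample address are hypothetical, while the age ranges, the attribute order, and the $50 margin come from the examples above.

```python
# Set-grouping hierarchy on age: ranges mapped to higher-level concepts.
AGE_GROUPS = {"young": range(20, 40), "middle_aged": range(40, 60)}


def generalize_age(age):
    """Map a raw age to its higher-level concept, or 'all(age)' if no group applies."""
    for concept, ages in AGE_GROUPS.items():
        if age in ages:
            return concept
    return "all(age)"


# Schema hierarchy: street < city < state < country.
SCHEMA_ORDER = ["street", "city", "state", "country"]


def roll_up(record, level):
    """Keep only the attributes at or above the given level of the schema hierarchy."""
    keep = SCHEMA_ORDER[SCHEMA_ORDER.index(level):]
    return {attr: record[attr] for attr in keep}


# Rule-based hierarchy:
# low_profit_margin(X) <= price(X, P1) ^ cost(X, P2) ^ ((P1 - P2) < $50)
def low_profit_margin(price, cost):
    """Evaluate the rule dynamically from the current price and cost."""
    return (price - cost) < 50


# Hypothetical sample record.
address = {"street": "1 Main St", "city": "Vancouver", "state": "BC", "country": "Canada"}
print(generalize_age(25), generalize_age(45), generalize_age(70))
print(roll_up(address, "state"))
print(low_profit_margin(120, 90), low_profit_margin(200, 100))
```
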
INTERESTINGNESS MEASURES
• Interestingness measures are used to reduce the number of uninteresting patterns returned by the process. This can be achieved by specifying interestingness measures that estimate the following properties of a pattern:
  ◦ simplicity
  ◦ certainty
  ◦ utility
  ◦ novelty
• Each measure is associated with a threshold that can be controlled by the user.
• SIMPLICITY:
Simplicity can be viewed as a function of the pattern structure, defined in terms of the pattern size in bits or the number of attributes or operators appearing in the pattern. An example is rule length.
• CERTAINTY:
Each discovered pattern should have a measure of certainty associated with it that assesses the validity or trustworthiness of the pattern. A certainty measure for association rules of the form "A => B", where A and B are sets of items, is confidence:
$$\mathrm{confidence}(A \Rightarrow B) = \frac{\#\ \text{tuples containing both } A \text{ and } B}{\#\ \text{tuples containing } A}$$
• UTILITY:
The potential usefulness of a pattern can be estimated by a utility function such as support. The support of an association pattern refers to the percentage of task-relevant data tuples for which the pattern is true. For association rules of the form "A => B", where A and B are sets of items:
$$\mathrm{support}(A \Rightarrow B) = \frac{\#\ \text{tuples containing both } A \text{ and } B}{\text{total}\ \#\ \text{of tuples}}$$
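
The confidence and support definitions above translate directly into code. The sketch below assumes each tuple is a Python set of items. The transactions and the threshold values are made up for illustration only.

```python
def support(transactions, a, b):
    """Fraction of all tuples that contain every item of both A and B."""
    both = a | b
    return sum(1 for t in transactions if both <= t) / len(transactions)


def confidence(transactions, a, b):
    """Among tuples containing A, the fraction that also contain B."""
    both = a | b
    n_a = sum(1 for t in transactions if a <= t)
    n_both = sum(1 for t in transactions if both <= t)
    return n_both / n_a if n_a else 0.0


# Hypothetical task-relevant tuples.
transactions = [
    {"TV", "VCR"},
    {"TV", "CD player"},
    {"TV", "VCR", "CD player"},
    {"CD player"},
]

s = support(transactions, {"TV"}, {"VCR"})     # 2 of 4 tuples
c = confidence(transactions, {"TV"}, {"VCR"})  # 2 of the 3 tuples containing TV

# User-controlled thresholds (assumed values).
MIN_SUPPORT, MIN_CONFIDENCE = 0.4, 0.6
is_interesting = s >= MIN_SUPPORT and c >= MIN_CONFIDENCE
print(s, c, is_interesting)
```

Rules that fall below either threshold would be filtered out as uninteresting.
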
• NOVELTY:
Novel patterns are those that contribute new information or increased performance to the given pattern set. One strategy for detecting novelty is to remove redundant patterns. For example, a data exception may be considered novel in that it differs from what is expected based on a statistical model or user beliefs.
  location(X, "CANADA") => buys(X, "SONY_TV") [8%, 70%]
PRESENTATION AND VISUALIZATION OF DISCOVERED PATTERNS

• A data mining system should be able to display the discovered patterns in multiple forms, such as rules, tables, crosstabs, pie charts, decision trees, cubes, or other visual representations.
• A data mining system should employ concept hierarchies to implement drill-down and roll-up operations, so that users may inspect discovered patterns at multiple levels of abstraction.
• In addition, pivoting, slicing, and dicing operations aid the user in viewing generalized data and knowledge from different perspectives.
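
Roll-up, drill-down, and slicing can be sketched on a small crosstab of counts. Everything in the example below is assumed for illustration: the city-to-country hierarchy, the sales facts, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical concept hierarchy: city -> country.
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}

# Hypothetical (city, item) sales facts.
SALES = [("Vancouver", "TV"), ("Toronto", "TV"), ("Toronto", "VCR"), ("Chicago", "TV")]


def crosstab(facts, roll_up=False):
    """Count sales per (location, item); optionally roll up city to country."""
    counts = Counter()
    for city, item in facts:
        location = CITY_TO_COUNTRY[city] if roll_up else city
        counts[(location, item)] += 1
    return counts


def slice_by_item(counts, item):
    """Slice: fix the item dimension and keep only the location dimension."""
    return {loc: n for (loc, it), n in counts.items() if it == item}


city_level = crosstab(SALES)                   # drill-down view (city level)
country_level = crosstab(SALES, roll_up=True)  # roll-up view (country level)
tv_by_country = slice_by_item(country_level, "TV")
print(dict(city_level))
print(dict(country_level))
print(tv_by_country)
```
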
[Figure: Various forms of presenting and visualizing the discovered patterns]

