ASSOCIATION RULE MINING
Frequent patterns are patterns (such as itemsets, subsequences, or substructures) that
appear in a data set frequently.
For example, a set of items, such as milk and bread, that appear frequently together in a
transaction data set is a frequent itemset.
A subsequence, such as buying first a PC, then a digital camera, and then a memory card,
if it occurs frequently in a shopping history database, is a (frequent) sequential pattern.
A substructure can refer to different structural forms, such as subgraphs, subtrees, or
sublattices, which may be combined with itemsets or subsequences. If a substructure
occurs frequently, it is called a (frequent) structured pattern.
Finding such frequent patterns plays an essential role in mining associations, correlations,
and many other interesting relationships among data.
ASSOCIATION RULE MINING
How can we find frequent itemsets from large amounts of data, where the data are either
transactional or relational?
How can we mine association rules in multilevel and multidimensional space?
Which association rules are the most interesting?
How can we help or guide the mining procedure to discover interesting associations or
correlations?
How can we take advantage of user preferences or constraints to speed up the mining
process?
Market Basket Analysis: A Motivating Example
A typical example of frequent itemset mining is market basket analysis. This process
analyzes customer buying habits by finding associations between the different items that
customers place in their “shopping baskets”.
The discovery of such associations can help retailers develop marketing strategies by
gaining insight into which items are frequently purchased together by customers.
For instance, if customers are buying milk, how likely are they to also buy bread (and
what kind of bread) on the same trip to the supermarket?
Such information can lead to increased sales by helping retailers do selective marketing
and plan their shelf space.
Market Basket Analysis:
If we think of the universe as the set of items available at the store, then each item has a
Boolean variable representing the presence or absence of that item.
Each basket can then be represented by a Boolean vector of values assigned to these
variables.
The Boolean vectors can be analyzed for buying patterns that reflect items that are
frequently associated or purchased together.
These patterns can be represented in the form of association rules. For example, the
information that customers who purchase computers also tend to buy antivirus software at
the same time is represented by the following association rule:
computer ⇒ antivirus software [support = 2%; confidence = 60%]
Support and Confidence measures
Rule support and confidence are two measures of rule interestingness. They respectively
reflect the usefulness and certainty of discovered rules.
A support of 2% for this rule means that 2% of all the transactions under analysis
show that computer and antivirus software are purchased together.
A confidence of 60% means that 60% of the customers who purchased a computer also
bought the software.
Typically, association rules are considered interesting if they satisfy both a minimum
support threshold and a minimum confidence threshold. Such thresholds can be set by
users or domain experts.
Frequent Itemsets, Closed Itemsets, and Association Rules
Frequent Itemsets:
Let I = {I1, I2, ..., Im} be a set of items. Let D, the task-relevant data, be a set of database
transactions where each transaction T is a set of items such that T ⊆ I . Each transaction is
associated with an identifier, called TID. Let A be a set of items. A transaction T is said to
contain A if and only if A ⊆ T. An association rule is an implication of the form A ⇒B, where
A⊆I , B⊆I , and A∩B=∅.
The rule A⇒B holds in the transaction set D with support s, where s is the percentage of
transactions in D that contain A ∪ B (i.e., the union of sets A and B, or, equivalently, both
A and B). This is taken to be the probability P(A ∪ B).
support(A⇒B) = P(A ∪ B)
The rule A⇒B has confidence c in the transaction set D, where c is the percentage of
transactions in D containing A that also contain B. This is taken to be the conditional
probability, P(B|A). That is,
confidence(A⇒B) =P(B|A)
Frequent Itemsets, Closed Itemsets, and Association Rules
A set of items is referred to as an itemset.
An itemset that contains k items is a k-itemset. The set {computer, antivirus software} is a
2-itemset. The occurrence frequency of an itemset is the number of transactions that
contain the itemset. This is also known, simply, as the frequency, support count, or count
of the itemset.
confidence(A⇒B) = P(B|A) = support(A ∪ B) / support(A) = support count(A ∪ B) / support count(A)
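To make these measures concrete, the following minimal Python sketch computes support and confidence directly from the definitions above. The toy transaction set and item names are illustrative only, not taken from any table in this section.

transactions = [
    {"computer", "antivirus software"},
    {"computer", "printer"},
    {"computer", "antivirus software", "printer"},
    {"milk", "bread"},
    {"computer"},
]

def support_count(itemset, transactions):
    # Number of transactions that contain every item of the itemset.
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    # Fraction of transactions containing the itemset, i.e., P(itemset).
    return support_count(itemset, transactions) / len(transactions)

def confidence(A, B, transactions):
    # confidence(A => B) = support count(A u B) / support count(A).
    return support_count(A | B, transactions) / support_count(A, transactions)

A, B = {"computer"}, {"antivirus software"}
print(support(A | B, transactions))    # 0.4: support of the rule A => B
print(confidence(A, B, transactions))  # 0.5: confidence of the rule A => B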
Frequent Itemsets, Closed Itemsets, and Association Rules
Closed Itemset:
An itemset X is closed in a data set S if there exists no proper super-itemset Y such that Y
has the same support count as X in S. An itemset X is a closed frequent itemset in set S if
X is both closed and frequent in S. An itemset X is a maximal frequent itemset (or max-
itemset) in set S if X is frequent, and there exists no super-itemset Y such that X ⊂ Y and
Y is frequent in S.
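The following sketch shows how the two definitions differ, assuming a precomputed table mapping each itemset to its support count; the itemsets and counts below are made up for illustration, and the tests only examine supersets that appear in the table.

# frozenset -> support count (illustrative values only)
supports = {
    frozenset("A"): 4, frozenset("B"): 3,
    frozenset("AB"): 3, frozenset("ABC"): 2,
}
min_sup = 2

def is_closed(X, supports):
    # X is closed if no proper super-itemset has the same support count as X.
    return all(not (X < Y and supports[Y] == supports[X]) for Y in supports)

def is_maximal(X, supports, min_sup):
    # X is maximal frequent if X is frequent and no proper super-itemset is frequent.
    return (supports[X] >= min_sup and
            all(not (X < Y and supports[Y] >= min_sup) for Y in supports))

print(is_closed(frozenset("B"), supports))              # False: {A, B} has the same count, 3
print(is_maximal(frozenset("ABC"), supports, min_sup))  # True: no frequent superset in the table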
Frequent Itemsets, Closed Itemsets, and Association Rules
Association Rules
In general, association rule mining can be viewed as a two-step process:
1. Find all frequent itemsets:
By definition, each of these itemsets will occur at least as frequently as a predetermined
minimum support count, min_sup.
2. Generate strong association rules from the frequent itemsets:
By definition, these rules must satisfy minimum support and minimum confidence.
FREQUENT ITEM SET GENERATION - THE APRIORI PRINCIPLE
Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining
frequent itemsets for Boolean association rules. The name of the algorithm is based on the
fact that the algorithm uses prior knowledge of frequent itemset properties.
Apriori employs an iterative approach known as a level-wise search, where k-itemsets are
used to explore (k+1)-itemsets. First, the set of frequent 1-itemsets is found by scanning
the database to accumulate the count for each item, and collecting those items that satisfy
minimum support. The resulting set is denoted L1.
Next, L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3, and
so on, until no more frequent k-itemsets can be found. The finding of each Lk requires
one full scan of the database.
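The level-wise search just described can be sketched in a few lines of Python. This is a simplified illustration rather than the published algorithm: candidate generation is a naive self-join of the previous level with itself, and every candidate is counted with one full scan per level; the subset-based pruning that the Apriori property allows is sketched after the next passage.

def apriori(transactions, min_sup_count):
    # First scan: count each item and collect the frequent 1-itemsets, L1.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    Lk = {s for s, c in counts.items() if c >= min_sup_count}
    frequent = {s: counts[s] for s in Lk}

    k = 2
    while Lk:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-candidates.
        candidates = {a | b for a in Lk for b in Lk if len(a | b) == k}
        # One full scan of the database per level counts the candidates.
        level_counts = {c: sum(1 for t in transactions if c <= t)
                        for c in candidates}
        Lk = {c for c, n in level_counts.items() if n >= min_sup_count}
        frequent.update({c: level_counts[c] for c in Lk})
        k += 1
    return frequent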
FREQUENT ITEM SET GENERATION - THE APRIORI PRINCIPLE
To improve the efficiency of the level-wise generation of frequent itemsets, an important
property called the Apriori property is used to reduce the search space.
Apriori property: All nonempty subsets of a frequent itemset must also be frequent.
The Apriori property is based on the following observation. By definition, if an itemset I
does not satisfy the minimum support threshold, min_sup, then I is not frequent; that is,
P(I) < min_sup. If an item A is added to the itemset I, then the resulting itemset (i.e.,
I ∪ A) cannot occur more frequently than I. Therefore, I ∪ A is not frequent either; that
is, P(I ∪ A) < min_sup.
This property belongs to a special category of properties called antimonotone in the sense
that if a set cannot pass a test, all of its supersets will fail the same test as well. It is called
antimonotone because the property is monotonic in the context of failing a test.
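In code, the pruning step this property licenses looks like the sketch below: before counting a k-candidate, test its (k-1)-subsets against the previous frequent set and discard the candidate as soon as one is missing. Filtering the candidate set of the earlier apriori() sketch with this test before the counting scan is the intended use.

from itertools import combinations

def has_infrequent_subset(candidate, L_prev):
    # By the Apriori property, if any (k-1)-subset of the candidate is absent
    # from L(k-1), then the candidate itself cannot be frequent.
    k = len(candidate)
    return any(frozenset(s) not in L_prev
               for s in combinations(candidate, k - 1))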
FREQUENT ITEM SET GENERATION - THE APRIORI PRINCIPLE
Let's look at a concrete example of Apriori, based on the AllElectronics transaction
database, D, shown in the accompanying table. There are nine transactions in this database; that is, |D| = 9.
Generating Association Rules from Frequent Itemsets
Once the frequent itemsets from transactions in a database D have been found, it is
straightforward to generate strong association rules from them (where strong association
rules satisfy both minimum support and minimum confidence).
confidence(A⇒B) = P(B|A) = support count(A ∪ B) / support count(A)
The conditional probability is expressed in terms of itemset support count, where support
count(A ∪ B) is the number of transactions containing the itemset A ∪ B, and support
count(A) is the number of transactions containing the itemset A. Based on this equation,
association rules can be generated as follows:
1. For each frequent itemset l, generate all nonempty proper subsets of l.
2. For every nonempty proper subset s of l, output the rule s ⇒ (l − s) if
support count(l) / support count(s) ≥ min_conf, where min_conf is the minimum confidence threshold.
Because the rules are generated from frequent itemsets, each one automatically satisfies minimum support.
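A sketch of this procedure in Python, reusing the frequent map (frozenset to support count) produced by the earlier apriori() sketch; by the Apriori property every subset of a frequent itemset is itself in the map, so the lookup below cannot fail.

from itertools import combinations

def generate_rules(frequent, min_conf):
    # Collect strong rules (antecedent, consequent, confidence) from a
    # frozenset -> support count map of frequent itemsets.
    rules = []
    for l, l_count in frequent.items():
        if len(l) < 2:
            continue  # a rule needs a nonempty antecedent and a nonempty consequent
        for r in range(1, len(l)):
            for s in map(frozenset, combinations(l, r)):
                conf = l_count / frequent[s]  # support count(l) / support count(s)
                if conf >= min_conf:
                    rules.append((s, l - s, conf))
    return rules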
FREQUENT ITEM SET GENERATION – FP Growth Algorithm
Mining Frequent Itemsets without Candidate Generation
The first scan of the database is the same as in Apriori: it derives the set of frequent
items (1-itemsets) and their support counts (frequencies). Let the minimum support count
be 2. The set of frequent items is sorted in descending order of support count. The
resulting set or list is denoted L. Thus, we have L = {{I2: 7}, {I1: 6}, {I3: 6},
{I4: 2}, {I5: 2}}.
An FP-tree is then constructed as follows. First, create the root of the tree, labeled with
“null.” Scan database D a second time. The items in each transaction are processed in L
order (i.e., sorted according to descending support count), and a branch is created for each
transaction. For example, the scan of the first transaction, “T100: I1, I2, I5,” which
contains three items (I2, I1, I5 in L order), leads to the construction of the first branch of
the tree with three nodes, <I2: 1>, <I1: 1>, and <I5: 1>, where I2 is linked as a child of the
root, I1 is linked to I2, and I5 is linked to I1.
FREQUENT ITEM SET GENERATION – FP Growth Algorithm
The second transaction, T200, contains the items I2 and I4 in L order, which would result
in a branch where I2 is linked to the root and I4 is linked to I2. However, this branch
would share a common prefix, I2, with the existing path for T100.
Therefore, we instead increment the count of the I2 node by 1, and create a new node,
<I4: 1>, which is linked as a child of <I2: 2>. In general, when considering the branch to
be added for a transaction, the count of each node along a common prefix is incremented
by 1, and nodes for the items following the prefix are created and linked accordingly. To
facilitate tree traversal, an item header table is built so that each item points to its
occurrences in the tree via a chain of node-links. The tree obtained after scanning all of
the transactions is shown in the accompanying figure with the associated node-links. In
this way, the problem of mining frequent patterns in databases is transformed to that of
mining the FP-tree.
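The two-scan construction just described can be sketched as follows. The node layout (item, count, parent and child pointers) and the list-based header table are assumptions of this sketch, not a prescribed data structure; ties in the support-count ordering are broken arbitrarily here.

class FPNode:
    def __init__(self, item, parent):
        self.item, self.count, self.parent = item, 1, parent
        self.children = {}  # item -> FPNode

def build_fp_tree(transactions, min_sup_count):
    # Scan 1: find the frequent items and their support counts.
    counts = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    L = [i for i in sorted(counts, key=counts.get, reverse=True)
         if counts[i] >= min_sup_count]

    root = FPNode(None, None)    # the root, labeled "null"
    header = {i: [] for i in L}  # item -> chain of node-links
    # Scan 2: insert each transaction in L order, sharing common prefixes.
    for t in transactions:
        node = root
        for item in [i for i in L if i in t]:
            if item in node.children:
                node.children[item].count += 1  # shared prefix: bump the count
            else:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
    return root, header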
FREQUENT ITEM SET GENERATION – FP Growth Algorithm
The FP-tree is mined as follows. Start from each frequent length-1 pattern (as an
initial suffix pattern), construct its conditional pattern base (a "subdatabase,"
which consists of the set of prefix paths in the FP-tree co-occurring with
the suffix pattern), then construct its (conditional) FP-tree, and perform mining
recursively on such a tree. The pattern growth is achieved by the concatenation of
the suffix pattern with the frequent patterns generated from a conditional FP-tree.
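Given the tree and header table from the construction sketch above, collecting a conditional pattern base is just a walk from each node-link up toward the root. In the running example, calling this with item I5 yields the two prefix paths <I2, I1: 1> and <I2, I1, I3: 1> that the text derives next.

def conditional_pattern_base(item, header):
    # Prefix paths (with counts) co-occurring with the item, found by following
    # its chain of node-links and climbing from each occurrence to the root.
    base = []  # list of (prefix_path, count) pairs
    for node in header[item]:
        path, parent = [], node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((list(reversed(path)), node.count))
    return base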
FREQUENT ITEM SET GENERATION – FP Growth Algorithm
Mining of the FP-tree is summarized in the accompanying table and detailed as follows. We first consider
I5, which is the last item in L, rather than the first. The reason for starting at the end of the
list will become apparent as we explain the FP-tree mining process. I5 occurs in two
branches of the FP-tree. (The occurrences of I5 can easily be found by
following its chain of node-links.) The paths formed by these branches are <I2, I1, I5: 1>
and <I2, I1, I3, I5: 1>. Therefore, considering I5 as a suffix, its corresponding two prefix
paths are <I2, I1: 1> and <I2, I1, I3: 1>, which form its conditional pattern base. Its
conditional FP-tree contains only a single path, <I2: 2, I1: 2>; I3 is not included because
its support count of 1 is less than the minimum support count. The single path generates
all the combinations of frequent patterns: {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}. For I4, its
two prefix paths form the conditional pattern base, {{I2, I1: 1}, {I2: 1}}, which generates a
single-node conditional FP-tree, <I2: 2>, and derives one frequent
pattern, {I2, I4: 2}. Notice that although I5 follows I4 in the first branch, there is no need
to include I5 in the analysis here because any frequent pattern involving I5 is analyzed in
the examination of I5.
Similar to the above analysis, I3's conditional pattern base is {{I2, I1: 2}, {I2: 2}, {I1:
2}}. Its conditional FP-tree has two branches, <I2: 4, I1: 2> and <I1: 2>, as shown in the
accompanying figure, which generates the set of patterns {{I2, I3: 4}, {I1, I3: 4}, {I2, I1, I3: 2}}.
Finally, I1's conditional pattern base is {{I2: 4}}, whose FP-tree contains only one node,
<I2: 4>, which generates one frequent pattern, {I2, I1: 4}. This mining process is
summarized in the accompanying figure.
Compact frequent itemsets - Maximal frequent itemset:
The number of frequent itemsets generated by the Apriori algorithm can often be very
large, so it is beneficial to identify a small representative set from which every frequent
itemset can be derived. One such approach is using maximal frequent itemsets.
A maximal frequent itemset is a frequent itemset for which none of its immediate
supersets are frequent. To illustrate this concept, consider the example given below:
Compact frequent itemsets - Maximal frequent itemset:
The support counts are shown on the top left of each node in the figure. Assume a support
threshold of 50%, that is, each itemset must occur in 2 or more transactions. Based on that
threshold, the frequent itemsets are a, b, c, d, ab, ac, and ad (the shaded nodes).
Out of these 7 frequent itemsets, 3 are identified as maximal frequent (outlined in red in
the figure):
ab: Immediate supersets abc and abd are infrequent.
ac: Immediate supersets abc and acd are infrequent.
ad: Immediate supersets abd and acd are infrequent.
The remaining 4 frequent nodes (a, b, c and d) cannot be maximal frequent because they
all have at least 1 immediate superset that is frequent.
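The example can be reproduced with a short sketch: starting from the seven frequent itemsets, keep exactly those with no frequent proper superset.

# The frequent itemsets from the example above, as sets of single-letter items.
frequent = [frozenset(s) for s in ["a", "b", "c", "d", "ab", "ac", "ad"]]

# Maximal frequent itemsets: those with no frequent proper superset.
maximal = [X for X in frequent if not any(X < Y for Y in frequent)]
print(sorted("".join(sorted(m)) for m in maximal))  # ['ab', 'ac', 'ad']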
Compact frequent itemsets - Closed frequent itemset:
An itemset is closed if none of its immediate supersets has the same support as the
itemset itself. A closed frequent itemset is an itemset that is both closed and frequent.