Association Rule Mining
Algorithms for frequent itemset mining
Apriori
ECLAT
FP-Growth
Acknowledgement
Lecture slides taken/modified from:
Course Material of CIS527, 2004, Temple University
Jiawei Han (http://www-sal.cs.uiuc.edu/~hanj/DM_Book.html)
Vipin Kumar (http://www-users.cs.umn.edu/~kumar/csci5980/index.html)
Motivation: Association Rule Mining
Given a set of transactions, find rules that will predict the
occurrence of an item based on the occurrences of other
items in the transaction
Market-Basket transactions:

TID | Items
  1 | Bread, Milk
  2 | Bread, Diaper, Beer, Eggs
  3 | Milk, Diaper, Beer, Coke
  4 | Bread, Milk, Diaper, Beer
  5 | Bread, Milk, Diaper, Coke
Example of Association Rules:
  {Diaper} → {Beer}
  {Milk, Bread} → {Eggs, Coke}
  {Beer, Bread} → {Milk}

Implication means co-occurrence, not causality!
Applications: Association Rule Mining
  * → Annual Maintenance Contract (AMC):
    what should the store do to boost Maintenance Agreement sales?
  Home Electronics → *:
    what other products should the store stock up on?
Some options:
  Attached mailing in direct marketing
  Marketing and Sales Promotion
  Supermarket shelf management
Definition: Frequent Itemset
Itemset
A collection of one or more items
Example: {Milk, Bread, Diaper}
k-itemset
An itemset that contains k items
Support count (σ)
  Frequency of occurrence of an itemset
  E.g. σ({Milk, Bread, Diaper}) = 2
Support (s)
  Fraction of transactions that contain an itemset
  E.g. s({Milk, Bread, Diaper}) = 2/5
Frequent Itemset
An itemset whose support is greater
than or equal to a minsup threshold
TID | Items
  1 | Bread, Milk
  2 | Bread, Diaper, Beer, Eggs
  3 | Milk, Diaper, Beer, Coke
  4 | Bread, Milk, Diaper, Beer
  5 | Bread, Milk, Diaper, Coke
Definition: Association Rule
Association Rule
An implication expression of the form X → Y, where X and Y are itemsets
Example: {Milk, Diaper} → {Beer}
TID | Items
  1 | Bread, Milk
  2 | Bread, Diaper, Beer, Eggs
  3 | Milk, Diaper, Beer, Coke
  4 | Bread, Milk, Diaper, Beer
  5 | Bread, Milk, Diaper, Coke
Rule Evaluation Metrics
  Support (s)
    Fraction of transactions that contain both X and Y
  Confidence (c)
    Measures how often items in Y appear in transactions that contain X

Example: {Milk, Diaper} → {Beer}

  s = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4
  c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 ≈ 0.67
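As a sanity check, both metrics can be computed directly; the sketch below (plain Python, function names are illustrative, not from the lecture) reproduces the numbers above on the market-basket data.

    # Minimal sketch: support and confidence of a rule X -> Y over a
    # list of transactions, where each transaction is a set of items.
    def support(itemset, transactions):
        # fraction of transactions that contain every item in `itemset`
        return sum(1 for t in transactions if itemset <= t) / len(transactions)

    def confidence(X, Y, transactions):
        # how often items in Y appear in transactions that contain X
        return support(X | Y, transactions) / support(X, transactions)

    transactions = [
        {"Bread", "Milk"},
        {"Bread", "Diaper", "Beer", "Eggs"},
        {"Milk", "Diaper", "Beer", "Coke"},
        {"Bread", "Milk", "Diaper", "Beer"},
        {"Bread", "Milk", "Diaper", "Coke"},
    ]
    X, Y = {"Milk", "Diaper"}, {"Beer"}
    print(support(X | Y, transactions))    # 0.4
    print(confidence(X, Y, transactions))  # 0.666...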
Association Rule Mining Task
Given a set of transactions T, the goal of
association rule mining is to find all rules having
support ≥ minsup threshold
confidence ≥ minconf threshold
Brute-force approach:
List all possible association rules
Compute the support and confidence for each rule
Prune rules that fail the minsup and minconf
thresholds
Computationally prohibitive!
Computational Complexity
Given d unique items:
  Total number of itemsets = 2^d
  Total number of possible association rules:

    R = Σ_{k=1}^{d-1} [ C(d,k) × Σ_{j=1}^{d-k} C(d-k,j) ] = 3^d - 2^(d+1) + 1

  If d = 6, R = 602 rules
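The closed form can be checked by brute force; a small Python check (using math.comb) for d = 6:

    from math import comb

    d = 6
    R = sum(comb(d, k) * sum(comb(d - k, j) for j in range(1, d - k + 1))
            for k in range(1, d))
    print(R, 3**d - 2**(d + 1) + 1)  # both print 602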
Mining Association Rules: Decoupling
TID | Items
  1 | Bread, Milk
  2 | Bread, Diaper, Beer, Eggs
  3 | Milk, Diaper, Beer, Coke
  4 | Bread, Milk, Diaper, Beer
  5 | Bread, Milk, Diaper, Coke

Example of Rules:
  {Milk, Diaper} → {Beer} (s=0.4, c=0.67)
  {Milk, Beer} → {Diaper} (s=0.4, c=1.0)
  {Diaper, Beer} → {Milk} (s=0.4, c=0.67)
  {Beer} → {Milk, Diaper} (s=0.4, c=0.67)
  {Diaper} → {Milk, Beer} (s=0.4, c=0.5)
  {Milk} → {Diaper, Beer} (s=0.4, c=0.5)
Observations:
All the above rules are binary partitions of the same itemset:
{Milk, Diaper, Beer}
Rules originating from the same itemset have identical support but
can have different confidence
Thus, we may decouple the support and confidence requirements
Rule Generation
Given a frequent itemset L, find all non-empty subsets f ⊂ L such that
f → L - f satisfies the minimum confidence requirement.

If {A,B,C,D} is a frequent itemset, candidate rules:
  ABC → D,   ABD → C,   ACD → B,   BCD → A,
  A → BCD,   B → ACD,   C → ABD,   D → ABC,
  AB → CD,   AC → BD,   AD → BC,   BC → AD,   BD → AC,   CD → AB

If |L| = k, then there are 2^k - 2 candidate association rules
(ignoring L → ∅ and ∅ → L)
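A short sketch (plain Python, names illustrative) that enumerates these binary partitions of a frequent itemset into candidate rules:

    # Enumerate all candidate rules f -> (L - f) from a frequent itemset L,
    # skipping the empty antecedent and the empty consequent.
    from itertools import combinations

    def candidate_rules(L):
        L = frozenset(L)
        for r in range(1, len(L)):            # proper, non-empty subsets
            for f in combinations(L, r):
                f = frozenset(f)
                yield f, L - f

    rules = list(candidate_rules({"A", "B", "C", "D"}))
    print(len(rules))  # 2**4 - 2 = 14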
Mining Association Rules
Two-step approach:
1. Frequent Itemset Generation
Generate all itemsets whose support ≥ minsup
2. Rule Generation
Generate high confidence rules from each frequent itemset,
where each rule is a binary partitioning of a frequent itemset
Frequent itemset generation is still
computationally expensive
Frequent Itemset Generation
Brute-force approach:
Each itemset is a candidate frequent itemset
Count the support of each candidate by scanning the
database
Transactions:

TID | Items
  1 | Bread, Milk
  2 | Bread, Diaper, Beer, Eggs
  3 | Milk, Diaper, Beer, Coke
  4 | Bread, Milk, Diaper, Beer
  5 | Bread, Milk, Diaper, Coke

Match each transaction against every candidate in the list of candidates.
Complexity ~ O(NMw), where N is the number of transactions, M the number
of candidates, and w the maximum transaction width
⇒ expensive since M = 2^d !!!
Frequent Itemset Generation Strategies
Reduce the number of candidates (M)
Complete search: M = 2^d
Use pruning techniques to reduce M
Reduce the number of transactions (N)
Reduce size of N as the size of itemset increases
Use a subsample of N transactions
Reduce the number of comparisons (NM)
Use efficient data structures to store the candidates or
transactions
No need to match every candidate against every
transaction
Reducing Number of Candidates: Apriori
Apriori principle:
If an itemset is frequent, then all of its subsets must also
be frequent
If an itemset is infrequent, then all its supersets must
also be infrequent.
Apriori principle holds due to the following property
of the support measure:
∀ X, Y : (X ⊆ Y) ⇒ s(X) ≥ s(Y)
Support of an itemset never exceeds the support of its
subsets
This is known as the anti-monotone property of support
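The anti-monotone property is what makes candidate pruning sound; a minimal Python sketch of the resulting subset test (names illustrative):

    # A (k+1)-candidate can be pruned as soon as any of its k-subsets
    # is missing from the set of frequent k-itemsets.
    from itertools import combinations

    def has_infrequent_subset(candidate, frequent_k):
        k = len(candidate) - 1
        return any(frozenset(s) not in frequent_k
                   for s in combinations(candidate, k))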
Illustrating Apriori Principle
[Itemset lattice figure: the full lattice over items A..E, from the null
set down to ABCDE. Once AB is found to be infrequent, all of its supersets
(ABC, ABD, ABE, ABCD, ABCE, ABDE, ABCDE) are pruned from the search space.]
Illustrating Apriori Principle
Minimum Support = 3

Items (1-itemsets):

Item   | Count
Bread  | 4
Coke   | 2
Milk   | 4
Beer   | 3
Diaper | 4
Eggs   | 1

Pairs (2-itemsets) (no need to generate candidates involving Coke or Eggs):

Itemset         | Count
{Bread, Milk}   | 3
{Bread, Beer}   | 2
{Bread, Diaper} | 3
{Milk, Beer}    | 2
{Milk, Diaper}  | 3
{Beer, Diaper}  | 3

Triplets (3-itemsets):

Itemset               | Count
{Bread, Milk, Diaper} | 3

If every subset is considered: C(6,1) + C(6,2) + C(6,3) = 6 + 15 + 20 = 41 candidates.
With support-based pruning: 6 + 6 + 1 = 13.
Apriori Algorithm
Method:
Let k=1
Generate frequent itemsets of length 1
Repeat until no new frequent itemsets are identified
Generate length (k+1) candidate itemsets from length k
frequent itemsets
Prune candidate itemsets containing subsets of length k that
are infrequent
Count the support of each candidate by scanning the DB
Eliminate candidates that are infrequent, leaving only those
that are frequent
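Putting the steps together, here is a compact, runnable Apriori sketch (plain Python; a didactic sketch, not an optimized implementation):

    # Minimal Apriori: `transactions` is a list of sets of items,
    # `minsup` is an absolute support count.
    from itertools import combinations

    def apriori(transactions, minsup):
        # frequent 1-itemsets
        counts = {}
        for t in transactions:
            for item in t:
                key = frozenset([item])
                counts[key] = counts.get(key, 0) + 1
        Lk = {s for s, c in counts.items() if c >= minsup}
        frequent, k = set(Lk), 1
        while Lk:
            # candidate generation: join frequent k-itemsets pairwise
            cands = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
            # prune candidates that have an infrequent k-subset
            cands = {c for c in cands
                     if all(frozenset(s) in Lk for s in combinations(c, k))}
            # one scan of the DB to count candidate supports
            counts = {c: sum(1 for t in transactions if c <= t) for c in cands}
            Lk = {c for c, n in counts.items() if n >= minsup}
            frequent |= Lk
            k += 1
        return frequent

On the five market-basket transactions with minsup = 3, this returns the frequent itemsets found on the previous slide, including {Bread, Milk, Diaper}.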
Extensions to Apriori
Transaction reduction
A transaction that does not contain any frequent k-itemset can be removed
from the database for further scans
Partitioning
First scan:
  Subdivide the transactions of database D into n non-overlapping partitions
  If the minimum (relative) support in D is min_sup, then the minimum
  support count for a partition is min_sup × the number of transactions
  in that partition
  Local frequent itemsets are determined
  A local frequent itemset may not be a frequent itemset in D
Second scan:
  Frequent itemsets are determined from the local frequent itemsets
Sampling
Pick a random sample S of D
Search for local frequent itemsets in S
Use a lower support threshold
Determine frequent itemsets from the local frequent itemsets
Frequent itemsets of D may be missed
For completeness, a second scan is done
Bottlenecks of Apriori
Candidate generation can result in huge candidate sets:
  10^4 frequent 1-itemsets will generate 10^7 candidate 2-itemsets
  To discover a frequent pattern of size 100, e.g., {a1, a2, ..., a100},
  one needs to generate 2^100 ≈ 10^30 candidates.
Multiple scans of database:
  Needs (n + 1) scans, where n is the length of the longest pattern
ECLAT: Another Method for Frequent Itemset
Generation
ECLAT: for each item, store a list of transaction
ids (tids); vertical data layout
Horizontal Data Layout:

TID | Items
  1 | A, B, E
  2 | B, C, D
  3 | C, E
  4 | A, C, D
  5 | A, B, C, D
  6 | A, E
  7 | A, B
  8 | A, B, C
  9 | A, C, D
 10 | B

Vertical Data Layout (one TID-list per item):

Item | TID-list
  A  | 1, 4, 5, 6, 7, 8, 9
  B  | 1, 2, 5, 7, 8, 10
  C  | 2, 3, 4, 5, 8, 9
  D  | 2, 4, 5, 9
  E  | 1, 3, 6
ECLAT: Another Method for Frequent Itemset
Generation
Determine the support of any k-itemset by intersecting the TID-lists of
two of its (k-1)-subsets.

  A: 1, 4, 5, 6, 7, 8, 9
  B: 1, 2, 5, 7, 8, 10
  ⇒ AB: 1, 5, 7, 8
3 traversal approaches:
top-down, bottom-up and hybrid
Advantage: very fast support counting
Disadvantage: intermediate tid-lists may become too
large for memory
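The core ECLAT operation maps directly onto set intersection; a minimal Python sketch over the TID-lists above:

    # ECLAT-style support counting: TID-lists as sets, so the support of
    # AB is the size of the intersection of the TID-lists of A and B.
    tidlist = {
        "A": {1, 4, 5, 6, 7, 8, 9},
        "B": {1, 2, 5, 7, 8, 10},
    }
    tid_AB = tidlist["A"] & tidlist["B"]
    print(sorted(tid_AB), len(tid_AB))  # [1, 5, 7, 8] 4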
FP-growth: Another Method for Frequent
Itemset Generation
Use a compressed representation of the
database using an FP-tree
Once an FP-tree has been constructed, it uses a
recursive divide-and-conquer approach to mine
the frequent itemsets
FP-Tree Construction
TID | Items
  1 | {A,B}
  2 | {B,C,D}
  3 | {A,C,D,E}
  4 | {A,D,E}
  5 | {A,B,C}
  6 | {A,B,C,D}
  7 | {B,C}
  8 | {A,B,C}
  9 | {A,B,D}
 10 | {B,C,E}

After reading TID=1, the tree is a single path: null → A:1 → B:1.
After reading TID=2, a second branch is added: null → B:1 → C:1 → D:1
(alongside null → A:1 → B:1).
FP-Tree Construction
Reading the full transaction database (above) yields the FP-tree below.
A header table keeps, for each item A..E, a pointer into a linked list
(the node-links) of all tree nodes labelled with that item.

FP-tree (indentation shows parent/child structure):

null
  A:7
    B:5
      C:3
        D:1
      D:1
    C:1
      D:1
        E:1
    D:1
      E:1
  B:3
    C:3
      D:1
      E:1
Pointers are used to assist
frequent itemset generation
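A minimal FP-tree construction sketch in Python (class and variable names are illustrative; items in each transaction are assumed pre-sorted by decreasing global frequency, as in the slides):

    class FPNode:
        def __init__(self, item, parent):
            self.item, self.parent = item, parent
            self.count = 0
            self.children = {}            # item -> FPNode

    def insert(root, transaction, header):
        node = root
        for item in transaction:
            if item not in node.children:
                child = FPNode(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)  # node-links
            node = node.children[item]
            node.count += 1

    root, header = FPNode(None, None), {}
    for t in [["A","B"], ["B","C","D"], ["A","C","D","E"], ["A","D","E"],
              ["A","B","C"], ["A","B","C","D"], ["B","C"], ["A","B","C"],
              ["A","B","D"], ["B","C","E"]]:
        insert(root, t, header)
    print(root.children["A"].count)  # 7, matching the tree above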
FP-growth
Build the conditional pattern base for E by following the node-links for E
in the FP-tree above and collecting each prefix path together with E's
count on that path:

P = {(A:1, C:1, D:1),
     (A:1, D:1),
     (B:1, C:1)}

Recursively apply FP-growth on P.
FP-growth
Conditional pattern base for E (prefix paths, with E's count):
P = {(A:1, C:1, D:1, E:1),
     (A:1, D:1, E:1),
     (B:1, C:1, E:1)}

Conditional tree for E:

null
  A:2
    C:1
      D:1
        E:1
    D:1
      E:1
  B:1
    C:1
      E:1

Count for E is 3: {E} is a frequent itemset.
Recursively apply FP-growth on P.
FP-growth
Conditional pattern base for D within the conditional base for E:
P = {(A:1, C:1, D:1),
     (A:1, D:1)}

Conditional tree for D within the conditional tree for E:

null
  A:2
    C:1
      D:1
    D:1

Count for D is 2: {D,E} is a frequent itemset.
Recursively apply FP-growth on P.
FP-growth
Conditional pattern base for C within D within E:
P = {(A:1, C:1)}

Conditional tree for C within D within E:

null
  A:1
    C:1

Count for C is 1: {C,D,E} is NOT a frequent itemset.
FP-growth
Conditional tree for A within D within E:

null
  A:2

Count for A is 2: {A,D,E} is a frequent itemset.

Next step: construct the conditional tree for C within the conditional
tree for E. Continue until exploring the conditional tree for A (which
has only the node A).
Benefits of the FP-tree Structure
Performance study shows FP-growth is an order of magnitude faster than
Apriori, and is also faster than tree-projection.

Reasoning:
  No candidate generation, no candidate test
  Uses a compact data structure
  Eliminates repeated database scans
  Basic operation is counting and FP-tree building

[Figure: run time (sec., 0-100) vs. support threshold (%, 0-2.5) on
dataset D1, comparing D1 FP-growth runtime with D1 Apriori runtime;
Apriori's run time grows much faster than FP-growth's as the support
threshold decreases.]
Complexity of Association Mining
Choice of minimum support threshold
lowering support threshold results in more frequent itemsets
this may increase number of candidates and max length of
frequent itemsets
Dimensionality (number of items) of the data set
more space is needed to store support count of each item
if number of frequent items also increases, both computation and
I/O costs may also increase
Size of database
since Apriori makes multiple passes, run time of algorithm may
increase with number of transactions
Average transaction width
transaction width increases with denser data sets
This may increase max length of frequent itemsets and traversals
of hash tree (number of subsets in a transaction increases with its
width)
Mining Frequent Patterns With
FP-trees
Idea: Frequent pattern growth
Recursively grow frequent patterns by pattern and
database partition
Method
For each frequent item, construct its conditional
pattern-base, and then its conditional FP-tree
Repeat the process on each newly created conditional
FP-tree
Until the resulting FP-tree is empty, or it contains only one path (a
single path will generate all the combinations of its sub-paths, each
of which is a frequent pattern)
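One step of this method, building an item's conditional pattern base, can be sketched on top of the FPNode/header structures from the earlier construction sketch (again illustrative names, assuming the same tree):

    # Build the conditional pattern base for `item`: walk each node-link
    # upward to the root, collecting the prefix path and the node's count.
    def conditional_pattern_base(item, header):
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((list(reversed(path)), node.count))
        return base

    print(conditional_pattern_base("E", header))
    # [(['A', 'C', 'D'], 1), (['A', 'D'], 1), (['B', 'C'], 1)]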
Compact Representation of Frequent
Itemsets
Some itemsets are redundant because they have
identical support as their supersets
TID   | A1 .. A10 | B1 .. B10 | C1 .. C10
1-5   | all 1     | all 0     | all 0
6-10  | all 0     | all 1     | all 0
11-15 | all 0     | all 0     | all 1

(Transactions 1-5 each contain exactly the items A1-A10, transactions
6-10 exactly B1-B10, and transactions 11-15 exactly C1-C10.)
Number of frequent itemsets = 3 × Σ_{k=1}^{10} C(10,k)

Need a compact representation
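A quick check of the count (assuming minsup is at most 5 transactions here, so that every non-empty subset of each 10-item block is frequent):

    from math import comb
    print(3 * sum(comb(10, k) for k in range(1, 11)))  # 3 * (2**10 - 1) = 3069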
Maximal Frequent Itemset
An itemset is maximal frequent if none of its immediate supersets
is frequent
[Itemset lattice figure: a border line separates the frequent itemsets
(upper part of the lattice) from the infrequent itemsets (lower part).
The maximal frequent itemsets are exactly the frequent itemsets lying
directly on the border, i.e., those none of whose immediate supersets
is frequent.]
Closed Itemset
Problem with maximal frequent itemsets:
  The support of their subsets is not known, so additional DB scans are needed

An itemset is closed if none of its immediate supersets has the same
support as the itemset.
TID | Items
  1 | {A,B}
  2 | {B,C,D}
  3 | {A,B,C,D}
  4 | {A,B,D}
  5 | {A,B,C,D}

Itemset | Support
{A}     | 4
{B}     | 5
{C}     | 3
{D}     | 4
{A,B}   | 4
{A,C}   | 2
{A,D}   | 3
{B,C}   | 3
{B,D}   | 4
{C,D}   | 3

Itemset   | Support
{A,B,C}   | 2
{A,B,D}   | 3
{A,C,D}   | 2
{B,C,D}   | 3
{A,B,C,D} | 2
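Given such a table of supports, both properties reduce to a check over immediate supersets; a small Python sketch (assuming `supports` maps every itemset, as a frozenset, to its support count):

    # Closed: no immediate superset has the same support.
    # Maximal: no immediate superset is frequent. (By anti-monotonicity,
    # checking immediate supersets is enough.)
    def is_closed(s, supports):
        return all(supports[t] < supports[s]
                   for t in supports if len(t) == len(s) + 1 and s < t)

    def is_maximal(s, supports, minsup):
        return all(supports[t] < minsup
                   for t in supports if len(t) == len(s) + 1 and s < t)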
Maximal vs Closed Frequent Itemsets
Minimum support = 2

TID | Items
  1 | ABC
  2 | ABCD
  3 | BCE
  4 | ACDE
  5 | DE

[Itemset lattice figure: each itemset is annotated with the TIDs of the
transactions containing it (e.g., A: 1,2,4; B: 1,2,3; C: 1,2,3,4;
D: 2,4,5; E: 3,4,5). Frequent itemsets that are closed but not maximal,
and those that are both closed and maximal, are marked on the lattice.]

# Closed = 9
# Maximal = 4
Maximal vs Closed Itemsets
Maximal Frequent Itemsets ⊆ Closed Frequent Itemsets ⊆ Frequent Itemsets

[Figure: three nested sets, with the maximal frequent itemsets innermost,
the closed frequent itemsets in the middle, and all frequent itemsets
outermost.]
Presentation of Association Rules (Table Form)
[Figure: example of mined association rules presented in table form.]
Extra
Rule Generation
How to efficiently generate rules from frequent
itemsets?
In general, confidence does not have an anti-monotone property:
  c(ABC → D) can be larger or smaller than c(AB → D)

But the confidence of rules generated from the same itemset does have an
anti-monotone property:
  e.g., L = {A,B,C,D}: c(ABC → D) ≥ c(AB → CD) ≥ c(A → BCD)

Confidence is anti-monotone w.r.t. the number of items on the RHS of the rule
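A sketch of confidence-based rule generation from one frequent itemset (plain Python, illustrative names; `supports` maps frozensets to support counts):

    # Grow rule consequents level by level; only consequents of
    # high-confidence rules are extended, which is safe because
    # confidence is anti-monotone in the size of the RHS.
    def rules_from_itemset(L, supports, minconf):
        L = frozenset(L)
        rules, consequents = [], {frozenset([i]) for i in L}
        while consequents:
            next_level = set()
            for Y in consequents:
                X = L - Y
                if not X:
                    continue
                conf = supports[L] / supports[X]
                if conf >= minconf:
                    rules.append((X, Y, conf))
                    next_level.update(Y | {i} for i in X)
            consequents = next_level
        return rules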
Visualization of Association Rules Using Plane Graph
[Figure]

Visualization of Association Rules Using Rule Graph
[Figure]