Mining Frequent Itemsets without
Candidate Generation
Apriori with candidate generation is costly for two reasons:
1. It may need to generate a huge number of candidate sets.
For example, if there are 10^4 frequent 1-itemsets, the Apriori
algorithm will need to generate more than 10^7 candidate 2-itemsets.
2. It is costly to go over each transaction in the database to determine the
support of the candidate itemsets.
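The figure in reason 1 can be checked with a short calculation. This is only an illustrative sketch: it assumes the candidate 2-itemsets are all unordered pairs of frequent 1-itemsets, and the variable names are made up for this example.

```python
from math import comb

# Number of frequent 1-itemsets assumed in the example above.
n_frequent_1_itemsets = 10**4

# Apriori joins frequent 1-itemsets pairwise, so the number of
# candidate 2-itemsets is "n choose 2".
n_candidate_2_itemsets = comb(n_frequent_1_itemsets, 2)

print(n_candidate_2_itemsets)           # 49995000
print(n_candidate_2_itemsets > 10**7)   # True
```

The count is roughly 5 x 10^7, which is consistent with "more than 10^7".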
December 7, 2021 Data Mining: Concepts and Techniques 1
“Can we design a method that mines the complete set of frequent itemsets
without candidate generation?”
FP-growth (frequent-pattern growth) adopts a divide-and-conquer strategy
as follows:
1. First, it compresses the database representing frequent items into a
frequent-pattern tree, or FP-tree, which retains the itemset association information.
2. It then divides the compressed database into a set of conditional databases,
each associated with one frequent item, and mines each such database
separately.
FP-growth: Example
We re-examine the mining of the transaction database D.
The first scan of the database is the same as in Apriori: it derives the set of
frequent items (1-itemsets) and their support counts. Let min_sup = 2.
The set of frequent items is then sorted in order of descending
support count, giving L = {{I2: 7}, {I1: 6}, {I3: 6}, {I4: 2}, {I5: 2}}.
We say that items sorted this way are in L-order.
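The first scan can be sketched in a few lines of Python. The contents of D are not listed in this section, so the sketch assumes the nine-transaction example from the textbook, which reproduces the support counts given above. The function name and the tie-breaking rule (equal counts ordered by item name) are assumptions made for illustration.

```python
from collections import Counter

# Assumed contents of D (nine transactions); not listed in this section.
D = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
MIN_SUP = 2

def frequent_items_in_l_order(transactions, min_sup):
    """One pass over the data: count items, keep the frequent ones, sort them."""
    counts = Counter(item for t in transactions for item in t)
    frequent = [(item, c) for item, c in counts.items() if c >= min_sup]
    # Descending support count; ties broken by item name (an assumption).
    return sorted(frequent, key=lambda pair: (-pair[1], pair[0]))

L = frequent_items_in_l_order(D, MIN_SUP)
print(L)  # [('I2', 7), ('I1', 6), ('I3', 6), ('I4', 2), ('I5', 2)]
```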
FP-growth: Example (Constructing the FP-tree)
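The tree is built in a second scan of the database: the items of each transaction are processed in L-order and inserted as a path from the root, sharing existing prefixes and incrementing node counts along the way. The following is a minimal sketch of that idea, not a definitive implementation. It assumes D is the nine-transaction textbook example (not listed in this section); the names FPNode, build_fp_tree, and header are hypothetical, and the header table is kept as a plain list of nodes per item rather than as a chain of node-links.

```python
from collections import Counter

# Assumed contents of D (nine transactions); not listed in this section.
D = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
MIN_SUP = 2

class FPNode:
    """One tree node: an item, its count, a parent link, and child nodes."""
    def __init__(self, item=None, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_sup):
    # First scan: support counts and the L-order of the frequent items.
    counts = Counter(item for t in transactions for item in t)
    l_order = sorted((i for i in counts if counts[i] >= min_sup),
                     key=lambda i: (-counts[i], i))
    rank = {item: pos for pos, item in enumerate(l_order)}

    root = FPNode()                          # the root carries no item
    header = {item: [] for item in l_order}  # item -> nodes holding that item

    # Second scan: insert each transaction as a path, items in L-order.
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1                 # shared prefixes accumulate counts
            node = child
    return root, header

root, header = build_fp_tree(D, MIN_SUP)
print({item: child.count for item, child in root.children.items()})
```

Under these assumptions the root has two children, I2 with count 7 and I1 with count 2, and the header table records two nodes for I5, one on each branch where I5 occurs.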
FP-growth: Example (Constructing conditional databases)
Next, the FP-tree is mined as follows:
1. Start with the last item in the list L as the suffix pattern and construct its
conditional pattern base: a "subdatabase" consisting of the set of prefix paths in
the FP-tree that co-occur with the suffix pattern.
2. Construct the conditional FP-tree from this conditional pattern base.
3. Frequent itemsets are found by concatenating the suffix pattern with the
frequent patterns generated from the conditional FP-tree.
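The three steps above can be sketched as a short recursion. This is a simplified illustration under stated assumptions, not the full algorithm: each conditional database is stored as a list of (prefix path, count) pairs instead of a linked conditional FP-tree, which yields the same frequent itemsets but skips the tree compression. D is assumed to be the nine-transaction textbook example (not listed in this section), and the names pattern_growth and weighted_paths are hypothetical.

```python
from collections import Counter

# Assumed contents of D (nine transactions); not listed in this section.
D = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
MIN_SUP = 2

def pattern_growth(weighted_paths, min_sup, suffix=()):
    """Yield (itemset, support) pairs.

    weighted_paths is a list of (path, count) pairs, each path a tuple of
    items in L-order. suffix is the pattern grown so far.
    """
    counts = Counter()
    for path, n in weighted_paths:
        for item in path:
            counts[item] += n
    for item, support in counts.items():
        if support < min_sup:
            continue
        pattern = (item,) + suffix           # suffix extended by one item
        yield frozenset(pattern), support
        # Conditional pattern base of `pattern`: the prefix that precedes
        # `item` on every path containing it, with that path's count.
        base = [(path[:path.index(item)], n)
                for path, n in weighted_paths if item in path]
        yield from pattern_growth(base, min_sup, pattern)

# Put every transaction in L-order (descending support, ties by name).
item_counts = Counter(item for t in D for item in t)
paths = [(tuple(sorted(t, key=lambda i: (-item_counts[i], i))), 1) for t in D]

frequent = dict(pattern_growth(paths, MIN_SUP))
for itemset, support in sorted(frequent.items(),
                               key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

For the suffix I5, the conditional pattern base built this way is (I2, I1) with count 1 and (I2, I1, I3) with count 1. I3 occurs only once there and is dropped, so the patterns generated for I5 are {I2, I5}, {I1, I5}, and {I2, I1, I5}, each with support 2.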
2. Example