Frequent Patterns & Association Rules

This document discusses frequent pattern mining and association rule learning. It defines key concepts such as frequent itemsets, support, confidence, and association rules. It explains that the number of frequent itemsets is sensitive to the minimum support threshold and can grow exponentially with the number of items when that threshold is low. To address this complexity, the document introduces closed patterns, which are a lossless compression of the frequent patterns, and max-patterns. An example illustrates both.


CS423

DATA WAREHOUSING AND DATA MINING
Chapter 6a
Frequent Patterns Analysis

Dr. Hammad Afzal


Department of Computer Software Engineering
National University of Sciences and Technology (NUST)

MINING FREQUENT PATTERNS, ASSOCIATION AND CORRELATIONS: BASIC CONCEPTS AND METHODS

• Basic Concepts
• Frequent Itemset Mining Methods
• Which Patterns Are Interesting? Pattern Evaluation Methods
• Summary

WHAT IS FREQUENT PATTERN ANALYSIS?

• Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set
• Motivation: finding inherent regularities in data
    • What products are often purchased together? Milk and diapers?!
    • What are the subsequent purchases after buying a PC?
    • What kinds of DNA are sensitive to this new drug?
    • Can we automatically classify web documents?
• Applications
    • Basket data analysis, cross-marketing, catalog design, sales campaign analysis, Web log (click stream) analysis, and DNA sequence analysis

WHAT IS FREQUENT PATTERN ANALYSIS?

• Frequent pattern: a pattern (a set of items) that occurs frequently in a data set
    • Milk and bread
• Frequent pattern: a pattern (subsequences) that occurs frequently in a data set
    • Buying a PC first and then a digital camera
    • After clicking on one web page, where does the user click next?
• Frequent pattern: a pattern (substructures) that occurs frequently in a data set
    • Sub-graphs

BASIC CONCEPTS: FREQUENT PATTERNS

Tid   Items bought
10    Beer, Nuts, Diaper
20    Beer, Coffee, Diaper
30    Beer, Diaper, Eggs
40    Nuts, Eggs, Milk
50    Nuts, Coffee, Diaper, Eggs, Milk

• itemset: a set of one or more items
• k-itemset: X = {x1, …, xk}
    • E.g., 2-itemset X = {x1, x2}

[Figure: Venn diagram of customers who buy beer, customers who buy diapers, and customers who buy both]

BASIC CONCEPTS: FREQUENT PATTERNS

Tid   Items bought
10    Beer, Nuts, Diaper
20    Beer, Coffee, Diaper
30    Beer, Diaper, Eggs
40    Nuts, Eggs, Milk
50    Nuts, Coffee, Diaper, Eggs, Milk

• (absolute) support, or support count, of X: the frequency (number of occurrences) of an itemset X
• (relative) support, s: the fraction of transactions that contain X (i.e., the probability that a transaction contains X)
• An itemset X is frequent if X's support is no less than a minsup threshold

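The definitions above can be sketched in a few lines of Python. This is only an illustration: the function names (support_count, relative_support, is_frequent) are assumptions made for this sketch and do not come from the slides; the transactions are the five rows of the table.

```python
# Transactions from the slide's table (Tid -> items bought).
transactions = {
    10: {"Beer", "Nuts", "Diaper"},
    20: {"Beer", "Coffee", "Diaper"},
    30: {"Beer", "Diaper", "Eggs"},
    40: {"Nuts", "Eggs", "Milk"},
    50: {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
}


def support_count(itemset, db):
    """Absolute support: number of transactions containing every item of itemset."""
    items = set(itemset)
    return sum(1 for bought in db.values() if items <= bought)


def relative_support(itemset, db):
    """Relative support: fraction of transactions containing the itemset."""
    return support_count(itemset, db) / len(db)


def is_frequent(itemset, db, minsup):
    """An itemset is frequent if its relative support is no less than minsup."""
    return relative_support(itemset, db) >= minsup


print(support_count({"Beer", "Diaper"}, transactions))     # 3
print(relative_support({"Beer", "Diaper"}, transactions))  # 0.6
print(is_frequent({"Nuts", "Diaper"}, transactions, 0.5))  # False
```

{Nuts, Diaper} appears only in transactions 10 and 50, so its relative support is 40% and it is not frequent at minsup = 50%.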
BASIC CONCEPTS: ASSOCIATION RULES

Tid   Items bought
10    Beer, Nuts, Diaper
20    Beer, Coffee, Diaper
30    Beer, Diaper, Eggs
40    Nuts, Eggs, Milk
50    Nuts, Coffee, Diaper, Eggs, Milk

• Find all the rules X → Y with minimum support and confidence
    • support, s: the probability that a transaction contains X ∪ Y
    • confidence, c: the conditional probability that a transaction having X also contains Y

BASIC CONCEPTS: ASSOCIATION RULES

• Support(X → Y) = P(X ∪ Y)
    • Here P(X ∪ Y) is the probability that a transaction contains every item of X and every item of Y
• Confidence(X → Y) = P(Y | X) = Support(X ∪ Y) / Support(X)

BASIC CONCEPTS: ASSOCIATION RULES

Tid   Items bought
10    Beer, Nuts, Diaper
20    Beer, Coffee, Diaper
30    Beer, Diaper, Eggs
40    Nuts, Eggs, Milk
50    Nuts, Coffee, Diaper, Eggs, Milk

Let minsup = 50%, minconf = 50%

• Freq. Pat.: Beer:3, Nuts:3, Diaper:4, Eggs:3, {Beer, Diaper}:3
• Association rules (lower thresholds yield many more):
    • Beer → Diaper (60%, 100%)
    • Diaper → Beer (60%, 75%)

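The worked example can be reproduced with a brute-force sketch that tests every itemset over the six items and then splits each frequent itemset into a left-hand and right-hand side. This is not one of the mining methods the chapter goes on to cover, only a check of the numbers; the function names frequent_itemsets and association_rules are assumptions made for this illustration.

```python
from itertools import combinations

# The five transactions from the slide's table.
transactions = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]


def frequent_itemsets(db, minsup):
    """Brute force: return {itemset: support count} for every frequent itemset."""
    items = sorted(set().union(*db))
    freq = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = sum(1 for t in db if candidate <= t)
            if count >= minsup * len(db):
                freq[candidate] = count
    return freq


def association_rules(db, minsup, minconf):
    """Return (lhs, rhs, support, confidence) for every rule meeting both thresholds."""
    freq = frequent_itemsets(db, minsup)
    rules = []
    for itemset, count in freq.items():
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so lhs is in freq.
                confidence = count / freq[lhs]
                if confidence >= minconf:
                    rules.append((lhs, itemset - lhs, count / len(db), confidence))
    return rules


for lhs, rhs, s, c in association_rules(transactions, 0.5, 0.5):
    print(f"{sorted(lhs)} -> {sorted(rhs)} (support {s:.0%}, confidence {c:.0%})")
```

Confidence is computed from support counts (3/3 and 3/4) rather than from the relative supports, which gives the same ratio while avoiding floating-point rounding.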
COMPUTATIONAL COMPLEXITY OF FREQUENT ITEMSET MINING

• How many itemsets may be generated in the worst case?
    • The number of frequent itemsets generated is sensitive to the minsup threshold
    • When minsup is low, there is potentially an exponential number of frequent itemsets
    • The worst case: $M^N$, where M is the number of distinct items and N is the maximum transaction length
• A long pattern contains a combinatorial number of sub-patterns, e.g., {a1, …, a100} contains $\binom{100}{1} + \binom{100}{2} + \dots + \binom{100}{100} = 2^{100} - 1 \approx 1.27 \times 10^{30}$ sub-patterns!

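The sub-pattern count for a 100-item pattern can be checked directly with exact integer arithmetic. The helper name count_subpatterns is an assumption made for this sketch.

```python
from math import comb


def count_subpatterns(n):
    """Number of non-empty sub-patterns of an n-item pattern, summed by size k."""
    return sum(comb(n, k) for k in range(1, n + 1))


total = count_subpatterns(100)
print(total == 2**100 - 1)  # True
print(f"{total:.3e}")       # 1.268e+30
```

The sum of the binomial coefficients over k = 1..n counts every subset except the empty one, which is why it equals 2^n - 1.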
CLOSED PATTERNS AND MAX-PATTERNS

• Solution: mine closed patterns and max-patterns instead
• An itemset X is closed if X is frequent and there exists no super-pattern Y ⊃ X with the same support as X
• An itemset X is a max-pattern if X is frequent and there exists no frequent super-pattern Y ⊃ X
• Closed patterns are a lossless compression of the frequent patterns
    • Reducing the number of patterns and rules

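As a rough illustration of the two definitions, the sketch below filters a set of frequent itemsets down to its closed patterns and max-patterns. It reuses the beer and diaper transactions from the earlier slides with an absolute minimum support count of 3 (that is, 50% of five transactions); the function names are assumptions made for this example, and the frequent itemsets are found by brute force rather than by a real mining algorithm.

```python
from itertools import combinations

# Transactions from the earlier beer/diaper table.
transactions = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]


def frequent_itemsets(db, min_count):
    """Brute force: return {itemset: support count} for every frequent itemset."""
    items = sorted(set().union(*db))
    freq = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = sum(1 for t in db if candidate <= t)
            if count >= min_count:
                freq[candidate] = count
    return freq


def closed_patterns(freq):
    """Keep X if no proper superset has the same support.

    A superset with the same support is itself frequent, so it is enough
    to search among the frequent itemsets.
    """
    return {x: c for x, c in freq.items()
            if not any(x < y and c == cy for y, cy in freq.items())}


def max_patterns(freq):
    """Keep X if no proper superset is frequent."""
    return {x: c for x, c in freq.items() if not any(x < y for y in freq)}


freq = frequent_itemsets(transactions, min_count=3)
print(sorted(sorted(x) for x in closed_patterns(freq)))
print(sorted(sorted(x) for x in max_patterns(freq)))
```

On this data {Beer} is frequent but not closed, because {Beer, Diaper} has the same support of 3. {Diaper} is closed (support 4) but not a max-pattern, because its superset {Beer, Diaper} is still frequent.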
CLOSED PATTERNS AND MAX-PATTERNS

• Exercise: DB = {<a1, …, a100>, <a1, …, a50>}
    • min_sup = 1
• What is the set of closed itemsets?
    • <a1, …, a100>: 1
    • <a1, …, a50>: 2
    • Every other frequent itemset is a proper subset of one of these two with the same support, so it is not closed
• What is the set of max-patterns?
    • <a1, …, a100>: 1

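Enumerating all 2^100 - 1 frequent itemsets of this exercise is not feasible, but the answer can still be checked. The sketch below relies on a property that is not stated in the slides: an itemset that occurs in at least one transaction is closed exactly when it equals the intersection of all transactions that contain it, so the closed itemsets are the non-empty intersections of groups of transactions. The function names are assumptions made for this illustration, and the approach is exponential in the number of transactions, so it only suits tiny databases like this one.

```python
from itertools import combinations

# The exercise's database: two transactions over items a1..a100.
t1 = frozenset(f"a{i}" for i in range(1, 101))
t2 = frozenset(f"a{i}" for i in range(1, 51))
db = [t1, t2]


def closed_itemsets(db, min_count=1):
    """Closed itemsets, found as intersections of non-empty groups of transactions."""
    candidates = set()
    for k in range(1, len(db) + 1):
        for group in combinations(db, k):
            common = frozenset.intersection(*group)
            if common:
                candidates.add(common)
    closed = {}
    for itemset in candidates:
        count = sum(1 for t in db if itemset <= t)
        if count >= min_count:
            closed[itemset] = count
    return closed


def max_patterns(closed):
    """Every max-pattern is closed, so filtering the closed itemsets is enough."""
    return {x: c for x, c in closed.items() if not any(x < y for y in closed)}


closed = closed_itemsets(db, min_count=1)
for itemset, count in sorted(closed.items(), key=lambda kv: len(kv[0])):
    print(f"{len(itemset)} items: support {count}")
print([len(x) for x in max_patterns(closed)])
```

The result matches the slide: the 50-item set with support 2 and the 100-item set with support 1 are the closed itemsets, and only the 100-item set is a max-pattern.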
