Data Mining Techniques So Far...
Chapter 5 - Statistics
Chapter 6 - Decision Trees
Chapter 7 - Neural Networks
Chapter 8 - Nearest Neighbor Approaches: Memory-Based Reasoning and Collaborative Filtering
What can be inferred?
I purchase diapers
I purchase a new car
I purchase OTC cough medicine
I purchase a prescription medication
I don't show up for class
Market Basket Analysis
Retail - each customer purchases a different set of
products, in different quantities, at different times
MBA uses this information to:
- Identify who customers are (not by name)
- Understand why they make certain purchases
- Gain insight about its merchandise (products):
    Fast and slow movers
    Products which are purchased together
    Products which might benefit from promotion
- Take action:
    Store layouts
    Which products to put on special, promote, offer coupons for...
Combined with a customer loyalty card, this information
becomes even more valuable
Association Rules
DM technique most closely allied with
Market Basket Analysis
AR can be automatically generated
- AR represent patterns in the data without a
specified target variable
- Good example of undirected data mining
- Whether patterns make sense is up to
humanoids (us!)
Association Rules Apply Elsewhere
Besides retail - supermarkets, etc...
Purchases made using credit/debit cards
Optional Telco Service purchases
Banking services
Unusual combinations of insurance claims
can be a warning of fraud
Medical patient histories
Market Basket Analysis Drill-Down
MBA is a set of techniques, Association
Rules being most common, that focus on
point-of-sale (p-o-s) transaction data
3 types of market basket data (p-o-s data)
- Customers
- Orders (basic purchase data)
- Items (merchandise/services purchased)
Lots of questions can be answered
- Avg # of orders/customer
- Avg # unique items/order
- Avg # of items/order
- For a product:
    What % of customers have purchased it
    Avg # of orders/customer that include it
    Avg quantity of it purchased/order
- Etc...
Visualization is extremely helpful...next slide
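The questions above can be answered with simple counting over order lines. A minimal sketch, assuming a hypothetical list of (customer, order, item, quantity) records; the data, the names, and the choice to average per-product orders over all customers are illustrative assumptions, not taken from the slides:

```python
from collections import defaultdict

# Hypothetical order lines: (customer_id, order_id, item, quantity).
order_lines = [
    ("c1", "o1", "OJ", 2),
    ("c1", "o1", "soda", 1),
    ("c1", "o2", "OJ", 1),
    ("c2", "o3", "milk", 1),
    ("c2", "o3", "OJ", 1),
    ("c2", "o3", "window cleaner", 3),
]

orders_by_customer = defaultdict(set)  # customer -> order ids
items_by_order = defaultdict(set)      # order -> distinct items
units_by_order = defaultdict(int)      # order -> total quantity
for customer, order, item, qty in order_lines:
    orders_by_customer[customer].add(order)
    items_by_order[order].add(item)
    units_by_order[order] += qty

n_customers = len(orders_by_customer)
n_orders = len(items_by_order)

avg_orders_per_customer = n_orders / n_customers                # 3 / 2 = 1.5
avg_unique_items_per_order = (
    sum(len(items) for items in items_by_order.values()) / n_orders  # 6 / 3 = 2.0
)
avg_units_per_order = sum(units_by_order.values()) / n_orders   # 9 / 3 = 3.0


def product_stats(product):
    """Per-product answers; assumes the product appears in at least one order."""
    buyers = set()
    qty_by_order = defaultdict(int)
    for customer, order, item, qty in order_lines:
        if item == product:
            buyers.add(customer)
            qty_by_order[order] += qty
    return {
        "pct_customers": 100 * len(buyers) / n_customers,
        # Averaged over all customers, not only buyers (an assumption).
        "avg_orders_per_customer": len(qty_by_order) / n_customers,
        "avg_qty_per_order": sum(qty_by_order.values()) / len(qty_by_order),
    }


milk = product_stats("milk")
```

In practice these figures would usually come from SQL or a dataframe library; the plain-Python version only shows which counts each question needs.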
Sales Order Characteristics
Did the order use gift wrap?
Billing address same as Shipping address?
Did purchaser accept/decline a cross-sell?
What is the most common item found on a one-item
order?
What is the most common item found on a multi-item
order?
What is the most common item for repeat customer
purchases?
How has ordering of an item changed over time?
How does the ordering of an item vary geographically?
Yada...yada...yada...
Pivoting for Cluster Algorithms
Association Rules
Wal-Mart customers who purchase Barbie dolls
have a 60% likelihood of also purchasing one of
three types of candy bars [r , Sept 8, 1997]
Customers who purchase maintenance
agreements are very likely to purchase large
appliances (author experience)
When a new hardware store opens, one of the
most commonly sold items is toilet bowl cleaner
(author experience)
So what...
Association Rules
Association rule types:
- Actionable Rules: contain high-quality,
actionable information
- Trivial Rules: information already well-
known by those familiar with the business
- Inexplicable Rules: no explanation and do
not suggest action
Trivial and Inexplicable Rules occur most
often
How Good is an Association Rule?
POS Transactions

Customer  Items Purchased
1         OJ, soda
2         Milk, OJ, window cleaner
3         OJ, detergent
4         OJ, detergent, soda
5         Window cleaner, soda

Co-occurrence of Products

                OJ  Window cleaner  Milk  Soda  Detergent
OJ               4               1     1     2          2
Window cleaner   1               2     1     1          0
Milk             1               1     1     0          0
Soda             2               1     0     3          1
Detergent        2               0     0     1          2
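The co-occurrence counts can be derived mechanically from the five POS transactions. A minimal sketch in plain Python; the function name and the dict-of-dicts layout are illustrative choices:

```python
# The five POS transactions from the slide, one set of items per customer.
transactions = [
    {"OJ", "soda"},
    {"milk", "OJ", "window cleaner"},
    {"OJ", "detergent"},
    {"OJ", "detergent", "soda"},
    {"window cleaner", "soda"},
]
items = ["OJ", "window cleaner", "milk", "soda", "detergent"]


def cooccurrence(transactions, items):
    """Count, for every pair of items, the transactions containing both.

    The diagonal entry counts[a][a] is the number of transactions containing a.
    """
    counts = {a: {b: 0 for b in items} for a in items}
    for basket in transactions:
        for a in basket:
            for b in basket:
                counts[a][b] += 1
    return counts


matrix = cooccurrence(transactions, items)
for a in items:
    print(f"{a:15}", *(f"{matrix[a][b]:3}" for b in items))
```

The matrix is symmetric, so only the upper triangle needs to be stored when the number of items is large.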
How Good is an Association Rule?
                OJ  Window cleaner  Milk  Soda  Detergent
OJ               4               1     1     2          2
Window cleaner   1               2     1     1          0
Milk             1               1     1     0          0
Soda             2               1     0     3          1
Detergent        2               0     0     1          2
Simple patterns:
1 OJ and soda are purchased together as often as any other
two items (2 of 5 transactions, tied with OJ and detergent)
2 Detergent is never purchased with milk or window cleaner
3 Milk is never purchased with soda or detergent
How Good is an Association Rule?
POS Transactions

Customer  Items Purchased
1         OJ, soda
2         Milk, OJ, window cleaner
3         OJ, detergent
4         OJ, detergent, soda
5         Window cleaner, soda
What is the confidence for this rule:
- If a customer purchases soda, then customer also purchases OJ
- 2 out of 3 soda purchases also include OJ, so 67%
What about the confidence of this rule reversed?
- 2 out of 4 OJ purchases also include soda, so 50%
Confidence = ratio of the number of transactions with all the items
to the number of transactions with just the "if" items
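The confidence definition translates directly into two counts. A minimal sketch using the five transactions from the slide; the function name and signature are illustrative:

```python
# The five POS transactions from the slide.
transactions = [
    {"OJ", "soda"},
    {"milk", "OJ", "window cleaner"},
    {"OJ", "detergent"},
    {"OJ", "detergent", "soda"},
    {"window cleaner", "soda"},
]


def confidence(transactions, if_items, then_items):
    """Transactions with all the rule's items / transactions with the 'if' items."""
    if_items, then_items = set(if_items), set(then_items)
    n_if = sum(1 for t in transactions if if_items <= t)
    n_all = sum(1 for t in transactions if (if_items | then_items) <= t)
    return n_all / n_if if n_if else 0.0


soda_then_oj = confidence(transactions, {"soda"}, {"OJ"})  # 2 / 3
oj_then_soda = confidence(transactions, {"OJ"}, {"soda"})  # 2 / 4
print(f"soda -> OJ: {soda_then_oj:.0%}, OJ -> soda: {oj_then_soda:.0%}")
```

Confidence is not symmetric: the numerator is the same for a rule and its reverse, but the denominator changes with the "if" side.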
How Good is an Association Rule?
How much better than chance is a rule?
Lift (improvement) tells us how much better a rule is at predicting the
result than just assuming the result in the first place
Lift is the ratio of the records that support the entire rule to the
number that would be expected, assuming there was no relationship
between the products
Calculating lift...p. 310... When lift > 1, the rule is better at
predicting the result than guessing
When lift < 1, the rule is doing worse than informed guessing, and
using the negative rule produces a better rule than guessing
Co-occurrence can occur in 3, 4, or more dimensions...
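A worked example may help here. The sketch below computes lift as the observed count of transactions supporting the whole rule divided by the count expected if the "if" and "then" sides were independent. The function name, the `negate_then` flag for the negative rule, and the use of exact fractions are illustrative assumptions:

```python
from fractions import Fraction

# The five POS transactions used earlier in the deck.
transactions = [
    {"OJ", "soda"},
    {"milk", "OJ", "window cleaner"},
    {"OJ", "detergent"},
    {"OJ", "detergent", "soda"},
    {"window cleaner", "soda"},
]


def lift(transactions, if_items, then_items, negate_then=False):
    """Observed support of the rule / support expected under independence.

    With negate_then=True the result side becomes "does NOT contain then_items".
    Raises ZeroDivisionError if either side of the rule never occurs.
    """
    if_items, then_items = set(if_items), set(then_items)

    def result_holds(basket):
        present = then_items <= basket
        return not present if negate_then else present

    n = len(transactions)
    n_if = sum(1 for t in transactions if if_items <= t)
    n_then = sum(1 for t in transactions if result_holds(t))
    n_both = sum(1 for t in transactions if if_items <= t and result_holds(t))
    # (n_both / n) / ((n_if / n) * (n_then / n)) simplifies to the fraction below.
    return Fraction(n_both * n, n_if * n_then)


rule = lift(transactions, {"soda"}, {"OJ"})                         # 5/6, below 1
negative = lift(transactions, {"soda"}, {"OJ"}, negate_then=True)   # 5/3, above 1
print(float(rule), float(negative))
```

On this data "if soda then OJ" has 67% confidence, yet OJ already appears in 80% of all transactions, so the rule's lift is below 1 and the negative rule "if soda then not OJ" does better than guessing.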
Creating Association Rules
1 Choosing the right set of items
2 Generating rules by deciphering the counts in
the co-occurrence matrix
3 Overcoming the practical limits imposed by
thousands or tens of thousands of unique items
Overcoming Practical Limits for
Association Rules
1 Generate co-occurrence matrix for single
items...
2 Generate co-occurrence matrix for two
items...
3 Generate co-occurrence matrix for three
items..."... Cleaner" then soda
4 Etc...
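The level-by-level procedure above can be sketched as a loop that counts single items, then pairs, then triples. The slide does not say how candidates are limited at each level; the sketch below assumes an Apriori-style minimum-count threshold (`min_count`, a hypothetical parameter), so that larger combinations are only counted when all of their smaller subsets were frequent enough:

```python
from itertools import combinations

# The five POS transactions used earlier in the deck.
transactions = [
    {"OJ", "soda"},
    {"milk", "OJ", "window cleaner"},
    {"OJ", "detergent"},
    {"OJ", "detergent", "soda"},
    {"window cleaner", "soda"},
]


def frequent_itemsets(transactions, max_size, min_count):
    """Count itemsets of size 1, 2, ... max_size, pruning infrequent ones."""
    frequent = {}
    all_items = sorted({item for basket in transactions for item in basket})
    candidates = [frozenset([item]) for item in all_items]
    for size in range(1, max_size + 1):
        # Co-occurrence counts for this level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        if not level:
            break
        frequent.update(level)
        # Build next-level candidates only from items that survived, and keep
        # a candidate only if every subset one item smaller is frequent.
        survivors = sorted({item for c in level for item in c})
        candidates = [
            frozenset(combo)
            for combo in combinations(survivors, size + 1)
            if all(frozenset(sub) in level for sub in combinations(combo, size))
        ]
    return frequent


result = frequent_itemsets(transactions, max_size=3, min_count=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With a threshold of 2, milk is dropped after the first pass, only two pairs survive the second pass, and no three-item combination needs to be counted at all.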
Final Thought on Association Rules:
The Problem of Lots of Data
Fast Food Restaurant...could have 100 items on
its menu
- How many combinations are there with 3 different
menu items? 161,700!
Supermarket...10,000 or more unique items
- About 50 million 2-item combinations
- Over 100 billion 3-item combinations
Use of product hierarchies (groupings) helps
address this common issue
Finally, know that the number of transactions in
a given time-period could also be huge (hence
expensive to analyze)
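The combination counts follow from the binomial coefficient "n choose k". A quick check in Python, assuming 100 menu items and 10,000 supermarket items; the 50-category grouping at the end is a hypothetical illustration of a product hierarchy:

```python
from math import comb

# Fast food menu: 3-item combinations out of 100 items.
menu_triples = comb(100, 3)            # 161,700

# Supermarket: 2- and 3-item combinations out of 10,000 items.
market_pairs = comb(10_000, 2)         # 49,995,000 (about 50 million)
market_triples = comb(10_000, 3)       # 166,616,670,000 (over 100 billion)

# Rolling items up into a hypothetical 50 product categories shrinks the problem.
category_triples = comb(50, 3)         # 19,600

print(f"{menu_triples:,} {market_pairs:,} {market_triples:,} {category_triples:,}")
```

The jump from pairs to triples is what makes grouping items into a hierarchy attractive: the count grows roughly with the cube of the number of distinct items.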
End of Chapter 9