Recommender Systems Using Python
Aug 12, 2016
Slides: https://goo.gl/ehBnhf
Notebook: https://github.com/dvysardana/RecommenderSystems_PyData_2016
Slides by Divya
Outline
1. Why Recommender Systems?
2. Examples of Recommender Systems
3. How to build a Recommender System?
a. Popularity based
b. Classification based
c. Collaborative Filtering
i. Nearest Neighbor
ii. Matrix Factorization
4. Evaluation of Recommender Systems
1. Why Recommender Systems?
Goal of a Recommender System: Identify products most relevant to the user (e.g., top n offers).
The long tail phenomenon
2. Some Examples?
Movie/TV show
Recommendations
Product Recommendations
Friend Recommendations
Job Recommendations
A Naive Understanding of Recommender Systems
Users Matching Items
Quiz
What are the users and matching items in the following cases?
a.) LinkedIn (Users: members, Items: jobs)
b.) Facebook (Users: members, Items: members)
c.) Amazon (Users: members, Items: products, e.g., books)
d.) Netflix (Users: members, Items: movies, TV shows)
Power of Recommendations: A Success story
“In 1988, a British mountain climber named Joe Simpson wrote a book called Touching the Void, a harrowing account of near death in the Peruvian Andes. It got good reviews but, only a modest success, it was soon forgotten. Then, a decade later, a strange thing happened. Jon Krakauer wrote Into Thin Air, another book about a mountain-climbing tragedy, which became a publishing sensation. Suddenly, Touching the Void started to sell again.”
-- The Long Tail by Chris Anderson
[Figure: two book covers, captioned "Published in 1988" and "Published in 1996"]
3. Building a Recommender System
Solution 0: Popularity based Recommender System
Recommend items viewed/purchased by most people
Recommendations: Ranked list of items by their purchase count
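The ranking described above can be sketched in a few lines of Python. This is a minimal illustration only: the purchase log and the helper name `popularity_recommendations` are invented for the example and do not come from the talk's notebook.

```python
from collections import Counter

# Hypothetical purchase log of (user, item) pairs, invented for illustration.
purchases = [
    ("alice", "A"), ("alice", "B"),
    ("bob", "A"),
    ("carol", "A"), ("carol", "C"),
    ("dave", "B"),
]

def popularity_recommendations(purchases, n=2):
    """Return the n most-purchased items, ranked by purchase count."""
    counts = Counter(item for _user, item in purchases)
    return [item for item, _count in counts.most_common(n)]

top_items = popularity_recommendations(purchases)
print(top_items)  # every user receives this same list
```

Note that the function never looks at who is asking, which is why this approach cannot personalize.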
Quiz
Which of the following is true of a popularity
based recommender system?
Can generate Personalized Recommendations?
Can use Context (Eg. time of day)?
Can use User Features?
Can use Item Features?
Can use Purchase History?
Is it Scalable?
Solution 1: Classification Model
Use features of both products as well as users in order to predict
whether a user will like a product or not.
Inputs to the Classifier:
User Features (e.g., age, gender)
Product Features (e.g., cost, quality)
Purchase History
Output of the Classifier: Like / Not like
Limitations:
1. It is difficult to collect high quality information about products and users.
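A rough sketch of the classification idea follows, assuming a plain logistic regression trained by stochastic gradient descent. The slides do not name a specific classifier, so that choice, the two features, and every data value below are assumptions made for illustration.

```python
import math

# Hypothetical rows: [user_age_scaled, product_cost_scaled], both in 0..1.
X = [[0.2, 0.1], [0.3, 0.2], [0.8, 0.3], [0.4, 0.8], [0.7, 0.9], [0.6, 0.7]]
y = [1, 1, 1, 0, 0, 0]  # 1 = like, 0 = not like (invented labels)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by stochastic gradient descent on log loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def like_probability(w, b, features):
    """Estimated probability that this user/product pair is a 'like'."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features)) + b)

w, b = train_logistic(X, y)
print(like_probability(w, b, [0.5, 0.2]))  # a new user/product pair
```

To recommend, one would score each candidate product for a user and rank by the estimated probability.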
Quiz
Which of the following is true of a
Classification model based recommender
system?
Can generate Personalized Recommendations?
Can use Context (Eg. time of day)?
Can use User Features?
Can use Item Features?
Can use Purchase History?
Is it Scalable?
Solution 2: Nearest neighbor Collaborative Filtering
User-based Collaborative Filtering
Find users who have a similar taste of products as the current user.
Similarity is based upon similarity in users’ purchasing behaviour.
“User x is similar to user y because both purchased items A, B and C.”

Item-based Collaborative Filtering
Recommend items that are similar to the items the user bought.
Similarity is based upon co-occurrence of purchases.
“Items A and B were purchased by both users x and y, so they are similar.”

Fig. Source: http://www.salemmarafi.com/code/collaborative-filtering-with-python/
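The user-based variant might look like the sketch below. It assumes that "similarity in purchasing behaviour" is measured as the overlap between two users' purchase sets (a Jaccard ratio), which is one possible choice rather than the one used in the talk. The histories are hypothetical.

```python
# Hypothetical purchase histories (user -> set of purchased items).
histories = {
    "x": {"A", "B", "C"},
    "y": {"A", "B", "C", "D"},
    "z": {"E"},
}

def user_similarity(items_a, items_b):
    """Overlap of two purchase sets: shared items / items bought by either."""
    either = items_a | items_b
    return len(items_a & items_b) / len(either) if either else 0.0

def recommend_user_based(user, histories):
    """Recommend what the single most similar user bought that `user` has not."""
    others = [u for u in histories if u != user]
    nearest = max(others, key=lambda u: user_similarity(histories[user], histories[u]))
    return sorted(histories[nearest] - histories[user])

print(recommend_user_based("x", histories))
```

Here user y shares items A, B and C with user x, so y is the nearest neighbor and y's remaining item is recommended.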
Item-based Collaborative Filtering: An Example
(People who bought this also bought)
[Figure: user purchase histories over items A, B, and C, and the resulting History Matrix of item co-occurrences]
Bob’s Recommendations = [C, B]
Example source: https://www.mapr.com/blog/inside-look-at-components-of-recommendation-engine
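A small sketch of "people who bought this also bought" is shown here. The histories are invented and are not the ones in the figure; they were picked so that a user who owns only item A ends up with C ranked ahead of B.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical purchase histories (not the data from the slide's figure).
histories = {
    "alice": ["A", "C"],
    "carol": ["A", "B", "C"],
    "bob": ["A"],
}

def cooccurrence_matrix(histories):
    """co[i][j] = number of users who bought both item i and item j."""
    co = defaultdict(lambda: defaultdict(int))
    for items in histories.values():
        for i, j in permutations(set(items), 2):
            co[i][j] += 1
    return co

def recommend_item_based(user, histories, co):
    """Score unseen items by co-occurrence with the user's items, best first."""
    owned = set(histories[user])
    scores = defaultdict(int)
    for i in owned:
        for j, count in co[i].items():
            if j not in owned:
                scores[j] += count
    return sorted(scores, key=lambda j: (-scores[j], j))

co = cooccurrence_matrix(histories)
print(recommend_item_based("bob", histories, co))
```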
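Item-based Collaborative Filtering: Effect of popular items
[Figure: the same purchase histories and co-occurrence matrix over items A, B, and C, with a count of 100,000 shown for a popular item]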
Item-based Collaborative Filtering: Normalize co-occurrence matrix
Normalize by Popularity
Jaccard similarity(i, j) = (Number of users common for i and j) / (Number of users for either i or j)
[Figure: normalized co-occurrence matrix over items A, B, and C]
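The effect of this normalization can be illustrated with a short sketch. The item-to-buyers sets are hypothetical: item A is bought by 100 users, while B and C are niche items, so all three pairs share the same raw co-occurrence count.

```python
# Hypothetical item -> set of user ids who bought it.
item_users = {
    "A": set(range(100)),   # very popular item
    "B": {0, 1, 2},
    "C": {0, 1, 2, 100},
}

def cooccurrence(i, j, item_users):
    """Raw count of users who bought both items."""
    return len(item_users[i] & item_users[j])

def jaccard(i, j, item_users):
    """Users common to i and j, divided by users who bought either i or j."""
    either = item_users[i] | item_users[j]
    return len(item_users[i] & item_users[j]) / len(either) if either else 0.0

for pair in [("A", "B"), ("B", "C")]:
    print(pair, cooccurrence(*pair, item_users), jaccard(*pair, item_users))
```

Both pairs co-occur three times, but the Jaccard score of the pair involving the popular item is far lower, which keeps popular items from dominating every recommendation list.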
Item-based Collaborative Filtering: Effect of multiple items
[Figure: user purchase histories over items A, B, C, and D]
Rows from normalized co-occurrence matrix:
                A       B      C     D
A               0       0.33   1     0.5
D               0.25    0.25   0     0.2
Weighted sum    0.125   0.29   0.5   0.35
Weighted sum = (Scores for movie A + Scores for movie D) / 2
Ranked Recommendations: C (0.5), D (0.35), B (0.29), A (0.125)
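The weighted sum can be checked with a few lines of Python. The two rows are the ones given on the slide; the function name `weighted_scores` is just a label chosen for this sketch.

```python
# Rows of the normalized co-occurrence matrix for the items A and D,
# as given on the slide.
normalized_rows = {
    "A": {"A": 0, "B": 0.33, "C": 1, "D": 0.5},
    "D": {"A": 0.25, "B": 0.25, "C": 0, "D": 0.2},
}

def weighted_scores(user_items, rows):
    """Average the similarity rows of the given items, column by column."""
    candidates = rows[user_items[0]].keys()
    return {c: sum(rows[i][c] for i in user_items) / len(user_items)
            for c in candidates}

scores = weighted_scores(["A", "D"], normalized_rows)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
print({item: round(score, 3) for item, score in scores.items()})
```

In practice, items the user already has would usually be filtered out of the ranked list before it is shown.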
Quiz
Given a user x itemRatings matrix of size
480,189 x 17,770, which model will you apply
given the matrix is very sparse?
Popularity based recommender system (maybe)
Classification model based recommender system
Item similarity based recommender system
User similarity based recommender system
All of the above
None of the above
This is the Million Dollar Matrix!
[Figure: user x item ratings matrix with 17,770 columns and ~100 million ratings]
Only 100 million out of a possible 8.5 billion ratings are nonzero.
Very sparse matrix!
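The sparsity figures can be verified with quick arithmetic. The dimensions come from the quiz above; the rating count is the approximate 100 million quoted on the slide.

```python
n_rows, n_cols = 480_189, 17_770   # matrix size from the quiz
n_ratings = 100_000_000            # approximate count quoted on the slide

possible = n_rows * n_cols         # every user rating every item
density = n_ratings / possible     # fraction of cells that hold a rating

print(f"{possible:,} possible ratings")
print(f"{density:.2%} of the matrix is filled")
```

Roughly 1 cell in 85 holds a rating, so any model applied here has to cope with mostly missing data.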
Solution 3:
Model based Collaborative Filtering (Matrix Factorization)
Identify latent (hidden) features from the input user x itemRatings matrix to
represent users and items as vectors in N dimensional space.
Example latent dimensions: Serious vs. Escapist? Geared towards Males or Females?
User Vector (u) = [1.3 2.8]
Item Vector (v) = [2.5 -1.9]
New user (Known ratings): [4 5 ….3]
[Figure: Netflix Prize diagram (Koren et al., 2009)]
Solution 3:
Model based Collaborative Filtering (Matrix Factorization)
Training: Use matrix factorization approaches (e.g., Singular Value Decomposition, or SVD) to split the Rating Matrix into constituent User Matrix and Item Matrix with minimum sum of squared error (SSE).
SVD: $A_{n \times p} = U_{n \times n} \, S_{n \times p} \, V^{T}_{p \times p}$
“The winning entry for the famed Netflix Prize had a number of SVD models including SVD++ blended with Restricted Boltzmann Machines. Using these methods they achieved a 10 percent increase in accuracy over Netflix’s existing algorithm.” --Gower 2014
Goal: Predict unknown ratings for the remaining set of movies using the learned User Matrix and Item Matrix.
● Refer to Gower 2014 to read more about the Netflix Prize and SVD (Gower, Stephen. "Netflix Prize and SVD." (2014): 1-10.)
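The training and prediction steps can be sketched as follows. Note the substitution: instead of an exact SVD, this sketch fits the User Matrix and Item Matrix by stochastic gradient descent on the known ratings only, a common way to handle sparse rating data. The ratings, the number of latent features `k`, and the learning rate and regularization values are all assumptions chosen for illustration.

```python
import random

# Hypothetical (user_index, item_index, rating) triples; other pairs are unknown.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
           (2, 1, 2.0), (2, 2, 5.0), (3, 0, 5.0), (3, 2, 2.0)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02,
              epochs=500, seed=0):
    """Learn user and item vectors that minimize squared error on known ratings."""
    rng = random.Random(seed)
    U = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    V = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    sse_history = []
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - dot(U[u], V[i])
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
        # Sum of squared error over the known ratings after this epoch.
        sse_history.append(sum((r - dot(U[u], V[i])) ** 2 for u, i, r in ratings))
    return U, V, sse_history

U, V, sse_history = factorize(ratings, n_users=4, n_items=3)
predicted = dot(U[0], V[2])  # user 0 never rated item 2
print(sse_history[0], sse_history[-1], predicted)
```

An unknown rating is predicted as the dot product of the learned user vector and item vector.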
Performance Metric for Recommendation Systems
All Recommendations (made on training dataset)
All Relevant Items (all items in the test set)
[Figure: Venn diagram of the two sets, showing irrelevant items that are recommended, relevant items that are also recommended, and relevant items that are not recommended]
Precision = # of products relevant & recommended / # of items recommended (Measure of exactness)
Recall = # of products relevant & recommended / # of relevant items (Measure of completeness)
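Both formulas translate directly into set operations. The recommended and relevant item lists below are hypothetical.

```python
def precision_recall(recommended, relevant):
    """Precision and recall of a recommendation list against the relevant items."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)            # relevant AND recommended
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 items recommended, 3 relevant items in the test set.
precision, recall = precision_recall(["A", "B", "C", "D"], ["B", "D", "E"])
print(precision, recall)
```

Two of the four recommendations are relevant (precision 0.5), and two of the three relevant items were found (recall about 0.67).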
Performance Metric for Recommendation Systems
Precision Recall Curve: Evaluation of top n
recommendations
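The points of such a curve can be computed by cutting a ranked list at n = 1, 2, 3, and so on. This sketch uses a hypothetical ranked list and relevant set; plotting is left out.

```python
def precision_recall_at_n(ranked, relevant, max_n):
    """Return (n, precision, recall) for the top-n recommendations, n = 1..max_n."""
    relevant = set(relevant)
    points = []
    for n in range(1, max_n + 1):
        hits = len(set(ranked[:n]) & relevant)
        recall = hits / len(relevant) if relevant else 0.0
        points.append((n, hits / n, recall))
    return points

# Hypothetical ranked recommendations and relevant items from a test set.
curve = precision_recall_at_n(["C", "D", "B", "A"], {"C", "B"}, max_n=4)
for n, precision, recall in curve:
    print(n, round(precision, 2), round(recall, 2))
```

As n grows, recall can only stay flat or rise, while precision typically falls, which is the trade-off the curve visualizes.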
Performance Metric for Recommendation Systems
Some other
metrics
Mean Absolute Error
Accuracy
ROC curve
Gunawardana, Asela, and Guy Shani. "A survey of accuracy evaluation metrics of recommendation tasks." Journal of Machine Learning Research 10 (Dec 2009): 2935-2962.
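Of the metrics listed, Mean Absolute Error is the simplest to compute when the recommender predicts numeric ratings. A minimal sketch with hypothetical actual and predicted ratings:

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical held-out ratings and the model's predictions for them.
mae = mean_absolute_error([4, 3, 5], [3.5, 3, 4])
print(mae)
```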
Quiz: Comparison of Recommendation Systems
Which recommender model can handle brand new items (e.g., a newly released movie)? This is the Cold Start Problem!
Models (columns): Popularity Based | Classification Based | Nearest Neighbor-based CF | Matrix Factorization-based CF
Criteria (rows, to be filled in for each model):
Personalized Recommendations
Uses Context (e.g., time of day)
User Features
Item Features
Purchase History
Scalable
Can handle brand new items?
Music Recommendation (Python notebook)
Notebook: https://github.com/dvysardana/RecommenderSystems_PyData_2016
Short url: https://goo.gl/kVnNKf
Resources
1. Book: Recommender Systems: An Introduction by Dietmar Jannach
2. Book: Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, Jeff Ullman (www.mmds.org)
3. Coursera course on Recommender Systems, by University of Washington
4. Coursera course on Recommender Systems, by University of Minnesota
Do you have any questions?