🎓 Lecture Topic: Similarity Learning
1. Introduction
Similarity learning is a machine learning paradigm where the goal is to learn a function that
measures how similar or dissimilar two data samples are.
Instead of predicting class labels directly (like in standard classification), similarity learning
focuses on learning relationships between samples — for example:
● Are these two faces from the same person?
● Are these two X-ray images from the same disease class?
● Are these two products visually similar?
Key Idea:
The model learns to map input data into a space where similar samples are close together,
and dissimilar samples are far apart.
2. Why Similarity Learning?
Traditional classifiers work well when you have a fixed number of classes, but:
● What if new classes appear later?
● What if you have very few samples per class (few-shot learning)?
Similarity learning handles this by learning relationships between samples rather than a fixed set of class labels.
Example:
In facial recognition, it’s easier to learn “who looks like whom” than to train a classifier for every
person in the world.
3. Core Concept
We aim to learn a similarity function $S(x_i, x_j)$ that gives:

$$S(x_i, x_j) = \begin{cases} \text{high value (e.g., close to 1)} & \text{if } x_i \text{ and } x_j \text{ are similar} \\ \text{low value (e.g., close to 0)} & \text{if } x_i \text{ and } x_j \text{ are dissimilar} \end{cases}$$
This function is often implemented using neural networks (e.g., Siamese, Triplet, or
Contrastive networks).
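To make this concrete, here is a minimal sketch of such a function in PyTorch (an assumed choice; the lecture does not prescribe a framework): a small embedding network, shared for both inputs, followed by cosine similarity. The `EmbeddingNet` layer sizes are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingNet(nn.Module):
    """Maps raw inputs into a d-dimensional embedding space."""
    def __init__(self, in_dim=784, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, emb_dim),
        )

    def forward(self, x):
        return self.net(x)

def similarity(model, xi, xj):
    """S(xi, xj): close to 1 for similar pairs, lower for dissimilar ones."""
    zi, zj = model(xi), model(xj)
    return F.cosine_similarity(zi, zj, dim=-1)

# Toy usage: 4 pairs of flattened 28x28 inputs (dimensions are illustrative).
model = EmbeddingNet()
xi, xj = torch.randn(4, 784), torch.randn(4, 784)
print(similarity(model, xi, xj))  # one similarity score per pair
```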
4. Key Approaches
A. Siamese Network
● Uses two identical subnetworks (with shared weights).
● Each network extracts a feature embedding for one input.
● A distance metric (e.g., Euclidean or cosine distance) compares embeddings.
Loss Function (Contrastive Loss):
$$L = (1 - y) \cdot D^2 + y \cdot \max(0,\, m - D)^2$$
where
● $y = 0$ if the pair is similar, $y = 1$ if dissimilar
● $D$ = distance between the two embeddings
● $m$ = margin separating dissimilar pairs.
Used in: Face verification, signature matching, fingerprint comparison.
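A minimal sketch of this loss in PyTorch, following the convention above ($y = 0$ for similar, $y = 1$ for dissimilar); `margin=1.0` is just an illustrative default, not a value from the lecture.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, y, margin=1.0):
    """Contrastive loss over a batch of embedding pairs.

    y = 0 for similar pairs, y = 1 for dissimilar pairs.
    """
    d = F.pairwise_distance(z1, z2)                  # Euclidean distance D
    similar_term = (1 - y) * d.pow(2)                # pulls similar pairs together
    dissimilar_term = y * F.relu(margin - d).pow(2)  # pushes dissimilar pairs past the margin
    return (similar_term + dissimilar_term).mean()

# Toy usage: 8 pairs of 64-d embeddings with random similar/dissimilar labels.
z1, z2 = torch.randn(8, 64), torch.randn(8, 64)
y = torch.randint(0, 2, (8,)).float()
print(contrastive_loss(z1, z2, y))
```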
B. Triplet Network
● Uses three inputs: anchor, positive, and negative.
○ Anchor (A): reference sample
○ Positive (P): similar to anchor
○ Negative (N): dissimilar to anchor
Triplet Loss:
$$L = \max(0,\, D(A, P) - D(A, N) + \alpha)$$
where $\alpha$ is a margin.
Goal:
Pull anchor and positive closer, push anchor and negative apart.
Used in: FaceNet (Google’s face recognition model).
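A corresponding sketch of the triplet loss; `alpha=0.2` is only an example margin. (PyTorch also ships `nn.TripletMarginLoss`, which computes the same quantity.)

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Triplet loss: D(A, P) should be smaller than D(A, N) by at least alpha."""
    d_ap = F.pairwise_distance(anchor, positive)  # anchor-positive distance
    d_an = F.pairwise_distance(anchor, negative)  # anchor-negative distance
    return F.relu(d_ap - d_an + alpha).mean()     # hinge: zero loss once the margin holds

# Toy usage with random 64-d embeddings for 8 triplets.
a, p, n = torch.randn(8, 64), torch.randn(8, 64), torch.randn(8, 64)
print(triplet_loss(a, p, n))
```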
C. Contrastive Learning (Self-Supervised)
● Learns similarities without explicit labels.
● Generates positive and negative pairs using data augmentations.
● Examples:
○ SimCLR
○ MoCo
○ BYOL
Intuition:
If two augmented versions of the same image are passed through a model, their embeddings
should be similar.
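The NT-Xent loss used by SimCLR captures this intuition. Below is a simplified sketch, assuming `z1[k]` and `z2[k]` are embeddings of two augmentations of the same image (a positive pair) while every other sample in the batch serves as a negative; the full SimCLR recipe additionally uses a projection head and large batches.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """Simplified NT-Xent (SimCLR-style) loss for two batches of embeddings."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # 2n unit-norm embeddings
    sim = z @ z.t() / temperature                       # pairwise cosine similarities
    sim.fill_diagonal_(float('-inf'))                   # exclude self-similarity
    # Row k's positive is its augmented partner: k+n for the first half, k-n for the second.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)                # each view must pick its partner

# Toy usage: embeddings for 8 images under two random augmentations.
z1, z2 = torch.randn(8, 64), torch.randn(8, 64)
print(nt_xent(z1, z2))
```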
5. Similarity Metrics
To measure how close two embeddings are:
Metric | Formula | Interpretation
Euclidean Distance | $\|x_i - x_j\|_2$ | Geometric distance in feature space
Cosine Similarity | $\frac{x_i \cdot x_j}{\|x_i\|\,\|x_j\|}$ | Measures angular similarity
Manhattan Distance | $\|x_i - x_j\|_1$ | Sum of absolute differences
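All three metrics can be computed in one line each, e.g. in PyTorch:

```python
import torch
import torch.nn.functional as F

xi, xj = torch.randn(64), torch.randn(64)    # two example embeddings

euclidean = torch.dist(xi, xj, p=2)          # ||xi - xj||_2
manhattan = torch.dist(xi, xj, p=1)          # ||xi - xj||_1
cosine = F.cosine_similarity(xi, xj, dim=0)  # (xi · xj) / (||xi|| ||xj||)

print(euclidean.item(), manhattan.item(), cosine.item())
```

Note that the two distances treat 0 as "identical" and grow without bound, while cosine similarity lies in $[-1, 1]$ with 1 meaning "same direction".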