EE 470
Nearest Neighbor Classifier
First classifier: Nearest Neighbor
Train: memorize all data and labels.
Predict: the label of the most similar training image.
[Figure: a query image ("?") is matched against labeled training images (deer, bird, plane, cat, car) using a distance metric.]
Nearest Neighbor classifier
Train: memorize the training data.
Predict: for each test image, find the closest training image and predict the label of that nearest image.
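In code, a minimal numpy sketch of this classifier (class and variable names here are illustrative, not from the original):

import numpy as np

class NearestNeighbor:
    def train(self, X, y):
        # Training just memorizes all data and labels.
        # X: (num_train, D) flattened images; y: (num_train,) labels.
        self.X_train = X
        self.y_train = y

    def predict(self, X):
        # For each test image, find the closest training image
        # under L1 distance and predict its label.
        y_pred = np.empty(X.shape[0], dtype=self.y_train.dtype)
        for i in range(X.shape[0]):
            distances = np.sum(np.abs(self.X_train - X[i]), axis=1)
            y_pred[i] = self.y_train[np.argmin(distances)]
        return y_pred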
K-Nearest Neighbors: Distance Metric
L1 (Manhattan) distance: $d_1(I_1, I_2) = \sum_p |I_1^p - I_2^p|$
L2 (Euclidean) distance: $d_2(I_1, I_2) = \sqrt{\sum_p (I_1^p - I_2^p)^2}$
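As a quick numpy sketch (function names are illustrative), both metrics on flattened image vectors:

import numpy as np

def l1_distance(i1, i2):
    # Manhattan distance: sum of absolute pixel-wise differences.
    return np.sum(np.abs(i1 - i2))

def l2_distance(i1, i2):
    # Euclidean distance: root of summed squared pixel-wise differences.
    return np.sqrt(np.sum((i1 - i2) ** 2))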
Hyperparameters
What is the best value of k to use?
What is the best distance to use?
These are hyperparameters: choices about the algorithm itself that we set rather than learn from the data.
They are very problem/dataset-dependent; in general you must try them out and see what works best.
Setting Hyperparameters
Idea #1: Choose hyperparameters that work best on the training data.
BAD: K = 1 always works perfectly on training data.
[split: | train |]
Idea #2: Choose hyperparameters that work best on the test data.
BAD: No idea how the algorithm will perform on new data. Never do this!
[split: | train | test |]
Idea #3: Split data into train, val; choose hyperparameters on val and evaluate on test. Better!
[split: | train | validation | test |]
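A minimal sketch of Idea #3 for k-NN (knn_predict and the variable names are our own illustration): choose k on the validation split, and touch the test split only once at the end.

import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    # Brute-force k-NN with L1 distance; majority vote over the k nearest labels.
    y_pred = np.empty(X_test.shape[0], dtype=y_train.dtype)
    for i in range(X_test.shape[0]):
        dists = np.sum(np.abs(X_train - X_test[i]), axis=1)
        nearest = y_train[np.argsort(dists)[:k]]
        labels, counts = np.unique(nearest, return_counts=True)
        y_pred[i] = labels[np.argmax(counts)]
    return y_pred

def choose_k(X_train, y_train, X_val, y_val, ks=(1, 3, 5, 7, 9)):
    # Evaluate each candidate k on the validation split; never on test.
    best_k, best_acc = None, -1.0
    for k in ks:
        acc = np.mean(knn_predict(X_train, y_train, X_val, k) == y_val)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k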
Setting Hyperparameters
Idea #4: Cross-Validation: Split data into folds, try each fold as validation and average the results.
[split: | fold 1 | fold 2 | fold 3 | fold 4 | fold 5 | test |, cycling which fold serves as validation]
Useful for small datasets, but not used too frequently in deep learning
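A sketch of Idea #4, reusing the illustrative knn_predict from above to average validation accuracy across folds:

import numpy as np

def cross_validate_k(X, y, ks=(1, 3, 5, 7, 9), num_folds=5):
    # Split the training data into folds; each fold takes a turn as validation.
    X_folds = np.array_split(X, num_folds)
    y_folds = np.array_split(y, num_folds)
    results = {}
    for k in ks:
        accs = []
        for i in range(num_folds):
            X_val, y_val = X_folds[i], y_folds[i]
            X_tr = np.concatenate(X_folds[:i] + X_folds[i + 1:])
            y_tr = np.concatenate(y_folds[:i] + y_folds[i + 1:])
            accs.append(np.mean(knn_predict(X_tr, y_tr, X_val, k) == y_val))
        results[k] = np.mean(accs)  # average accuracy across the folds
    return results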
Example Dataset: CIFAR10
10 classes
50,000 training images
10,000 testing images
Alex Krizhevsky, “Learning Multiple Layers of Features from Tiny Images”, Technical Report, 2009.
[Figure: CIFAR10 test images alongside their nearest neighbors in the training set.]
Setting Hyperparameters
Example of 5-fold cross-validation for the value of k. Each point is a single outcome; the line goes through the mean, and the bars indicate the standard deviation. (It seems that k ≈ 7 works best for this data.)
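A sketch of how such a plot can be drawn with matplotlib from per-fold accuracies (the array shapes here are assumptions for illustration):

import numpy as np
import matplotlib.pyplot as plt

def plot_cv_results(ks, fold_accuracies):
    # fold_accuracies: (num_ks, num_folds) array of per-fold accuracies.
    means = fold_accuracies.mean(axis=1)
    stds = fold_accuracies.std(axis=1)
    for i, k in enumerate(ks):
        # One point per fold outcome at this value of k.
        plt.scatter([k] * fold_accuracies.shape[1], fold_accuracies[i])
    # Line through the means; error bars show the standard deviation.
    plt.errorbar(ks, means, yerr=stds)
    plt.xlabel('k')
    plt.ylabel('Cross-validation accuracy')
    plt.show()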
What does this look like?
k-Nearest Neighbor with pixel distance is never used.
- Distance metrics on pixels are not informative.
- Very slow at test time.
[Figure: Original, Occluded, Shifted (1 pixel), and Tinted versions of the same image. Original image is CC0 public domain.]
k-Nearest Neighbor with pixel distance is never used.
- Curse of dimensionality: covering the space with training points requires a number of examples exponential in the dimension.
[Figure: Dimensions = 1, Points = 4; Dimensions = 2, Points = 4^2 = 16; Dimensions = 3, Points = 4^3 = 64]
K-Nearest Neighbors: Summary
• In image classification we start with a training set of images and labels, and must predict labels on the test set
• The K-Nearest Neighbors classifier predicts labels based on the K nearest training examples
• Distance metric and K are hyperparameters
• Choose hyperparameters using the validation set; only run on the test set once, at the very end
Linear Classifier
Parametric Approach
Image x: an array of 32x32x3 numbers (3072 numbers total)
f(x, W): 10 numbers giving class scores
W: the parameters or weights
Parametric Approach: Linear Classifier
f(x, W) = Wx + b
Image x: an array of 32x32x3 numbers (3072 numbers total), flattened to 3072x1
W: 10x3072 weight matrix; b: 10x1 bias vector
f(x, W): 10x1, i.e. 10 numbers giving class scores
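In numpy the whole classifier is one line (names here are illustrative):

import numpy as np

def linear_classifier(x, W, b):
    # x: (3072,) flattened image; W: (10, 3072); b: (10,)
    # Returns 10 class scores.
    return W.dot(x) + b

# Example with CIFAR10-shaped inputs:
x = np.random.rand(32 * 32 * 3)                # a flattened 32x32x3 image
W = np.random.randn(10, 32 * 32 * 3) * 0.001   # small random weights
b = np.zeros(10)
scores = linear_classifier(x, W, b)            # shape (10,)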
Neural Network
[Figure: a neural network built as a stack of linear classifiers. Image is CC0 1.0 public domain.]
[Figure: linear layers inside modern deep networks (Krizhevsky et al. 2012; He et al. 2015).]
Recall CIFAR10
50,000 training images
each image is 32x32x3
10,000 test images.
Example with an image with 4 pixels, and 3 classes (cat/dog/ship)
Flatten the 2x2 input image [[56, 231], [24, 2]] into a vector x = [56, 231, 24, 2].

W = [[0.2, -0.5,  0.1,  2.0],
     [1.5,  1.3,  2.1,  0.0],
     [0.0,  0.25, 0.2, -0.3]]
b = [1.1, 3.2, -1.2]

f(x, W) = Wx + b:
  cat score:  0.2*56 - 0.5*231 + 0.1*24 + 2.0*2 + 1.1 = -96.8
  dog score:  1.5*56 + 1.3*231 + 2.1*24 + 0.0*2 + 3.2 = 437.9
  ship score: 0.0*56 + 0.25*231 + 0.2*24 - 0.3*2 - 1.2 = 60.75
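The arithmetic can be checked directly in numpy:

import numpy as np

x = np.array([56, 231, 24, 2])           # flattened 2x2 image
W = np.array([[0.2, -0.5, 0.1, 2.0],     # cat weights
              [1.5, 1.3, 2.1, 0.0],      # dog weights
              [0.0, 0.25, 0.2, -0.3]])   # ship weights
b = np.array([1.1, 3.2, -1.2])

scores = W.dot(x) + b
print(scores)  # approximately [-96.8, 437.9, 60.75] (cat, dog, ship)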
Interpreting a Linear Classifier: Visual Viewpoint
Each row of W can be reshaped back into a 32x32x3 image and viewed as a learned template for one class; the class score is the dot product between the image and that template.
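A sketch of this visualization, assuming a trained W of shape (10, 3072) for CIFAR10:

import numpy as np
import matplotlib.pyplot as plt

def show_templates(W, class_names):
    # Reshape each row of W back into a 32x32x3 image and display it.
    for i, name in enumerate(class_names):
        template = W[i].reshape(32, 32, 3)
        # Rescale weights to [0, 1] so they are displayable as an image.
        template = (template - template.min()) / (template.max() - template.min())
        plt.subplot(1, len(class_names), i + 1)
        plt.imshow(template)
        plt.title(name)
        plt.axis('off')
    plt.show()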
Interpreting a Linear Classifier: Geometric Viewpoint
f(x, W) = Wx + b
Each image is a point in the 3072-dimensional space of pixel values (an array of 32x32x3 numbers); each class score is a linear function over this space, so the boundary where a class score is zero is a hyperplane.
Plot created using Wolfram Cloud. Cat image by Nikita is licensed under CC-BY 2.0.
Hard cases for a linear classifier
Case 1: Class 1 is the first and third quadrants; Class 2 is the second and fourth quadrants.
Case 2: Class 1 is 1 <= L2 norm <= 2; Class 2 is everything else.
Case 3: Class 1 has three modes; Class 2 is everything else.
f(x, W) = Wx + b
Coming up:
- Loss function (quantifying what it means to have a "good" W)
- Optimization (start with a random W and find a W that minimizes the loss)
- ConvNets! (tweak the functional form of f)