
EE 470

Nearest Neighbor Classifier

Slides: Fei-Fei Li, Ranjay Krishna, Danfei Xu, Lecture 2, April 1, 2021.


First classifier: Nearest Neighbor

Train: memorize all data and labels.
Predict: output the label of the most similar training image.


First classifier: Nearest Neighbor

[Figure: training data with labels (deer, bird, plane, cat, car); a query image is matched against the training images under a distance metric.]


Nearest Neighbor classifier

Train: memorize the training data.

For each test image:
- Find the closest training image.
- Predict the label of that nearest image.
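To make the procedure concrete, here is a minimal numpy sketch of this classifier; the class and variable names are illustrative, not from the slides:

import numpy as np

class NearestNeighbor:
    def train(self, X, y):
        # "Training" just memorizes the data: X is N x D (one flattened
        # image per row), y is a length-N vector of labels.
        self.X_train = X
        self.y_train = y

    def predict(self, X):
        # For each test image, find the closest training image under the
        # L1 (Manhattan) distance and return its label.
        y_pred = np.zeros(X.shape[0], dtype=self.y_train.dtype)
        for i in range(X.shape[0]):
            distances = np.sum(np.abs(self.X_train - X[i]), axis=1)
            y_pred[i] = self.y_train[np.argmin(distances)]
        return y_pred

Note the asymmetry: training is instant (store the data), but prediction scans every training image for every test image.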


K-Nearest Neighbors: Distance Metric

L1 (Manhattan) distance: d_1(I_1, I_2) = \sum_p |I_1^p - I_2^p|

L2 (Euclidean) distance: d_2(I_1, I_2) = \sqrt{\sum_p (I_1^p - I_2^p)^2}

(The sums run over all pixels p of the two images.)
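In numpy, the two metrics for a pair of flattened images might look like this sketch (img1 and img2 are illustrative names):

import numpy as np

def l1_distance(img1, img2):
    # Sum of absolute pixel-wise differences (Manhattan).
    return np.sum(np.abs(img1 - img2))

def l2_distance(img1, img2):
    # Square root of the sum of squared pixel-wise differences (Euclidean).
    return np.sqrt(np.sum((img1 - img2) ** 2))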


Hyperparameters

What is the best value of k to use?
What is the best distance metric to use?

These are hyperparameters: choices about the algorithm itself, set by us rather than learned from data.
They are very problem/dataset-dependent: you must try them out and see what works best.


Setting Hyperparameters

Idea #1: Choose hyperparameters that work best on the training data.
BAD: k = 1 always works perfectly on the training data.

Idea #2: Split the data into train and test; choose hyperparameters that work best on the test data.
BAD: no idea how the algorithm will perform on new data. Never do this!

Idea #3: Split the data into train, val, and test; choose hyperparameters on val and evaluate on test.
Better!
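A minimal sketch of Idea #3's split, assuming numpy arrays X and y; the function name and the 80/10/10 proportions are illustrative, not from the slides:

import numpy as np

def train_val_test_split(X, y, seed=0):
    # Shuffle, then carve out 80% train, 10% val, 10% test.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(0.8 * len(X))
    n_val = int(0.1 * len(X))
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])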


Setting Hyperparameters

Idea #4: Cross-validation. Split the training data into folds, try each fold as the validation set, and average the results.

[Diagram: the data is split into fold 1 ... fold 5 plus a held-out test set; each of the five folds takes a turn as the validation set.]

Useful for small datasets, but not used too frequently in deep learning.
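A minimal sketch of this procedure for choosing k; knn_accuracy is a hypothetical helper standing in for "train a k-NN classifier on the training folds and measure accuracy on the validation fold":

import numpy as np

def cross_validate_k(X_train, y_train, k_choices, num_folds=5):
    # Split the training data into num_folds roughly equal parts.
    X_folds = np.array_split(X_train, num_folds)
    y_folds = np.array_split(y_train, num_folds)

    accuracies = {}
    for k in k_choices:
        scores = []
        for i in range(num_folds):
            # Fold i is the validation set; the rest is the training set.
            X_val, y_val = X_folds[i], y_folds[i]
            X_tr = np.concatenate(X_folds[:i] + X_folds[i+1:])
            y_tr = np.concatenate(y_folds[:i] + y_folds[i+1:])
            scores.append(knn_accuracy(X_tr, y_tr, X_val, y_val, k))
        accuracies[k] = np.mean(scores)  # average over the folds
    return accuracies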


Example Dataset: CIFAR10

10 classes
50,000 training images
10,000 testing images

[Figure: sample CIFAR10 images per class; test images shown alongside their nearest neighbors in the training set.]

Alex Krizhevsky, "Learning Multiple Layers of Features from Tiny Images", Technical Report, 2009.
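For reference, one common way to load CIFAR10 in Python is via torchvision; this is a convenience not mentioned in the slides:

from torchvision import datasets

# Downloads CIFAR10 to ./data on first use: 50,000 training and
# 10,000 test images, each 32x32x3, across 10 classes.
train_set = datasets.CIFAR10(root='./data', train=True, download=True)
test_set = datasets.CIFAR10(root='./data', train=False, download=True)
print(len(train_set), len(test_set))  # 50000 10000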


Setting Hyperparameters

[Plot: example of 5-fold cross-validation for the value of k. Each point is a single outcome; the line goes through the mean, and the bars indicate the standard deviation.]

(It seems that k ~= 7 works best for this data.)


What does this look like?

[Figure omitted.]


k-Nearest Neighbor with pixel distance is never used.

- Distance metrics on raw pixels are not informative.
- Very slow at test time.

[Figure: an original image and Occluded, Shifted (1 pixel), and Tinted versions; pixel-space distances do not track these perceptual differences. Original image is CC0 public domain.]


k-Nearest Neighbor with pixel distance is never used.

- Curse of dimensionality: covering the space at a fixed density requires a number of points exponential in the dimension.

Dimensions = 1, Points = 4
Dimensions = 2, Points = 4^2 = 16
Dimensions = 3, Points = 4^3 = 64
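The counts on the slide follow directly from this exponential growth, as a quick check shows:

# With 4 sample points per axis, covering a d-dimensional space needs 4**d points.
for d in (1, 2, 3):
    print(f"Dimensions = {d}, Points = {4 ** d}")  # 4, 16, 64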


K-Nearest Neighbors: Summary

• In image classification we start with a training set of images and labels, and must predict labels on the test set.

• The K-Nearest Neighbors classifier predicts labels based on the K nearest training examples.

• The distance metric and K are hyperparameters.

• Choose hyperparameters using the validation set; run on the test set only once, at the very end.


Linear Classifier



Parametric Approach: Linear Classifier

f(x, W) = Wx + b

- Input image x: an array of 32x32x3 numbers (3072 numbers total), flattened to a 3072x1 vector.
- W: the parameters, or weights, a 10x3072 matrix.
- b: the bias, a 10x1 vector.
- Output f(x, W): 10 numbers giving the class scores, a 10x1 vector.
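As a minimal numpy sketch of this function, with the shapes from the slide (the variable names and random initialization are illustrative):

import numpy as np

def linear_classifier(x, W, b):
    # x: flattened image, shape (3072,); W: weights, shape (10, 3072);
    # b: bias, shape (10,). Returns 10 class scores.
    return W.dot(x) + b

# Example with a random "image" and small random parameters:
rng = np.random.default_rng(0)
x = rng.random(32 * 32 * 3)               # flattened 32x32x3 image
W = rng.standard_normal((10, 3072)) * 0.01
b = np.zeros(10)
scores = linear_classifier(x, W, b)        # shape (10,)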


Neural Network

[Figure: a neural network built out of stacked linear classifiers. This image is CC0 1.0 public domain.]


Linear layers appear throughout modern networks, e.g. AlexNet [Krizhevsky et al. 2012] and ResNet [He et al. 2015].


Recall CIFAR10

50,000 training images, each 32x32x3.
10,000 test images.


Example with an image with 4 pixels, and 3 classes (cat/dog/ship)

Flatten the input tensor into a vector: x = [56, 231, 24, 2]^T.

Then the scores are f(x, W) = Wx + b:

        W                      x        b         scores
[ 0.2  -0.5   0.1   2.0 ]   [  56 ]   [  1.1 ]   [ -96.8  ]  cat score
[ 1.5   1.3   2.1   0.0 ] . [ 231 ] + [  3.2 ] = [ 437.9  ]  dog score
[ 0.0   0.25  0.2  -0.3 ]   [  24 ]   [ -1.2 ]   [  60.75 ]  ship score
                            [   2 ]
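A quick numpy check of this worked example:

import numpy as np

W = np.array([[0.2, -0.5, 0.1,  2.0],
              [1.5,  1.3, 2.1,  0.0],
              [0.0, 0.25, 0.2, -0.3]])
x = np.array([56.0, 231.0, 24.0, 2.0])  # flattened 4-pixel image
b = np.array([1.1, 3.2, -1.2])

print(W.dot(x) + b)  # [ -96.8   437.9   60.75] -> cat, dog, ship scores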


Interpreting a Linear Classifier: Visual Viewpoint

[Figure: each row of W, reshaped back into an image, acts as a template for its class.]


Interpreting a Linear Classifier: Geometric Viewpoint

f(x, W) = Wx + b

[Figure: images as points in a 3072-dimensional space (arrays of 32x32x3 numbers); each class score is a linear function of this space. Plot created using Wolfram Cloud. Cat image by Nikita is licensed under CC-BY 2.0.]


Hard cases for a linear classifier

Case 1: Class 1 = first and third quadrants; Class 2 = second and fourth quadrants.
Case 2: Class 1 = points with 1 <= L2 norm <= 2; Class 2 = everything else.
Case 3: Class 1 = three modes; Class 2 = everything else.

In each case, no single linear decision boundary can separate the two classes.


f(x, W) = Wx + b

Coming up:
- Loss function (quantifying what it means to have a "good" W)
- Optimization (start with a random W and find a W that minimizes the loss)
- ConvNets! (tweak the functional form of f)
