Learning by Computing Distances (2):
Wrapping-up LwP, Nearest Neighbors
CS771: Introduction to Machine Learning
Piyush Rai
Learning with Prototypes (LwP)
The class prototypes are just the class means:
$$\boldsymbol{\mu}_{+} = \frac{1}{N_{+}} \sum_{n: y_n = +1} \mathbf{x}_n, \qquad \boldsymbol{\mu}_{-} = \frac{1}{N_{-}} \sum_{n: y_n = -1} \mathbf{x}_n$$

Prediction rule for LwP (for binary classification with Euclidean distance): with $\mathbf{w} = \boldsymbol{\mu}_{+} - \boldsymbol{\mu}_{-}$, predict +1 if $\mathbf{w}^\top \mathbf{x} + b > 0$, otherwise predict -1.

Decision boundary (if Euclidean distance is used): the perpendicular bisector of the line joining the class prototype vectors.

For LwP, the prototype vectors (or their difference) define the "model": $\boldsymbol{\mu}_{+}$ and $\boldsymbol{\mu}_{-}$ (or just $\mathbf{w}$ and $b$ in the Euclidean distance case) are the model parameters. We can throw away the training data after computing the prototypes and only need to keep the model parameters at test time in such "parametric" models.

Exercise: Show that for the binary classification case, the LwP score can be written as
$$f(\mathbf{x}) = \sum_{n=1}^{N} \alpha_n \langle \mathbf{x}_n, \mathbf{x} \rangle + b$$
So the "score" of a test point is a weighted sum of its similarities with each of the $N$ training inputs. Many supervised learning models have this form, as we will see later.

Note: Even though $f$ can be expressed in this form, if $N > D$ this form may be more expensive to compute (O(N) time) compared to evaluating $\mathbf{w}^\top \mathbf{x} + b$ directly (O(D) time). However, the form is still very useful, as we will see later when we discuss kernel methods.
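As a concrete illustration of the Euclidean-distance case, here is a minimal NumPy sketch. The function names and the explicit bias term $b = \frac{1}{2}(\lVert\boldsymbol{\mu}_-\rVert^2 - \lVert\boldsymbol{\mu}_+\rVert^2)$, which makes $\mathbf{w}^\top\mathbf{x} + b > 0$ equivalent to being closer to $\boldsymbol{\mu}_+$, are my own choices, not from the slides.

```python
import numpy as np

def lwp_fit(X, y):
    """Compute class prototypes and the induced linear model (Euclidean case).

    X: (N, D) training inputs, y: (N,) labels in {-1, +1}.
    """
    mu_pos = X[y == +1].mean(axis=0)   # prototype (mean) of the positive class
    mu_neg = X[y == -1].mean(axis=0)   # prototype (mean) of the negative class
    w = mu_pos - mu_neg                # normal vector of the decision boundary
    # Bias chosen so that w.x + b > 0  <=>  x is closer to mu_pos than to mu_neg
    b = 0.5 * (mu_neg @ mu_neg - mu_pos @ mu_pos)
    return w, b

def lwp_predict(w, b, X_test):
    """Predict +1/-1 for each row of X_test using the linear rule."""
    return np.sign(X_test @ w + b)

# Tiny usage example with made-up data
X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]])
y = np.array([-1, -1, +1, +1])
w, b = lwp_fit(X, y)
print(lwp_predict(w, b, np.array([[0.5, 0.5], [5.5, 4.5]])))  # -> [-1.  1.]
```

Note that after `lwp_fit` returns, the training data is no longer needed: only `w` and `b` are used at test time, matching the "parametric" remark above.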
Improving LwP when classes are complex-shaped
Using a weighted Euclidean or Mahalanobis distance can sometimes help.

Weighted Euclidean distance:
$$d_{\mathbf{w}}(\boldsymbol{a}, \boldsymbol{b}) = \sqrt{\sum_{i=1}^{D} w_i (a_i - b_i)^2}$$
Use a smaller $w_i$ for the horizontal-axis feature in this example (figure: two classes with prototypes $\boldsymbol{\mu}_{+}$ and $\boldsymbol{\mu}_{-}$).

Mahalanobis distance:
$$d_{\mathbf{W}}(\boldsymbol{a}, \boldsymbol{b}) = \sqrt{(\boldsymbol{a} - \boldsymbol{b})^\top \mathbf{W} (\boldsymbol{a} - \boldsymbol{b})}$$
Here $\mathbf{W}$ will be a 2x2 symmetric matrix in this 2-D example (chosen by us or learned). A good $\mathbf{W}$ will help bring points from the same class closer and move different classes apart.

Note: The Mahalanobis distance also has the effect of rotating the axes, which helps.
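A minimal NumPy sketch of both distances (function names are mine; $\mathbf{W}$ is assumed to be symmetric positive semi-definite):

```python
import numpy as np

def weighted_euclidean(a, b, w):
    """d_w(a, b) = sqrt( sum_i w_i * (a_i - b_i)^2 ), with per-feature weights w_i >= 0."""
    diff = a - b
    return np.sqrt(np.sum(w * diff**2))

def mahalanobis(a, b, W):
    """d_W(a, b) = sqrt( (a - b)^T W (a - b) ) for a symmetric PSD matrix W."""
    diff = a - b
    return np.sqrt(diff @ W @ diff)

a, b = np.array([1.0, 2.0]), np.array([3.0, 1.0])
print(weighted_euclidean(a, b, w=np.array([0.1, 1.0])))          # down-weight the first feature
print(mahalanobis(a, b, W=np.array([[0.1, 0.0], [0.0, 1.0]])))   # same value: diagonal W reduces to the weighted case
```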
Improving LwP when classes are complex-shaped
Even with a weighted Euclidean or Mahalanobis distance, LwP is still a linear classifier.

Exercise: Prove the above fact. You may use the following hint:
The Mahalanobis distance can be written as $d_{\mathbf{W}}(\boldsymbol{a}, \boldsymbol{b}) = \sqrt{(\boldsymbol{a} - \boldsymbol{b})^\top \mathbf{W} (\boldsymbol{a} - \boldsymbol{b})}$, where $\mathbf{W}$ is a symmetric matrix and thus can be written as $\mathbf{W} = \mathbf{L}^\top \mathbf{L}$ for some matrix $\mathbf{L}$.
Showing it for the Mahalanobis case is enough; weighted Euclidean is a special case with a diagonal $\mathbf{W}$.
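To spell out what the hint gives (a standard identity; $\mathbf{L}$ is the matrix from the factorization in the hint):
$$d_{\mathbf{W}}(\boldsymbol{a}, \boldsymbol{b})^2 = (\boldsymbol{a} - \boldsymbol{b})^\top \mathbf{L}^\top \mathbf{L}\,(\boldsymbol{a} - \boldsymbol{b}) = \lVert \mathbf{L}\boldsymbol{a} - \mathbf{L}\boldsymbol{b} \rVert_2^2$$
That is, the Mahalanobis distance is just the Euclidean distance between the linearly transformed inputs $\mathbf{L}\boldsymbol{a}$ and $\mathbf{L}\boldsymbol{b}$, so LwP with $d_{\mathbf{W}}$ behaves like ordinary Euclidean LwP on transformed inputs.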
LwP can be extended to learn nonlinear decision boundaries if we use nonlinear distances/similarities (more on this when we talk about kernels).

Note: Modeling each class by not just a mean but by a probability distribution can also help in learning nonlinear decision boundaries. More on this when we discuss probabilistic models for classification.
LwP as a subroutine in other ML models
For data-clustering (unsupervised learning), K-means clustering is a popular algo
K-means also computes means/centres/prototypes of groups of unlabeled points
Harder than LwP since the labels are unknown. But we can do the following (will see K-means in detail later):
Guess the label of each point and compute the means using the guessed labels
Refine the labels using these means (assign each point to the currently closest mean)
Repeat until the means don't change anymore
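A rough NumPy sketch of this loop. All function and variable names are my own; here the initial "guess" is made by picking K random training points as the starting means rather than guessing labels directly.

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """Very basic K-means sketch: alternate between assigning each point to its
    closest mean and recomputing each mean, until assignments stop changing."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=K, replace=False)].astype(float)  # initial prototypes
    labels = None
    for _ in range(max_iters):
        # Step 1: assign each point to the current closest mean (squared Euclidean distance)
        dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)  # shape (N, K)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments (and hence means) no longer change
        labels = new_labels
        # Step 2: recompute each mean from its currently assigned points (the LwP-style step)
        for k in range(K):
            if np.any(labels == k):
                means[k] = X[labels == k].mean(axis=0)
    return means, labels
```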
Many other models also use LwP as a subroutine
Supervised Learning
using
Nearest Neighbors
Nearest Neighbors
Another supervised learning technique based on computing distances
Very simple idea. Simply do the following at test time:
Compute the distance of the test point from all the training points
Sort the distances to find the "nearest" input(s) in the training data
Predict the label using the majority or average label of these inputs

"Wait. Did you say distance from ALL the training points? That's gonna be sooooo expensive!"
"Yes, but let's not worry about that at the moment. There are ways to speed up this step."

Can use Euclidean or other distances (e.g., Mahalanobis). The choice of distance is important, just like in LwP.
Unlike LwP, which does prototype-based comparison, the nearest neighbors method looks at the labels of individual training inputs to make a prediction.
Applicable to both classification as well as regression (LwP only works for classification).
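As a concrete (brute-force) sketch of this procedure for the single-nearest-neighbor case with Euclidean distance (function name mine); note the O(N·D) cost per test point that the aside above worries about:

```python
import numpy as np

def one_nn_predict(X_train, y_train, X_test):
    """For each test point, return the label of its single closest training input."""
    # Pairwise squared Euclidean distances, shape (num_test, num_train)
    dists = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)   # index of the closest training input for each test point
    return y_train[nearest]          # its label is the prediction
```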
Nearest Neighbors for Classification
Nearest Neighbor (or “One” Nearest Neighbor)
"Interesting. Even with Euclidean distances, it can learn nonlinear decision boundaries?"
"Indeed. And that's possible since it is a 'local' method (it looks at a local neighborhood of the test point to make the prediction)."

(Figure: the 1-NN decision boundary for a set of training points, with a test point shown.)

The nearest neighbour approach induces a Voronoi tessellation/partition of the input space (all test points falling in a cell will get the label of the training input in that cell).
K Nearest Neighbors (KNN)
In many cases, it helps to look at not one but more than one nearest neighbor.

(Figure: a test input and its K nearest neighbors.)

"How to pick the 'right' K value?"
"K is this model's 'hyperparameter'. One way to choose it is using 'cross-validation' (will see shortly). Also, K should ideally be an odd number to avoid ties."

Essentially, taking more votes helps!
It also leads to smoother decision boundaries (fewer chances of overfitting on the training data).
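A minimal unweighted KNN classifier along these lines (brute force, Euclidean distance; names are my own):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_test, K=3):
    """Predict each test label by a majority vote among the K closest training inputs."""
    dists = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    knn_idx = np.argsort(dists, axis=1)[:, :K]        # indices of the K nearest neighbors
    preds = []
    for idx in knn_idx:
        votes = Counter(y_train[idx])                 # count each label among the K neighbors
        preds.append(votes.most_common(1)[0][0])      # majority label (ties broken arbitrarily)
    return np.array(preds)
```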
ε-Ball Nearest Neighbors (ε-NN)
Rather than looking at a fixed number of neighbors, we can look inside a ball of a given radius ε around the test input.

(Figure: a test input with a ball of radius ε around it; the training points inside the ball vote.)

"So changing ε may change the prediction. How to pick the 'right' ε value?"
"Just like K, ε is also a 'hyperparameter'. One way to choose it is using 'cross-validation' (will see shortly)."
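A corresponding ε-ball sketch; the fallback to the single nearest neighbor when the ball is empty is my own choice, not something the slides specify:

```python
import numpy as np
from collections import Counter

def eps_nn_predict(X_train, y_train, x_test, eps=1.0):
    """Majority vote among all training inputs within Euclidean distance eps of x_test."""
    dists = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    inside = np.where(dists <= eps)[0]
    if len(inside) == 0:
        inside = np.array([dists.argmin()])   # fallback to the nearest point (an assumption)
    votes = Counter(y_train[inside])
    return votes.most_common(1)[0][0]
```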
Distance-weighted KNN and ε-NN
The standard KNN and ε-NN treat all nearest neighbors equally (all vote equally).

(Figure: K = 3; the test input's three nearest neighbors are one red training input very close by and two green training inputs farther away.)

An improvement: when voting, give more importance to closer training inputs.
Unweighted KNN prediction: each of the 3 neighbors votes with weight 1/3.
Weighted KNN prediction: the red input votes with weight 3/5 and each green input with weight 1/5. In the weighted approach, the single red training input is given 3 times more importance than the other two green inputs, since it is roughly "three times" closer to the test input than they are.
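A distance-weighted KNN sketch; the specific inverse-distance weighting (and the small constant guarding against division by zero) is one common choice, assumed here rather than prescribed by the slides:

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x_test, K=3):
    """Each of the K nearest neighbors votes with weight proportional to 1/distance."""
    dists = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    knn_idx = np.argsort(dists)[:K]
    weights = 1.0 / (dists[knn_idx] + 1e-12)          # closer neighbors get larger weights
    # Sum the weights per class label and predict the label with the largest total weight
    totals = {}
    for idx, w in zip(knn_idx, weights):
        totals[y_train[idx]] = totals.get(y_train[idx], 0.0) + w
    return max(totals, key=totals.get)
```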
ε-NN can also be made weighted likewise.
KNN/ε-NN for Other Supervised Learning Problems
We can apply KNN/ε-NN to other supervised learning problems as well, such as:
Multi-class classification
Regression
Tagging/multi-label learning
(We can also try the weighted versions for such problems, just like we did in the case of binary classification.)
For multi-class, simply use the same majority rule as in the binary classification case; the only difference is that now we have more than 2 classes.
For regression, simply compute the average of the outputs of nearest neighbors
For multi-label learning, each output is a binary vector (presence/absence of each tag).
Just compute the average of the binary vectors
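Both the regression and multi-label cases thus boil down to averaging the neighbors' outputs; a minimal sketch for the regression case (names mine):

```python
import numpy as np

def knn_regress(X_train, y_train, x_test, K=3):
    """KNN regression: predict the mean of the K nearest neighbors' real-valued outputs."""
    dists = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    return y_train[np.argsort(dists)[:K]].mean()
```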
KNN Prediction Rule: The Mathematical Form
Let's denote the set of K nearest neighbors of an input $\mathbf{x}$ by $N_K(\mathbf{x})$.

The unweighted KNN prediction for a test input $\mathbf{x}$ can be written as
$$\mathbf{y} = \frac{1}{K} \sum_{i \in N_K(\mathbf{x})} \mathbf{y}_i$$

Note: Assuming discrete labels with 5 possible values, the one-hot representation will be an all-zeros vector of size 5, except for a single 1 denoting the value of the discrete label, e.g., if label = 3 then the one-hot vector = [0,0,1,0,0].

This form directly makes sense for regression and for cases where each output is a vector (e.g., multi-class classification, where each output is a discrete value which can be represented as a one-hot vector, or tagging/multi-label classification, where each output is a binary vector).

For binary classification, assuming the labels are +1/-1, we predict $\hat{y} = \text{sign}\left(\frac{1}{K}\sum_{i \in N_K(\mathbf{x})} y_i\right)$.
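A sketch of this vector-output form, averaging the neighbors' output vectors (one-hot, binary, or real-valued); the argmax wrapper for recovering a class label is my own addition, consistent with the rule above:

```python
import numpy as np

def knn_average_predict(X_train, Y_train, x_test, K=3):
    """Return the average of the K nearest neighbors' output vectors.

    Y_train: (N, L) array of output vectors (one-hot for multi-class,
    binary for multi-label, real-valued for vector regression)."""
    dists = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    knn_idx = np.argsort(dists)[:K]
    return Y_train[knn_idx].mean(axis=0)   # y = (1/K) * sum of the neighbors' outputs

# For multi-class with one-hot Y_train, the predicted class is the argmax:
# predicted_class = knn_average_predict(X_train, Y_train, x_test).argmax()
```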
Nearest Neighbors: Some Comments
An old, classic but still very widely used algorithm
Can sometimes give deep neural networks a run for their money
Can work very well in practice with the right distance function
Comes with very nice theoretical guarantees
Also called a memory-based or instance-based or non-parametric method
No “model” is learned here (unlike LwP). Prediction step uses all the training data
Requires lots of storage (need to keep all the training data at test time)
Prediction step can be slow at test time
For each test point, need to compute its distance from all the training points
Clever data-structures or data-summarization techniques can provide speed-ups
Next Lecture
Hyperparameter/model selection via cross-validation
Learning with Decision Trees