Lecture 12 Distance Metrics Different Distance Metrics in Machine Learning
So, the Euclidean Distance between these two points A and B will be:
d(A, B) = √[(p1 − q1)² + (p2 − q2)² + … + (pn − qn)²]
Where,
n = number of dimensions
pi, qi = data points
Example
Python Code for Euclidean Distance: the SciPy library contains pre-written implementations of most of
the distance functions used in Python:
from scipy.spatial import distance
# defining the points
point_1 = (1, 2, 3)
point_2 = (4, 5, 6)
point_1, point_2
These are the two sample points which we will be using to calculate the different distance
functions. Let’s now calculate the Euclidean Distance between these two points:
# computing the euclidean distance
euclidean_distance = distance.euclidean(point_1, point_2)
print('Euclidean Distance b/w', point_1, 'and', point_2, 'is: ', euclidean_distance)
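To see where this number comes from, the formula can be applied by hand. This is a minimal sketch using only the standard library; `by_hand` is our own name, not part of SciPy:

```python
import math

point_1 = (1, 2, 3)
point_2 = (4, 5, 6)

# sum of squared coordinate differences, then the square root
by_hand = math.sqrt(sum((p - q) ** 2 for p, q in zip(point_1, point_2)))
print(by_hand)  # sqrt(9 + 9 + 9) = sqrt(27) ≈ 5.196
```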
Since the above representation is 2-dimensional, to calculate the Manhattan Distance we take
the sum of the absolute differences in the x and y directions.
More generally, the Manhattan distance in an n-dimensional space is given as:
d(A, B) = |p1 − q1| + |p2 − q2| + … + |pn − qn|
Where,
n = number of dimensions
pi, qi = data points
Instead of taking the straight line as in the Euclidean Distance, we ‘walk’ along
available, pre-defined paths, like the grid of streets in Manhattan. The Manhattan distance
is often preferred when the features are integer-valued (1, 2, 3, 4, …) with no decimal
parts. Being a sum of absolute differences, it is always non-negative, and it is an integer
whenever the coordinates are integers.
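As a quick check, the sum of absolute differences can be written by hand and compared with SciPy's `cityblock` function (SciPy's name for the Manhattan distance); the sample points are the same ones used above:

```python
from scipy.spatial import distance

point_1 = (1, 2, 3)
point_2 = (4, 5, 6)

# Manhattan distance: sum of absolute coordinate differences
by_hand = sum(abs(p - q) for p, q in zip(point_1, point_2))
via_scipy = distance.cityblock(point_1, point_2)
print(by_hand, via_scipy)  # |1-4| + |2-5| + |3-6| = 9 for both
```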
The Minkowski distance comes into play in machine learning when we want a single distance
measure that gives control over the type of distance used.
If there is confusion about which distance metric an algorithm should use, the Minkowski
distance is a good choice, since its order parameter can be tuned during model optimization.
The p parameter of SciPy's Minkowski Distance metric represents the order of the norm.
When the order (p) is 1, it represents the Manhattan Distance, and when the order in the above
formula is 2, it represents the Euclidean Distance.
# computing the manhattan distance (called cityblock in scipy)
manhattan_distance = distance.cityblock(point_1, point_2)
# minkowski distance of order 1
minkowski_distance_order_1 = distance.minkowski(point_1, point_2, p=1)
print('Minkowski Distance of order 1:', minkowski_distance_order_1,
      '\nManhattan Distance:', manhattan_distance)
Here, you can see that when the order is 1, both Minkowski and Manhattan Distance are the
same.
# minkowski and euclidean distance
minkowski_distance_order_2 = distance.minkowski(point_1, point_2, p=2)
print('Minkowski Distance of order 2:', minkowski_distance_order_2,
      '\nEuclidean Distance:', euclidean_distance)
When the order is 2, we can see that Minkowski and Euclidean distances are the same.
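Other orders are possible too. For instance, p = 3 gives a value between the Manhattan and Euclidean results; a small sketch with the same sample points:

```python
from scipy.spatial import distance

point_1 = (1, 2, 3)
point_2 = (4, 5, 6)

# order-3 Minkowski distance: cube root of the sum of cubed absolute differences
minkowski_order_3 = distance.minkowski(point_1, point_2, p=3)
print(minkowski_order_3)  # (27 + 27 + 27) ** (1/3) ≈ 4.327
```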
4. Hamming Distance: Hamming Distance measures the similarity between two strings of
the same length. The Hamming Distance between two strings of the same length is the number
of positions at which the corresponding characters are different.
Let’s understand the concept using an example. Let’s say we have two strings:
“euclidean” and “manhattan”
Since the length of these strings is equal, we can calculate the Hamming Distance.
We will go character by character and match the strings.
The first characters of the two strings (e and m respectively) are different.
Similarly, the second characters (u and a) are different, and so on.
Counting all such positions, the Hamming Distance here will be 7.
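The same calculation can be reproduced with SciPy. One detail worth flagging: SciPy's `hamming()` returns the *proportion* of positions that differ, so it must be multiplied by the string length to get the count of mismatches:

```python
from scipy.spatial import distance

string_1 = 'euclidean'
string_2 = 'manhattan'

# hamming() compares element-wise, so split the strings into character lists;
# the result is the fraction of differing positions
fraction = distance.hamming(list(string_1), list(string_2))
hamming_distance = fraction * len(string_1)
print(hamming_distance)  # 7 of the 9 positions differ -> 7.0
```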
Note: Hamming Distance only works when we have strings of the same length.
Let’s see what happens when we have strings of different lengths:
# strings of different lengths
new_string_1 = 'data'
new_string_2 = 'science'
len(new_string_1), len(new_string_2)
You can see that the lengths of both the strings are different.
Calling distance.hamming on these throws an error saying that the lengths of the arrays must
be the same. Hence, the Hamming distance only works when we have strings or arrays of the
same length.
Example: Classify the following data points into two classes using the Euclidean distance.
X1 = (2, 3, 4), X2 = (1, 2, 3), and X3 = (0, −2, −5)
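One way to work this example (a sketch, assuming a simple grouping by pairwise distance rather than any particular clustering algorithm) is to compute all pairwise Euclidean distances and put the closest points in the same class:

```python
import math

X1 = (2, 3, 4)
X2 = (1, 2, 3)
X3 = (0, -2, -5)

def euclidean(a, b):
    # straight-line distance between two points
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean(X1, X2))  # sqrt(1 + 1 + 1)   = sqrt(3)   ≈ 1.73
print(euclidean(X1, X3))  # sqrt(4 + 25 + 81) = sqrt(110) ≈ 10.49
print(euclidean(X2, X3))  # sqrt(1 + 16 + 64) = sqrt(81)  = 9.0
# X1 and X2 are far closer to each other than either is to X3,
# so one reasonable split is class {X1, X2} and class {X3}
```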
Example: Prove that points A (0, 4), B (6, 2), and C (9, 1) are collinear.
Solution: To prove it, the sum of the distances between two pairs of points must equal the distance
between the third pair:
AB + BC = AC
AB = √[(6 − 0)² + (2 − 4)²] = √(36 + 4) = √40 = 2√10
BC = √[(9 − 6)² + (1 − 2)²] = √(9 + 1) = √10
AC = √[(9 − 0)² + (1 − 4)²] = √(81 + 9) = √90 = 3√10
Since AB + BC = 2√10 + √10 = 3√10 = AC, the points are collinear.
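The collinearity condition can also be checked numerically; a minimal sketch using `math.dist` from the standard library (Python 3.8+):

```python
import math

A, B, C = (0, 4), (6, 2), (9, 1)

# math.dist computes the Euclidean distance between two points
AB = math.dist(A, B)  # sqrt(36 + 4) = 2*sqrt(10)
BC = math.dist(B, C)  # sqrt(9 + 1)  = sqrt(10)
AC = math.dist(A, C)  # sqrt(81 + 9) = 3*sqrt(10)

print(math.isclose(AB + BC, AC))  # True -> the points are collinear
```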
Example: Check whether A(√3, 1), B(0, 0), and C(2, 0) are the vertices of an equilateral triangle.
Solution: Three vertices A, B, and C are vertices of an equilateral triangle if and only if AB = BC = CA.
Given:
A(√3, 1) = (x1, y1)
B(0, 0) = (x2, y2)
C(2, 0) = (x3, y3)
AB = √[(√3 − 0)² + (1 − 0)²]
= √(3 + 1)
= √4
= 2
BC = √[(2 − 0)² + (0 − 0)²]
= √(4 + 0)
= √4
= 2
CA = √[(√3 − 2)² + (1 − 0)²]
= √(7 − 4√3 + 1)
= √(8 − 4√3) ≈ 1.04
Here AB = BC ≠ CA, so the triangle is isosceles, not equilateral.
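The side lengths can be verified numerically as well; a small sketch using `math.dist` from the standard library (Python 3.8+):

```python
import math

A, B, C = (math.sqrt(3), 1), (0, 0), (2, 0)

AB = math.dist(A, B)  # sqrt(3 + 1) = 2
BC = math.dist(B, C)  # sqrt(4 + 0) = 2
CA = math.dist(C, A)  # sqrt((2 - sqrt(3))**2 + 1) ≈ 1.04

# equilateral requires all three sides equal; here only AB == BC
print(math.isclose(AB, BC), math.isclose(BC, CA))
```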