Maximal Margin Classifier – Construction from Scratch
1. Problem Setup
We have a binary classification problem with n training observations:
- Feature vectors: x_i ∈ R^p for i = 1,...,n
- Class labels: y_i ∈ {−1, +1}
We seek a linear classifier defined by a hyperplane:
β_0 + β^T x = 0
where β = (β_1,...,β_p)^T is the normal vector to the hyperplane and β_0 is the intercept.
The classifier predicts:
ŷ(x) = sign(β_0 + β^T x)
A point x_i with label y_i is classified correctly if:
y_i (β_0 + β^T x_i) > 0
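As a concrete illustration, here is a minimal sketch of this prediction rule in Python, assuming NumPy is available; the function names predict and correctly_classified are introduced only for this example.

    import numpy as np

    def predict(beta0, beta, X):
        # Predicted label: sign(beta_0 + beta^T x) for each row x of X.
        # Points exactly on the hyperplane map to 0.
        return np.sign(beta0 + X @ beta)

    def correctly_classified(beta0, beta, X, y):
        # True where y_i * (beta_0 + beta^T x_i) > 0, i.e. x_i lies on its correct side.
        return y * (beta0 + X @ beta) > 0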
2. Distance from a Point to a Hyperplane
Consider the hyperplane:
β_0 + β^T x = 0
We want the perpendicular distance from x_i to this hyperplane.
Parameterize the line through x_i in the direction of the unit normal β / ||β||:
x(t) = x_i − t · (β / ||β||)
The point x(t*) on this line that lies on the hyperplane satisfies:
β_0 + β^T x(t*) = 0
Substitute x(t*):
β_0 + β^T(x_i − t* β / ||β||) = 0
β_0 + β^T x_i − t* (β^T β / ||β||) = 0
β_0 + β^T x_i − t* ||β|| = 0
Solve for t*:
t* = (β_0 + β^T x_i) / ||β||
The perpendicular distance from x_i to the hyperplane is |t*|, hence:
dist(x_i, hyperplane) = |β_0 + β^T x_i| / ||β||
If we want a signed distance consistent with the label y_i, we use:
signed_dist(x_i) = y_i (β_0 + β^T x_i) / ||β||
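Both quantities are straightforward to compute; a minimal sketch, assuming NumPy (function names are ours):

    import numpy as np

    def distance_to_hyperplane(beta0, beta, x):
        # Perpendicular distance |beta_0 + beta^T x| / ||beta||.
        return abs(beta0 + beta @ x) / np.linalg.norm(beta)

    def signed_distance(beta0, beta, x, y):
        # Signed distance y * (beta_0 + beta^T x) / ||beta||;
        # positive exactly when x lies on the side its label y indicates.
        return y * (beta0 + beta @ x) / np.linalg.norm(beta)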
3. Hyperplane Scaling Invariance
If β_0 + β^T x = 0 defines a hyperplane, then for any k ≠ 0, the equation
k(β_0 + β^T x) = 0
or equivalently
(kβ_0) + (kβ)^T x = 0
defines the same set of points x. Therefore, a hyperplane is invariant under nonzero
scaling of (β_0, β).
Thus, we can impose a constraint on β, such as ||β|| = 1, without changing which geometric
hyperplanes we can represent. This simply chooses a normalized representation of the
hyperplane.
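A quick numerical check of this invariance, with illustrative values only:

    import numpy as np

    beta0, beta = 1.0, np.array([3.0, -4.0])   # ||beta|| = 5
    x = np.array([2.0, 1.0])
    k = -2.5                                   # any nonzero rescaling

    d_original = abs(beta0 + beta @ x) / np.linalg.norm(beta)
    d_scaled = abs(k * beta0 + (k * beta) @ x) / np.linalg.norm(k * beta)
    assert np.isclose(d_original, d_scaled)    # same hyperplane, same distance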
4. Functional Margin and Geometric Margin
For a classifier (β_0, β), the functional margin of (x_i, y_i) is:
γ̂_i = y_i (β_0 + β^T x_i)
This is positive if x_i is correctly classified, negative otherwise. The functional margin
of the classifier on the dataset is:
γ̂ = min_i γ̂_i = min_i y_i (β_0 + β^T x_i)
The geometric margin uses the actual perpendicular distance:
γ_i = y_i (β_0 + β^T x_i) / ||β||
and the geometric margin of the classifier is:
γ = min_i γ_i = min_i y_i (β_0 + β^T x_i) / ||β||
Note the relationship:
γ = γ̂ / ||β||
so the geometric margin is invariant to positive rescaling of (β_0, β), while the functional
margin scales along with it.
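A minimal sketch of both margins, assuming NumPy (function names are ours); the comments at the end spell out how each behaves under positive rescaling:

    import numpy as np

    def functional_margin(beta0, beta, X, y):
        # gamma_hat = min_i y_i (beta_0 + beta^T x_i); changes if (beta_0, beta) is rescaled.
        return np.min(y * (beta0 + X @ beta))

    def geometric_margin(beta0, beta, X, y):
        # gamma = gamma_hat / ||beta||; unchanged by positive rescaling of (beta_0, beta).
        return functional_margin(beta0, beta, X, y) / np.linalg.norm(beta)

    # For any k > 0:
    #   functional_margin(k*beta0, k*beta, X, y) == k * functional_margin(beta0, beta, X, y)
    #   geometric_margin(k*beta0, k*beta, X, y)  == geometric_margin(beta0, beta, X, y)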
5. Optimization Problem for the Maximal Margin Hyperplane
We assume the data are linearly separable, so there exists some (β_0, β) such that:
y_i (β_0 + β^T x_i) > 0 for all i
The maximal margin hyperplane is the one that maximizes the geometric margin γ. We use the
scaling freedom to impose ||β|| = 1. Under this normalization, the signed distance from x_i
to the hyperplane, which equals the perpendicular distance whenever x_i is correctly
classified, becomes:
signed_dist(x_i) = y_i (β_0 + β^T x_i)
since ||β|| = 1.
We then define M to be a common lower bound on these signed distances:
y_i (β_0 + β^T x_i) ≥ M for all i
Given ||β|| = 1, this inequality says each observation lies on the correct side of the
hyperplane and at least distance M away from it.
Hence the optimization problem is:
maximize M
subject to ||β||^2 = 1
y_i (β_0 + β^T x_i) ≥ M, for all i = 1,...,n
In component form this is:
maximize M
subject to ∑_{j=1}^p β_j^2 = 1
y_i (β_0 + β_1 x_{i1} + ... + β_p x_{ip}) ≥ M, for all i
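One way to make this concrete is to hand the problem, exactly as written, to a general-purpose constrained solver. The sketch below assumes NumPy/SciPy and linearly separable data; the function name maximal_margin_direct is ours, and SLSQP is used purely for illustration, since the equality constraint ||β|| = 1 makes this formulation non-convex (the convex reformulation in Section 7 is what one would use in practice).

    import numpy as np
    from scipy.optimize import minimize

    def maximal_margin_direct(X, y):
        # Solve: maximize M  s.t.  ||beta|| = 1  and  y_i (beta_0 + beta^T x_i) >= M.
        # Parameter vector theta = (beta_0, beta_1, ..., beta_p, M).
        p = X.shape[1]
        constraints = [
            {"type": "eq",   "fun": lambda th: th[1:p+1] @ th[1:p+1] - 1.0},
            {"type": "ineq", "fun": lambda th: y * (th[0] + X @ th[1:p+1]) - th[-1]},
        ]
        theta0 = np.zeros(p + 2)
        theta0[1] = 1.0                        # start from a unit-norm beta
        res = minimize(lambda th: -th[-1],     # maximize M  <=>  minimize -M
                       theta0, constraints=constraints, method="SLSQP")
        beta0, beta, M = res.x[0], res.x[1:p+1], res.x[-1]
        return beta0, beta, M

On small separable toy datasets this tends to recover the maximal margin hyperplane up to numerical tolerance, but because the feasible set is non-convex, convergence from an arbitrary starting point is not guaranteed.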
6. Interpretation of the Constraints
- Constraint y_i (β_0 + β^T x_i) ≥ M:
If M > 0, then each training point satisfies y_i (β_0 + β^T x_i) ≥ M > 0. Thus, every
point is on the correct side of the hyperplane, and its perpendicular distance to the
hyperplane is at least M (because ||β|| = 1). This provides a buffer or cushion, not just
correctness.
- Constraint ∑ β_j^2 = 1 (or ||β|| = 1):
This does not change which hyperplanes we can express (due to scaling invariance).
Instead, it ensures that y_i (β_0 + β^T x_i) is exactly the signed perpendicular distance to
the hyperplane, so that M is a true geometric margin.
Therefore, M is exactly the margin of the hyperplane: the smallest perpendicular distance
of any training observation to the hyperplane.
7. Connection to the Standard Hard-Margin SVM Formulation
The above problem is:
maximize M
subject to ||β|| = 1
y_i (β_0 + β^T x_i) ≥ M, for all i
We can rescale (β_0, β) and M to obtain the common SVM form. Define:
w = β / M, b = β_0 / M
Then:
w^T x_i + b = (β^T x_i + β_0) / M
From y_i (β_0 + β^T x_i) ≥ M we get:
y_i (w^T x_i + b) = y_i (β_0 + β^T x_i) / M ≥ 1
Also, since ||β|| = 1:
||w|| = ||β / M|| = 1 / M => M = 1 / ||w||
Maximizing M is thus equivalent to minimizing ||w||, or equivalently minimizing (1/2)
||w||^2. This yields the usual hard-margin SVM primal problem:
minimize (1/2) ||w||^2
subject to y_i (w^T x_i + b) ≥ 1, for all i
Thus, the optimization problem with variables (β_0, β, M) and constraints (∑ β_j^2 = 1,
y_i (β_0 + β^T x_i) ≥ M) is just another way of expressing the construction of the maximal
margin classifier: it explicitly maximizes the geometric margin M.
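A minimal sketch of this standard form, assuming the data are linearly separable and the cvxpy package is available (the function name hard_margin_svm is ours). It solves the primal QP and then maps the solution back to the (β_0, β, M) parameterization via M = 1 / ||w||.

    import numpy as np
    import cvxpy as cp

    def hard_margin_svm(X, y):
        # Hard-margin primal: minimize (1/2)||w||^2  s.t.  y_i (w^T x_i + b) >= 1.
        p = X.shape[1]
        w = cp.Variable(p)
        b = cp.Variable()
        objective = cp.Minimize(0.5 * cp.sum_squares(w))
        constraints = [cp.multiply(y, X @ w + b) >= 1]
        cp.Problem(objective, constraints).solve()

        # Map back to (beta_0, beta, M): M = 1/||w||, beta = M*w, beta_0 = M*b.
        M = 1.0 / np.linalg.norm(w.value)
        return M * b.value, M * w.value, M

On separable toy data, the margin M returned here should agree, up to numerical tolerance, with the optimal M of the (β_0, β, M) formulation in Section 5.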