Lecture 12
Statistical Learning: First Steps
Sasha Rakhlin
Oct 17, 2019
Outline
Setup
Perceptron
What is Generalization?
log(1 + 2 + 3) = log(1) + log(2) + log(3)
log(1 + 1.5 + 5) = log(1) + log(1.5) + log(5)
log(2 + 2) = log(2) + log(2)
log(1 + 1.25 + 9) = log(1) + log(1.25) + log(9)
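(These identities are not typos: in each case the sum of the numbers equals their product, so the log of the sum really is the sum of the logs. The quick Python check below verifies all four; the danger, presumably the point of the slide, is "generalizing" from such examples to a rule that is false in general.)

import math

# each tuple has sum == product, so log(sum) == sum of logs holds exactly
for t in [(1, 2, 3), (1, 1.5, 5), (2, 2), (1, 1.25, 9)]:
    lhs = math.log(sum(t))
    rhs = sum(math.log(x) for x in t)
    print(t, abs(lhs - rhs) < 1e-12)   # True for every triple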
Supervised Learning: data S = {(X1 , Y1 ), . . . , (Xn , Yn )} are i.i.d. from an unknown distribution P.
Learning algorithm: a mapping {(X1 , Y1 ), . . . , (Xn , Yn )} ↦ f̂n .
Goals:
▸ Prediction: small expected loss
L(f̂n ) = E_{X,Y} ℓ(Y, f̂n (X)).
Here (X, Y) ∼ P. Interpretation: good prediction on a random example from the same population.
▸ Estimation: small ∥f̂n − f∗ ∥ or ∥θ̂ − θ∗ ∥, where f∗ or θ∗ are parameters of P (e.g. the regression function f∗ (x) = E[Y∣X = x], or f∗ (x) = ⟨θ∗ , x⟩, etc.).
In this course, we mostly focus on prediction, but will also outline connections between prediction and estimation.
Why not estimate the underlying distribution P first?
This is in general a harder problem than prediction. Consider classification.
We might be attempting to learn parts/properties of the distribution that
are irrelevant, while all we care about is the “boundary” between the two
classes.
Key difficulty: our goals are in terms of unknown quantities related to the unknown P. We have to use empirical data instead. This is the purview of statistics.
For instance, we can calculate the empirical loss of f ∶ X → Y:
L̂(f) = (1/n) ∑_{i=1}^{n} ℓ(Y_i , f(X_i ))
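In code the empirical loss is one line (a minimal sketch; the loss ℓ, the predictor f, and the synthetic data here are placeholders, not from the slides):

import numpy as np

def empirical_loss(loss, f, X, Y):
    # L̂(f) = (1/n) Σ_i loss(Y_i, f(X_i))
    return np.mean([loss(y, f(x)) for x, y in zip(X, Y)])

# illustrative usage: squared loss with an arbitrary fixed linear predictor
rng = np.random.default_rng(0)
theta = np.array([1.0, -2.0, 0.5])
X = rng.standard_normal((100, 3))
Y = X @ theta + 0.1 * rng.standard_normal(100)
print(empirical_loss(lambda y, yh: (y - yh) ** 2, lambda x: x @ theta, X, Y))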
The function x ↦ f̂n (x) = f̂n (x; X1 , Y1 , . . . , Xn , Yn ) is random, since it depends on the random data S = (X1 , Y1 , . . . , Xn , Yn ). Thus, the risk
L(f̂n ) = E[ℓ(f̂n (X), Y) ∣ S]
= E[ℓ(f̂n (X; X1 , Y1 , . . . , Xn , Yn ), Y) ∣ S]
is a random variable. We might aim for EL(f̂n ) small, or L(f̂n ) small with high probability (over the training data).
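To see this randomness concretely, here is a small simulation (a sketch under assumed Gaussian data, with ordinary least squares standing in for the learning algorithm): each fresh draw of the training sample S yields a different f̂n, hence a different value of L(f̂n), estimated here on a large test sample.

import numpy as np

rng = np.random.default_rng(1)
theta_star = np.array([1.0, -1.0])

def risk_of_one_fit(n_train=30, n_test=100_000):
    # one draw of S -> one fitted OLS predictor -> its (estimated) risk L(f̂n)
    X = rng.standard_normal((n_train, 2))
    Y = X @ theta_star + rng.standard_normal(n_train)
    theta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Xt = rng.standard_normal((n_test, 2))
    Yt = Xt @ theta_star + rng.standard_normal(n_test)
    return np.mean((Yt - Xt @ theta_hat) ** 2)

risks = [risk_of_one_fit() for _ in range(20)]
print(np.round(risks, 3))   # 20 different values: L(f̂n) is a random variable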
Quiz: what is random here?
1. L̂(f) for a given fixed f
2. f̂n
3. L̂(f̂n )
4. L(f̂n )
5. L(f) for a given fixed f
It is important that these are understood before we proceed further.
Theoretical analysis of performance is typically easier if f̂n has a closed form (in terms of the training data).
E.g. ordinary least squares: f̂n (x) = x^T (X^T X)^{-1} X^T Y.
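As a sketch in numpy (assuming a design matrix X with full column rank; lstsq is used rather than an explicit inverse for numerical stability):

import numpy as np

def ols_predict(x, X, Y):
    # f̂n(x) = x^T (X^T X)^{-1} X^T Y, computed via least squares
    theta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return x @ theta_hat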
Unfortunately, most ML and many statistical procedures are not explicitly defined but arise as
▸ solutions to an optimization objective (e.g. logistic regression), or
▸ iterative procedures without an immediately obvious objective function (e.g. AdaBoost, Random Forests, etc.)
The Gold Standard
Within the framework we set up, the smallest expected loss is achieved by
the Bayes optimal function
f∗ = arg min_f L(f)
where the minimization is over all (measurable) prediction rules f ∶ X → Y.
The value of the lowest expected loss is called the Bayes error:
L(f∗ ) = inf_f L(f)
Of course, we cannot calculate any of these quantities since P is unknown.
Bayes Optimal Function
The Bayes optimal function f∗ takes the following forms in two particular cases (a toy numerical example follows):
▸ Binary classification (Y = {0, 1}) with the indicator loss:
f∗ (x) = I{η(x) ≥ 1/2}, where η(x) = E[Y∣X = x]
[Figure: plot of η(x), thresholded at 1/2]
▸ Regression (Y = R) with squared loss:
f∗ (x) = η(x), where η(x) = E[Y∣X = x]
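For intuition, here is a toy case where P is known, so η and both Bayes rules can be written down exactly (a hypothetical equal-prior Gaussian mixture, not from the slides):

import numpy as np

def eta(x):
    # η(x) = P(Y = 1 | X = x) for the toy model X|Y=1 ~ N(+1,1), X|Y=0 ~ N(-1,1),
    # P(Y=1) = 1/2; the Gaussian density ratio simplifies to a logistic in 2x
    return 1.0 / (1.0 + np.exp(-2.0 * x))

def f_star_classification(x):
    # Bayes rule under indicator loss: I{η(x) >= 1/2}, i.e. threshold at x = 0 here
    return (eta(x) >= 0.5).astype(int)

f_star_regression = eta   # Bayes rule under squared loss when Y ∈ {0, 1}

print(f_star_classification(np.array([-0.3, 0.0, 0.7])))   # -> [0 1 1]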
The big question: is there a way to construct a learning algorithm with a
guarantee that
L(f̂n ) − L(f∗ )
is small for large enough sample size n?
Consistency
An algorithm that ensures
lim_{n→∞} L(f̂n ) = L(f∗ ) almost surely
is called consistent. Consistency ensures that our algorithm is approaching
the best possible prediction performance as the sample size increases.
The good news: consistency is possible to achieve.
▸ easy if X is a finite or countable set
▸ not too hard if X is infinite, and the underlying relationship between x
and y is “continuous”
The bad news...
In general, we cannot prove anything quantitative about L(f̂n ) − L(f∗ ),
unless we make further assumptions (incorporate prior knowledge).
“No Free Lunch” Theorems: unless we posit assumptions,
▸ For any algorithm f̂n , any n, and any ε > 0, there exists a distribution P such that L(f∗ ) = 0 and
EL(f̂n ) ≥ 1/2 − ε
▸ For any algorithm f̂n , and any sequence an that converges to 0, there exists a probability distribution P such that L(f∗ ) = 0 and for all n
EL(f̂n ) ≥ an
References: Devroye, Györfi, Lugosi, A Probabilistic Theory of Pattern Recognition; Bousquet, Boucheron, Lugosi (2004).
Is this really “bad news”?
Not really. We always have some domain knowledge.
Two ways of incorporating prior knowledge:
▸ Direct way: assumptions on distribution P (e.g. margin, smoothness of
regression function, etc)
▸ Indirect way: redefine the goal to perform as well as a reference set F of predictors:
L(f̂n ) − inf_{f∈F} L(f)
F encapsulates our inductive bias.
We often make both of these assumptions.
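As a toy illustration of the relative goal (a sketch; the finite class F and the distribution P below are hypothetical, and the infimum over F is estimated by Monte Carlo on a large sample):

import numpy as np

rng = np.random.default_rng(2)
F = [lambda x, a=a: a * x for a in (0.0, 0.5, 1.0, 2.0)]   # small reference class

def est_risk(f, n=100_000):
    # estimate L(f) = E (Y - f(X))^2 under the toy model Y = X + noise
    X = rng.standard_normal(n)
    Y = X + 0.5 * rng.standard_normal(n)
    return np.mean((Y - f(X)) ** 2)

best_in_class = min(est_risk(f) for f in F)   # stand-in for inf_{f in F} L(f)
print(best_in_class)   # the benchmark against which L(f̂n) is compared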
We start our study of Statistical Learning with the classical Perceptron
algorithm.
Reason: simplicity. We will give a three-line proof of Perceptron, followed
by two interesting consequences with one-line proofs each. These
consequences are, perhaps, the easiest nontrivial statistical guarantees I can
think of.
Perceptron
(x1 , y1 ), . . . , (xT , yT ) ∈ X × {±1} (T may or may not be the same as n)
Maintain a hypothesis wt ∈ R^d (initialize w1 = 0).
On round t:
▸ Consider (xt , yt )
▸ Form prediction ŷt = sign(⟨wt , xt ⟩)
▸ If ŷt ≠ yt , update
wt+1 = wt + yt xt
else
wt+1 = wt
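A direct transcription of the update (a minimal sketch, assuming labels in {±1} and the examples as rows of X; note that sign(0) = 0 never matches a ±1 label, so a zero-margin prediction counts as a mistake, matching the convention I{y⟨w, x⟩ ≤ 0} used in the recap below):

import numpy as np

def perceptron(X, Y):
    # one pass of the Perceptron over the sequence (x_t, y_t)
    w = np.zeros(X.shape[1])        # w_1 = 0
    mistakes = 0
    for x, y in zip(X, Y):
        if y * (w @ x) <= 0:        # mistaken (or zero-margin) prediction
            w = w + y * x           # update: w_{t+1} = w_t + y_t x_t
            mistakes += 1
    return w, mistakes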
Perceptron
[Figure: geometric picture of the Perceptron update, showing the weight vector wt and a misclassified example xt]
For simplicity, suppose all data are in a unit ball, ∥xt ∥ ≤ 1.
Margin with respect to (x1 , y1 ), . . . , (xT , yT ):
γ = max_{∥w∥=1} min_{i∈[T]} (yi ⟨w, xi ⟩)+ ,
where (a)+ = max{0, a}.
Theorem (Novikoff ’62). Perceptron makes at most 1/γ² mistakes (and corrections) on any sequence of examples with margin γ.
Proof: Let m be the number of mistakes after T iterations. If a mistake is made on round t, then yt ⟨wt , xt ⟩ ≤ 0, so
∥wt+1 ∥² = ∥wt + yt xt ∥² ≤ ∥wt ∥² + 2 yt ⟨wt , xt ⟩ + 1 ≤ ∥wt ∥² + 1,
using ∥xt ∥ ≤ 1. Since w changes only on mistake rounds,
∥wT+1 ∥² ≤ m.
For the optimal hyperplane w∗ , on any mistake round,
γ ≤ ⟨w∗ , yt xt ⟩ = ⟨w∗ , wt+1 − wt ⟩ .
Hence, summing over the m mistake rounds (the rest contribute zero), telescoping, and applying Cauchy-Schwarz with ∥w∗ ∥ = 1,
mγ ≤ ⟨w∗ , wT+1 ⟩ ≤ ∥wT+1 ∥ ≤ √m.
Thus m ≤ 1/γ².
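The bound is easy to check numerically (a sanity-check sketch on synthetic separable data with a planted unit-norm separator; we cycle through the data until a mistake-free pass and count total mistakes):

import numpy as np

rng = np.random.default_rng(0)
w_star = np.array([1.0, 0.0])                    # planted unit-norm separator
gamma = 0.1
X = rng.uniform(-1.0, 1.0, size=(2000, 2))
keep = (np.linalg.norm(X, axis=1) <= 1.0) & (np.abs(X @ w_star) >= gamma)
X = X[keep]                                      # ||x_t|| <= 1 and margin >= gamma
Y = np.sign(X @ w_star)

w, mistakes, clean = np.zeros(2), 0, False
while not clean:                                 # cycle until a clean pass
    clean = True
    for x, y in zip(X, Y):
        if y * (w @ x) <= 0:
            w, mistakes, clean = w + y * x, mistakes + 1, False

print(mistakes, "<=", 1 / gamma**2)              # Novikoff: at most 1/gamma^2 mistakes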
Recap
For any T and (x1 , y1 ), . . . , (xT , yT ),
∑_{t=1}^{T} I{yt ⟨wt , xt ⟩ ≤ 0} ≤ D²/γ²
where γ = γ(x1∶T , y1∶T ) is the margin and D = D(x1∶T , y1∶T ) = max_t ∥xt ∥.
Let w∗ denote the max-margin hyperplane, ∥w∗ ∥ = 1.